Compare commits

590 Commits
bka ... vic

Author SHA1 Message Date
theshaenix
104418f71b Revert "atoll_defconfig: Enable DCE"
This reverts commit 6b0d712604bae4980606d6007dd81917f1b12c7c.
2025-11-26 01:41:16 +05:30
theshaenix
b22a4126fc config: enable CONFIG_ANDROID_SIMPLE_LMK 2025-11-26 01:41:16 +05:30
Sultan Alsawaf
19aaeeac83 mm: Always indicate OOM kill progress when Simple LMK is enabled
When Simple LMK is enabled, the page allocator slowpath always thinks that
no OOM kill progress is made because out_of_memory() returns false. As a
result, spurious page allocation failures are observed when memory is low
and Simple LMK is killing tasks, simply because the page allocator slowpath
doesn't think that any OOM killing is taking place.

Fix this by simply making out_of_memory() always return true when Simple
LMK is enabled.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:15 +05:30
Sultan Alsawaf
01a9f6d454 simple_lmk: Reap anonymous memory from victims
The OOM reaper makes it possible to immediately release anonymous memory
from a dying process in order to free up memory faster. This provides
immediate relief under heavy memory pressure instead of waiting for victim
processes to naturally release their memory.

Utilize the OOM reaper by creating another kthread in Simple LMK to perform
victim reaping. Similar to the OOM reaper kthread (which is unused with
Simple LMK), this new kthread allows reaping to race with exit_mmap() in
order to preclude the need to take a reference to an mm's address space and
thus potentially mmput() an mm's last reference. Doing so would stall the
reaper kthread, preventing it from being able to quickly reap new victims.

Reaping is done on victims one at a time by descending order of anonymous
pages, so that the most promising victims with the most anonymous pages
are reaped first. Victims are also marked for reaping via MMF_OOM_VICTIM so
that they reap themselves first in exit_mmap(). Even if a victim isn't
reaped by the reaper thread, it'll free its anonymous memory first thing in
exit_mmap() as a small win towards making memory available sooner.

By relieving memory pressure faster via reaping, Simple LMK not only
doesn't need to kill as many processes, but also improves system
responsiveness when memory is low since memory pressure is relieved sooner.

Although not strictly required, Simple LMK should be the only one utilizing
the OOM reaper. Any other code that may utilize the OOM reaper, such as
patches that invoke the OOM reaper for all SIGKILLs, should be disabled.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:15 +05:30
Sultan Alsawaf
dea3352211 simple_lmk: Reduce unnecessary wake ups
We can check if the waitqueue is actually active before calling wake_up()
in order to avoid an unnecessary wake_up() if the reclaim thread is already
running. Furthermore, the release barrier when zeroing needs_reclaim is
unnecessary, so remove it.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:14 +05:30
Sultan Alsawaf
1ae9421203 simple_lmk: Ratelimit the 'no processes available to kill' message
Under extreme simulated memory pressure, the 'no processes available to
kill' message can be spammed hundreds of thousands of times, which is not
productive. Ratelimit it.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:14 +05:30
Sultan Alsawaf
e8174dd162 simple_lmk: Fix victim scheduling priority elevation
As it turns out, victim scheduling priority elevation has always been
broken for two reasons:
1. The minimum valid RT priority is 1, not 0. As a result,
   sched_setscheduler_nocheck() always fails with -EINVAL.
2. The thread within a victim thread group which happens to hold the mm is
   not necessarily the only thread with references to the mm, and isn't
   necessarily the thread which will release the final mm reference. As a
   result, victim threads which hold mm references may take a while to
   release them, and the unlucky thread which puts the final mm reference
   may take a very long time to release all memory if it doesn't have RT
   scheduling priority.

These issues cause victims to often take a very long time to release their
memory, possibly up to several seconds depending on system load. This, in
turn, causes Simple LMK to constantly hit the reclaim timeout and kill more
processes, with Simple LMK being rather ineffective since victims may not
release any memory for several seconds.

Fix the broken scheduling priority elevation by changing the RT priority to
the valid lowest priority of 1 and applying it to all threads in the thread
group, instead of just the thread which holds the mm.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:13 +05:30
Sultan Alsawaf
1f5704dda6 simple_lmk: Thaw victims upon killing them
With freezable cgroups and their recent utilization in Android, it's
possible for some of Simple LMK's victims to be frozen at the time that
they're selected for killing. The forced SIGKILL used for killing
victims can only wake up processes containing TASK_WAKEKILL and/or
TASK_INTERRUPTIBLE, not TASK_UNINTERRUPTIBLE, which is the state used on
frozen tasks. In order to wake frozen tasks from their uninterruptible
slumber so that they can die, we must thaw them. Leaving victims frozen
can otherwise make them take an indefinite amount of time to process our
SIGKILL and thus free memory.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:13 +05:30
Sultan Alsawaf
605e5f50d4 simple_lmk: Make the reclaim thread freezable
There are two problems with the current uninterruptible wait used in the
reclaim thread: the hung task detector is upset about an uninterruptible
thread being asleep for so long, and killing processes can generate I/O.

Since killing a process can generate I/O, the reclaim thread should
participate in system-wide suspend operations. This neatly solves the
hung task detector issue since wait_event_freezable() puts the current
process into an interruptible sleep.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:12 +05:30
Sultan Alsawaf
9f11f8e6cb simple_lmk: Be extra paranoid if tasks can have no pages
If it's possible for a task to have no pages, then there could be a case
where `pages_found` is zero while `nr_found` isn't, which would cause
the found tasks' locks to never be unlocked, and thus mayhem. We can
change the `pages_found` check to use `nr_found` instead in order to
naturally defend against this scenario, in case it is indeed possible.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:12 +05:30
Sultan Alsawaf
45e9605f11 mm: Increment kswapd_waiters for throttled direct reclaimers
Throttled direct reclaimers will wake up kswapd and wait for kswapd to
satisfy their page allocation request, even when the failed allocation
lacks the __GFP_KSWAPD_RECLAIM flag in its gfp mask. As a result, kswapd
may think that there are no waiters and thus exit prematurely, causing
throttled direct reclaimers lacking __GFP_KSWAPD_RECLAIM to stall on
waiting for kswapd to wake them up. Incrementing the kswapd_waiters
counter when such direct reclaimers become throttled fixes the problem.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:11 +05:30
Jens Axboe
98711ba5a3 buffer: eliminate the need to call free_more_memory() in __getblk_slow()
Since the previous commit removed any case where grow_buffers()
would return failure due to memory allocations, we can safely
remove the case where we have to call free_more_memory() in
this function.

Since this is also the last user of free_more_memory(), kill
it off completely.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-26 01:41:11 +05:30
Jens Axboe
3fe581ec63 buffer: grow_dev_page() should use __GFP_NOFAIL for all cases
We currently use it for find_or_create_page(), which means that it
cannot fail. Ensure we also pass in 'retry == true' to
alloc_page_buffers(), which also ensure that it cannot fail.

After this, there are no failure cases in grow_dev_page() that
occur because of a failed memory allocation.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-26 01:41:10 +05:30
Jens Axboe
fb2ba23ee7 buffer: have alloc_page_buffers() use __GFP_NOFAIL
Instead of adding weird retry logic in that function, utilize
__GFP_NOFAIL to ensure that the vm takes care of handling any
potential retries appropriately. This means we don't have to
call free_more_memory() from here.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-26 01:41:10 +05:30
Sultan Alsawaf
c77916e8c0 mm: vmpressure: Fix rampant inaccuracies caused by stale data usage
After a period of intense memory pressure is over, it's common for
vmpressure to still have old reclaim efficiency data accumulated from
this time. When memory pressure starts to rise again, this stale data
will factor into vmpressure's calculations, and can cause vmpressure to
report an erroneously high pressure. The reverse is possible, too:
vmpressure may report pressures that are erroneously low due to stale
data that's been stored.

Furthermore, since kswapd can still be performing reclaim when there are
no failed memory allocations stuck in the page allocator's slow path,
vmpressure may still report pressures when there aren't any memory
allocations to satisfy. This can cause last-resort memory reclaimers to
kill processes to free memory when it's not needed.

To fix the rampant stale data, keep track of when there are processes
utilizing reclaim in the page allocator's slow path, and reset the
accumulated data in vmpressure when a new period of elevated memory
pressure begins. Extra measures are taken for the kswapd issue mentioned
above by ignoring all reclaim efficiency data reported by kswapd when
there aren't any failed memory allocations in the page allocator which
utilize reclaim.

Note that since sr_lock can now be used from IRQ context, IRQs must be
disabled whenever sr_lock is used to prevent deadlocks.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:09 +05:30
Sultan Alsawaf
5a9d4a1128 mm: vmpressure: Fix a race that would erroneously clear accumulated data
Since the code that determines whether data should be cleared and the
code that actually clears the data are in separate spin-locked critical
sections, new data could be generated on another CPU after it is
determined that the existing data should be cleared, but before the
current CPU clears the existing data. This would cause the new data
reported by the other CPU to be lost.

Fix the race by clearing accumulated data within the same spin-locked
critical section that determines whether or not data should be cleared.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:09 +05:30
Sultan Alsawaf
5f296399e1 mm: vmpressure: Ignore costly-order allocations for direct reclaim too
The direct reclaim vmpressure path was erroneously excluded from the
PAGE_ALLOC_COSTLY_ORDER check which was added in commit "mm: vmpressure:
Ignore allocation orders above PAGE_ALLOC_COSTLY_ORDER".

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:08 +05:30
Sultan Alsawaf
af76860ba0 simple_lmk: Optimize victim finder to eliminate hard-coded adj ranges
Hard-coding adj ranges to search for victims results in a few problems.
Firstly, the hard-coded adjs must be vigilantly updated to match what
userspace uses, which makes long-term support a headache. Secondly, a
full traversal of every running process must be done for each adj range,
which can turn out to be quite expensive, especially if userspace
assigns many different adj values and we want to enumerate them all.
This leads us to the final problem, which is that processes with
different adjs within the same hard-coded adj range will be treated the
same, even though they're not: the process with a higher adj is less
important, and the process with a lower adj is more important. This
could be fixed by enumerating every possible adj, but again, that would
necessitate several scans through the active process list, which is bad
for performance, especially since latency is critical here.

Since adjs are only 16 bits, and we only care about positive adjs, that
leaves us with 15 bits of the adj that matter. This is a relatively
small number of potential adjs (32,768), which makes it possible to
allocate a static array that's indexed using the adj. Each entry in this
array is a pointer to the first task_struct in a singly-linked list of
task_structs sharing an adj. A `simple_lmk_next` member is added to
task_struct to accommodate this linked list. The victim finder now
iterates downward through the array searching for linked lists of tasks,
starting from the highest adj found, so that the lowest-priority
processes are always considered first for reclaim. This fixes all of the
problems mentioned above, and now there is only one traversal through
every running process. The array itself only takes up 256 KiB of memory
on 64-bit, which is a very small price to pay for the advantages gained.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:08 +05:30
Sultan Alsawaf
c4ea485bce simple_lmk: Cacheline-align the victims array and mm_free_lock on SMP
The victims array and mm_free_lock data structures can be used very
heavily in parallel on SMP, in which case they would benefit from being
cacheline-aligned. Make it so for SMP.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:07 +05:30
Sultan Alsawaf
64d131fa9c simple_lmk: Pass a custom swap function to sort()
When sort() isn't provided with a custom swap function, it falls back
onto its generic implementation of just swapping one byte at a time,
which is quite slow. Since we know the type of the objects being sorted,
we can provide our own swap function which simply uses the swap() macro.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:06 +05:30
Sultan Alsawaf
2375ad7d46 simple_lmk: Skip victim reduction when all victims need to be killed
When there aren't enough pages found, it means all of the victims that
were found need to be killed. The additional processing that attempts to
reduce the number of victims can be skipped in this case.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:06 +05:30
Sultan Alsawaf
1002f6aa63 simple_lmk: Use MIN_FREE_PAGES wherever pages_needed is used
There's no reason to pass this constant around in a parameter.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:05 +05:30
Sultan Alsawaf
51a309c4d6 simple_lmk: Don't block in simple_lmk_mm_freed() on mm_free_lock
When the mm_free_lock write lock is held, it means that reclaim is
either starting or ending, in which case there's nothing that needs to
be done in simple_lmk_mm_freed(). We can use a trylock here instead to
avoid blocking.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:04 +05:30
Sultan Alsawaf
14ade4e289 mm: vmpressure: Don't export tunables to userspace
Userspace could change these tunables and make Simple LMK function
poorly. Don't export them.

Reported-by: attack11 <fernandobouchet@gmail.com>
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:04 +05:30
Sultan Alsawaf
c02ae8bc0e simple_lmk: Update Kconfig description for VM pressure change
Simple LMK uses VM pressure now, not a kswapd hook like before. Update
the Kconfig description to reflect such.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:03 +05:30
Sultan Alsawaf
073faae34a simple_lmk: Add !PSI dependency
When PSI is enabled, lmkd in userspace will use PSI notifications to
perform low memory kills. Therefore, to ensure that Simple LMK is the
only active LMK implementation, add a !PSI dependency.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:03 +05:30
Sultan Alsawaf
44b17edba7 simple_lmk: Print a message when the timeout is reached
This aids in selecting an adequate timeout. If the timeout is hit often
and Simple LMK is killing too much, then the timeout should be
lengthened. If the timeout is rarely hit and Simple LMK is not killing
fast enough under pressure, then the timeout should be shortened.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:02 +05:30
Sultan Alsawaf
2d403f20cc simple_lmk: Remove unnecessary clean-up when timeout is reached
Zeroing out the mm struct pointers when the timeout is hit isn't needed
because mm_free_lock prevents any readers from accessing the mm struct
pointers while clean-up occurs, and since the simple_lmk_mm_freed() loop
bound is set to zero during clean-up, there is no possibility of dying
processes ever reading stale mm struct pointers.

Therefore, it is unnecessary to clear out the mm struct pointers when
the timeout is reached. Now the only step to do when the timeout is
reached is to re-init the completion, but since reinit_completion() just
sets a struct member to zero, call reinit_completion() unconditionally
as it is faster than encapsulating it within a conditional statement.

Also take this opportunity to rename some variables and tidy up some
code indentation.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:02 +05:30
Sultan Alsawaf
39d8fd2def simple_lmk: Hold an RCU read lock instead of the tasklist read lock
We already check to see if each eligible process isn't already dying, so
an RCU read lock can be used to speed things up instead of holding the
tasklist read lock.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:01 +05:30
Sultan Alsawaf
10cf359ebd mm: Don't stop kswapd on a per-node basis when there are no waiters
The page allocator wakes all kswapds in an allocation context's allowed
nodemask in the slow path, so it doesn't make sense to have the kswapd-
waiter count per each NUMA node. Instead, it should be a global counter
to stop all kswapds when there are no failed allocation requests.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:01 +05:30
Sultan Alsawaf
b022abe810 simple_lmk: Consider all positive adjs when finding victims
We are allowed to kill any process with a positive adj, so we shouldn't
exclude any processes with adjs greater than 999. This would present a
problem with quirky applications that set their own adj score, such as
stress-ng. In the case of stress-ng, it would set its adj score to 1000
and thus exempt itself from being killed by Simple LMK. This shouldn't
be allowed; any process with a positive adj, up to the highest positive
adj possible (32767) should be killable.

Reported-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:00 +05:30
Sultan Alsawaf
10f5fa6607 mm: vmpressure: Ignore allocation orders above PAGE_ALLOC_COSTLY_ORDER
PAGE_ALLOC_COSTLY_ORDER allocations can cause vmpressure to incorrectly
think that memory pressure is high, when it's really just that the
allocation's high order is difficult to satisfy. When this rare scenario
occurs, ignore the input to vmpressure to avoid sending out a spurious
high-pressure signal.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:41:00 +05:30
Sultan Alsawaf
ac25977acb mm: Don't warn on page allocation failures for OOM-killed processes
It can be normal for a dying process to have its page allocation request
fail when it has an OOM or LMK kill pending. In this case, it's actually
detrimental to print out a massive allocation failure message because
this means the running process needs to die quickly and release its
memory, which is slowed down slightly by the massive kmsg splat. The
allocation failure message is also a false positive in this case, since
the failure is intentional rather than being the result of an inability
to allocate memory.

Suppress the allocation failure warning for processes that are killed to
release memory in order to expedite their death and remedy the kmsg
confusion from seeing spurious allocation failure messages.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:59 +05:30
Sultan Alsawaf
996e058e42 mm: Adjust tsk_is_oom_victim() for Simple LMK
The page allocator uses tsk_is_oom_victim() to determine when to
fast-path memory allocations in order to get an allocating process out
of the page allocator and into do_exit() quickly. Unfortunately,
tsk_is_oom_victim()'s check to see if a process is killed for OOM
purposes is to look for the presence of an OOM reaper artifact that only
the OOM killer sets. This means that for processes killed by Simple LMK,
there is no fast-pathing done in the page allocator to get them to die
faster.

Remedy this by changing tsk_is_oom_victim() to look for the existence of
the TIF_MEMDIE flag, which Simple LMK sets for its victims.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:59 +05:30
Sultan Alsawaf
90fd99dc74 mm: vmpressure: Don't cache the window size
Caching the window size can result in delayed or inaccurate pressure
reports. Since calculating a fresh window size is cheap, do so all the
time instead of relying on a stale, cached value.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:58 +05:30
Sultan Alsawaf
1045128388 mm: vmpressure: Interpret zero scanned pages as 100% pressure
When no pages are scanned, it usually means no zones were reclaimable
and nothing could be done. In this case, the reported pressure should be
100 to elicit help from any listeners. This fixes the vmpressure
framework not working when memory pressure is very high.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:58 +05:30
Sultan Alsawaf
dce2339850 mm: vmpressure: Don't exclude any allocation types
Although userspace processes can't directly help with kernel memory
pressure, killing userspace processes can relieve kernel memory if they
are responsible for that pressure in the first place. It doesn't make
sense to exclude any allocation types knowing that userspace can indeed
affect all memory pressure, so don't exclude any allocation types from
the pressure calculations.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:57 +05:30
Sultan Alsawaf
14d8415179 simple_lmk: Update adj targeting for Android 10
Android 10 changed its adj assignments. Update Simple LMK to use the
new adjs, which also requires looking at each pair of adjs as a range.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:57 +05:30
Sultan Alsawaf
25c0b1fa3a simple_lmk: Use vmpressure notifier to trigger kills
Using kswapd's scan depth to trigger task kills is inconsistent and
unreliable. When memory pressure quickly spikes, the kswapd scan depth
trigger fails to kick off Simple LMK fast enough, causing severe lag.
Additionally, kswapd could stop scanning prematurely before reaching the
desired scan depth to trigger Simple LMK, which could also cause stalls.

To remedy this, use the vmpressure framework instead, since it provides
more consistent and accurate readings on memory pressure. This is not
very tunable though, so remove CONFIG_ANDROID_SIMPLE_LMK_AGGRESSION.
Triggering Simple LMK to kill when the reported memory pressure is 100
should yield good results on all setups.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:56 +05:30
Sultan Alsawaf
e2422dab59 mm: Stop kswapd early when nothing's waiting for it to free pages
Keeping kswapd running when all the failed allocations that invoked it
are satisfied incurs a high overhead due to unnecessary page eviction
and writeback, as well as spurious VM pressure events to various
registered shrinkers. When kswapd doesn't need to work to make an
allocation succeed anymore, stop it prematurely to save resources.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:56 +05:30
Sultan Alsawaf
7e69526dcc simple_lmk: Include swap memory usage in the size of victims
Swap memory usage is important when determining what to kill, so include
it in the victim size calculation.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:55 +05:30
Sultan Alsawaf
7d01604580 simple_lmk: Relax memory barriers and clean up some styling
wake_up() executes a full memory barrier when waking a process up, so
there's no need for the acquire in the wait event. Additionally,
because of this, the atomic_cmpxchg() only needs a read barrier.

The cmpxchg() in simple_lmk_mm_freed() is atomic when it doesn't need to
be, so replace it with an extra line of code.

The atomic_inc_return() in simple_lmk_mm_freed() lies within a lock, so
it doesn't need explicit memory barriers.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:55 +05:30
Sultan Alsawaf
abe50fd0a1 simple_lmk: Place victims onto SCHED_RR
Just increasing the victim's priority to the maximum niceness isn't
enough to make it totally preempt everything in SCHED_FAIR, which is
important to make sure victims die quickly. Resource-wise, this isn't
very burdensome since the RT priority is just set to zero, and because
dying victims don't have much to do: they only need to finish whatever
they're doing quickly. SCHED_RR is used over SCHED_FIFO so that CPU time
between the victims is divided evenly to help them all finish at around
the same time, as fast as possible.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:54 +05:30
Sultan Alsawaf
600a4bac0c simple_lmk: Add a timeout to stop waiting for victims to die
Simple LMK tries to wait until all of the victims it kills have their
memory freed; however, sometimes victims can take a while to die, which
can block Simple LMK from killing more processes in time when needed.
After the specified timeout elapses, Simple LMK will stop waiting and
make itself available to kill more processes.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:54 +05:30
Sultan Alsawaf
8e631a10d0 simple_lmk: Ignore tasks that won't free memory
Dying processes aren't going to help free memory, so ignore them.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:53 +05:30
Sultan Alsawaf
89c6220d88 simple_lmk: Simplify tricks used to speed up the death process
set_user_nice() doesn't schedule, and although set_cpus_allowed_ptr()
can schedule, it will only do so when the specified task cannot run on
the new set of allowed CPUs. Since cpu_all_mask is used,
set_cpus_allowed_ptr() will never schedule. Therefore, both the priority
elevation and cpus_allowed change can be moved to inside the task lock
to simplify and speed things up.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:53 +05:30
Sultan Alsawaf
9108a4bb37 simple_lmk: Report mm as freed as soon as exit_mmap() finishes
exit_mmap() is responsible for freeing the vast majority of an mm's
memory; in order to unblock Simple LMK faster, report an mm as freed as
soon as exit_mmap() finishes.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:52 +05:30
Sultan Alsawaf
c6745e63cb simple_lmk: Mark victim thread group with TIF_MEMDIE
The OOM killer sets the TIF_MEMDIE thread flag for its victims to alert
other kernel code that the current process was killed due to memory
pressure, and needs to finish whatever it's doing quickly. In the page
allocator this allows victim processes to quickly allocate memory using
emergency reserves. This is especially important when memory pressure is
high; if all processes are taking a while to allocate memory, then our
victim processes will face the same problem and can potentially get
stuck in the page allocator for a while rather than die expeditiously.

To ensure that victim processes die quickly, set TIF_MEMDIE for the
entire victim thread group.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:52 +05:30
Sultan Alsawaf
364a725995 simple_lmk: Disable OOM killer when Simple LMK is enabled
The OOM killer only serves to be a liability when Simple LMK is used.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:51 +05:30
Sultan Alsawaf
d5abaffa7e simple_lmk: Print a message when there are no processes to kill
Makes it clear that Simple LMK tried its best but there was nothing it
could do.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:51 +05:30
Sultan Alsawaf
1162ff26e0 simple_lmk: Remove compat cruft not specific to 4.14
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:50 +05:30
Sultan Alsawaf
8ee9570a69 simple_lmk: Update copyright to 2020
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:50 +05:30
Sultan Alsawaf
99454494ca simple_lmk: Don't queue up new reclaim requests during reclaim
Queuing up reclaim requests while a reclaim is in progress doesn't make
sense, since the additional reclaims may not be needed after the
existing reclaim completes. This would cause Simple LMK to go berserk
during periods of high memory pressure where kswapd would fire off
reclaim requests nonstop.

Make Simple LMK ignore new reclaim requests until an existing reclaim is
finished to prevent a slaughter-fest.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:49 +05:30
Sultan Alsawaf
cbef7a1e44 simple_lmk: Increase default minfree value
After commit "simple_lmk: Make reclaim deterministic", Simple LMK's
behavior changed and thus requires some slight re-tuning to make it work
well again.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:49 +05:30
Sultan Alsawaf
90bdfa97bc simple_lmk: Clean up some code style nitpicks
Using a parameter to pass around a unmodified pointer to a global
variable is crufty; just use the `victims` variable directly instead.
Also, compress the code in simple_lmk_init_set() a bit to make it look
cleaner.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:48 +05:30
Sultan Alsawaf
a5218db0c0 simple_lmk: Make reclaim deterministic
The 20 ms delay in the reclaim thread is a hacky fudge factor that can
cause Simple LMK to behave wildly differently depending on the
circumstances of when it is invoked. When kswapd doesn't get enough CPU
time to finish up and go back to sleep within 20 ms, Simple LMK performs
superfluous reclaims.

This is suboptimal, so make Simple LMK more deterministic by eliminating
the delay and instead queuing up reclaim requests from kswapd.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:48 +05:30
Sultan Alsawaf
7c119f20e8 simple_lmk: Fix broken multicopy atomicity for victims_to_kill
When the reclaim thread writes to victims_to_kill on one CPU, it expects
the updated value to be immediately reflected on all CPUs in order for
simple_lmk_mm_freed() to work correctly. Due to the lack of memory
barriers to guarantee multicopy atomicity, simple_lmk_mm_freed() can be
given a victim's mm without knowing the correct victims_to_kill value,
which can cause the reclaim thread to remain stuck waiting forever for
all victims to be freed. This scenario, despite being rare, has been
observed.

Fix this by using proper atomic helpers with memory barriers.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:47 +05:30
Sultan Alsawaf
f43553f2dc simple_lmk: Use proper atomic_* operations where needed
cmpxchg() is only atomic with respect to the local CPU, so it cannot be
relied on with how it's used in Simple LMK. Switch to fully atomic
operations instead for full atomic guarantees.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:46 +05:30
Sultan Alsawaf
7ead0d4baa simple_lmk: Remove kthread_should_stop() exit condition
Simple LMK's reclaim thread should never stop; there's no need to have
this check.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:46 +05:30
Sultan Alsawaf
73a9c99308 simple_lmk: Fix pages_found calculation
Previously, pages_found would be calculated using an uninitialized
variable. Fix it.

Reported-by: Julian Liu <wlootlxt123@gmail.com>
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:45 +05:30
Sultan Alsawaf
0d01445a6a simple_lmk: Introduce Simple Low Memory Killer for Android
This is a complete low memory killer solution for Android that is small
and simple. Processes are killed according to the priorities that
Android gives them, so that the least important processes are always
killed first. Processes are killed until memory deficits are satisfied,
as observed from kswapd struggling to free up pages. Simple LMK stops
killing processes when kswapd finally goes back to sleep.

The only tunables are the desired amount of memory to be freed per
reclaim event and desired frequency of reclaim events. Simple LMK tries
to free at least the desired amount of memory per reclaim and waits
until all of its victims' memory is freed before proceeding to kill more
processes.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-11-26 01:40:45 +05:30
RoHaN
475976a7bd atoll_defconfig: Enable SHA512 and CRC32 crypto extensions
Signed-off-by: RoHaN <reaper10x10x@gmail.com>
2025-11-26 01:40:44 +05:30
Alexander Winkowski
1276c45af6 atoll_defconfig: Disable memory cgroups support
Memory cgroups introduce high overhead. In Android 11 Google
recommends to use them for low-RAM devices only [1].

[1] https://source.android.com/devices/tech/perf/lmkd#userspace-lmkd-in-android-r

Change-Id: I3914b9fb3fc7f88450ae82efc33117c5e2774af8
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2025-11-26 01:40:44 +05:30
Kyle Lin
588e69b802 atoll_defconfig: Enable CONFIG_COMPAT_VDSO
Test: build
Bug: 154245183
Change-Id: I7c32f542ed032de2cee298c982b4eabfd4ed0afe
Signed-off-by: Kyle Lin <kylelin@google.com>
2025-11-26 01:40:43 +05:30
Martin Liu
b5e127f550 atoll_defconfig: Disable BALANCE_ANON_FILE_RECLAIM
Disable QC customized config to align upstream behavior

Bug: 158449887
Test: boot
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: I015219b9be587fd4bd89adc6f15450569b842074
2025-11-26 01:40:43 +05:30
Danny Lin
1a4ec2efb8 atoll_defconfig: Disable redundant Spectre variant 2 mitigations
Our big and Prime clusters are currently getting software mitigations
for Spectre variant 2 (CVE-2017-5715) applied through Trusted Firmware
despite the presence of Arm v8.5-A hardware mitigations. Disable the
software mitigations since they're redundant and are only hurting
performance.

Details and analysis:

The Kryo cores used in the aforementioned clusters are semi-custom
Cortex-A76 derivatives [1]. According to Arm, newer revisions of their
reference Cortex-A76 designs (r3p0 and newer) are immune to Spectre v2
thanks to hardware mitigations implemented as part of Arm v8.5-A [2].

While I was unable to locate a working Spectre v2 PoC for AArch64, Arm's
overview suggests that the v2 and v3(a) mitigations come together as part
of the single Arm v8.5-A update [3], so we can test for whether the cores
are susceptible to v2 by testing for their susceptibility to v3 and/or
v3a. This is helpful because there *is* a public and working Spectre v3a
PoC for AArch64 on GitHub [4]. Running the PoC revealed no conclusive
successes for the v3 exploit, which should mean that our cores are also
not vulnerable to Spectre v2.

Variants 1 and 4 was not considered because Arm's documentation states
that v1 and v4 mitigations are completely unrelated to those for v2 [5].

All PoC runs were conducted within a regular Android app's context with
the app's processes locked to the big and Prime clusters (CPUs 4-7),
since Arm states that the little cluster's cores (Cortex-A55) are
not affected by any variants of Spectre [2].

[1] https://en.wikichip.org/wiki/qualcomm/snapdragon_800/855
[2] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Security%20update%2010%20September%2018/Kernel_Mitigations_Detail_v1.7.pdf?revision=730b8541-ca91-4fde-a2bb-4093054748ae
[3] https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability
[4] https://github.com/lgeek/spec_poc_arm
[5] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Security%20update%2010%20September%2018/Kernel_Mitigations_Detail_v1.7.pdf?revision=730b8541-ca91-4fde-a2bb-4093054748ae

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Change-Id: I4411899b7da9a7e1899ea7532e922c40bb077ab1
2025-11-26 01:40:42 +05:30
Juhyung Park
10afe7e1f1 atoll_defconfig: disable compression for pstore-ram
Compression doesn't allow any more additional log saves anyways.

Change-Id: I96bff9eedf3371853155850eeafefe5e30b3fe66
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2025-11-26 01:40:42 +05:30
Alexander Winkowski
26ebd7a13b atoll_defconfig: Disable errata
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2025-11-26 01:40:41 +05:30
Isaac J. Manjarres
af752d63bb atoll_defconfig: Disable ZONE_DMA
Disable the DMA zone on for faster memory allocations,
and better memory utilization by merging the memory consumed
by the DMA zone with the normal memory zone.

Change-Id: If3ecf649878ad70bc6045a2c619ef99054ed469d
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2025-11-26 01:40:41 +05:30
RoHaNRaJ
fc50840aa4 atoll_defconfig: Configure cpuset assist
Signed-off-by: RoHaNRaJ <reaPeR10x10x@gmail.com>
2025-11-26 01:40:40 +05:30
RoHaNRaJ
49b651b57b atoll_defconfig: Set schedutil rate limits
Based on 61e555768e

Signed-off-by: RoHaNRaJ <reaPeR10x10x@gmail.com>
2025-11-26 01:40:40 +05:30
Panchajanya1999
14d62630b7 arm64/defconfig: Disable PAN emulation
Since we aren't running a hardened kernel, there is no need of
PAN (Privileged Access Never) emulation.

Test: Run callbench and check results before and after the commit.

Before this commit:
syscall: 130ns
libc: 50ns
mmap: 17476 ns
read: 11054 ns

After this commit:
syscall: 121ns
libc: 46ns
mmap: 16235 ns
read: 6454 ns

There's a 7% syscall improvement and 7.8% libc improvement.

Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
Signed-off-by: reaPeR1010 <reaPeR10x10x@gmail.com>
2025-11-26 01:40:39 +05:30
Suren Baghdasaryan
aada462122 atoll_defconfig: Remove FAIR_GROUP_SCHED
This feature is undesirable and not required by Android.

Bug: 153203661
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I8adeb2ab1cac3041c812bbab7907df6bac57ac6d
2025-11-26 01:40:39 +05:30
Pedro Bertoleti
8d920a7a0b arm64: configs: Disable per-process reclaim
Accordingly to FPS tests
(http://perf.mot.com/apps/fps/rungraphs/?runids=44588,44602,44598),
overall device performance has been improved when PPR is disabled (mainly
considering Major Page Fault and Swap IO improvements).

Change-Id: I89dade542cf47c754787fd17c11cc90e61d34473
Reviewed-on: https://gerrit.mot.com/2112482
SLTApproved: Slta Waiver
SME-Granted: SME Approvals Granted
Tested-by: Jira Key
Reviewed-by: Fernanda Schmidt <fschmidt@motorola.com>
Reviewed-by: Rafael Ortolan <rafones@motorola.com>
Reviewed-by: Carlos Pinho <cpinho@motorola.com>
Submit-Approved: Jira Key
2025-11-26 01:40:38 +05:30
RoHaNRaJ
c7695495d4 atoll_defconfig: Disable ESOC
Signed-off-by: RoHaNRaJ <reaPeR10x10x@gmail.com>
2025-11-26 01:40:38 +05:30
RoHaNRaJ
92338d8d4b atoll_defconfig: Enable DCE
Signed-off-by: RoHaNRaJ <reaPeR10x10x@gmail.com>
2025-11-26 01:40:37 +05:30
theshaenix
b17132e95a config: enable advanced TCP congestion control and set Westwood as default
This patch updates the atoll defconfig to enable additional TCP
congestion control algorithms and configure Westwood as the default
algorithm. These changes improve handling on variable mobile/WiFi
networks while keeping CUBIC and BBR available for optional use.

Enabled:
- CONFIG_TCP_CONG_ADVANCED
- CONFIG_TCP_CONG_CUBIC
- CONFIG_TCP_CONG_WESTWOOD
- CONFIG_TCP_CONG_BBR

Default:
- CONFIG_DEFAULT_WESTWOOD=y
- CONFIG_DEFAULT_TCP_CONG="westwood"
2025-11-26 01:40:37 +05:30
theshaenix
f2874d56b6 sm7125: updated the build script 2025-11-26 01:40:33 +05:30
Fiqri Ardyansyah
993ffc614e atoll__defconfig: Make lz4 default ZRAM compress
Signed-off-by: pri0818 <priyanshusinghal0818@gmail.com>
Signed-off-by: priiii0818 <priyanshusinghal0818@gmail.com>
Signed-off-by: GenZSouL <priyanshusinghal0818x@gmail.com>
2025-11-26 01:39:44 +05:30
Aarqw12
18c4acec3b dts: atoll : Fix comment gpu
caf have did a mistake and set /* NOM1 */ for 750mhz.

set /* TURBO */ for fix it.

fix commit b7272170103fa70a4d0b79500105c660e1a908fd

Signed-off-by: Aarqw12 <lcockx@protonmail.com>
Signed-off-by: negrroo <mohammedaelnaggar1@gmail.com>
2025-11-26 01:39:44 +05:30
Aarqw12
6bd812023d dts : atoll : Update TURBO & NOM_L1 bus voting for atoll GPU
update the GPU TURBO & NOM_L1 bus req voting to 2133 MHz

Signed-off-by: negrroo <mohammedaelnaggar1@gmail.com>
2025-11-26 01:39:23 +05:30
kurumich4n
61bc19d386 leds: qpnp-flash-v2: Bump DEFAULT_TORCH_STRENGTH 2025-11-22 18:05:30 +05:30
Flopster101
3646c384c0 ARM64: dts/qcom: pmi632: Enable realtime LED flash brightness control
Signed-off-by: Flopster101 <nahuelgomez329@gmail.com>
Signed-off-by: Edwiin Kusuma Jaya <kutemeikito0905@gmail.com>
2025-11-22 18:05:29 +05:30
Flopster101
b24af99baa leds: qpnp-flash-v2: Implement custom brightness control hack
* Intended to be compatible with caa6a682ee, but does not require opening a file from within the kernel (which was unsafe).
* It's independent from the standard `brightness` attribute that's present on all LED class drivers.
* Only for torch-class devices.

Signed-off-by: Flopster101 <nahuelgomez329@gmail.com>
Signed-off-by: Edwiin Kusuma Jaya <kutemeikito0905@gmail.com>
2025-11-22 18:05:29 +05:30
Kavya Nunna
0949eae442 leds:qpnp-flash-v2: Add support for flash realtime control
In the default design, every time we change the brightness
in torch node it gets reflected only when the enable bit
in the switch node is toggled from 0 to 1. Add new DT property
to optionally allow brightness change to reflect in realtime,
for as long as the switch node enable bit remains set.

Signed-off-by: Kavya Nunna <quic_knunna@quicinc.com>
Change-Id: If4cd4dbea611ab6c13ec8ac2ec16d6b39fed1e20
Signed-off-by: TogoFire <togofire@mailfence.com>
Signed-off-by: Edwiin Kusuma Jaya <kutemeikito0905@gmail.com>
2025-11-22 18:05:29 +05:30
LordShenron
3d06428fe1 fs: Gain 30% Linux Disk performance
NOATIME and NODIRATIME by default
Took from Kunal Kene's Black Box kernel
2025-11-22 18:05:28 +05:30
John Galt
279a29622f block/blk-throttle: tweak for flash
We're latency efficient enough for this tuning to benefit us.
2025-11-22 18:05:28 +05:30
jonghyun26.kim
247a50a3f9 power_supply: Fix unbalanced the power supplies
If a driver invokes multiple power_supply_register(), the each supply
will not be saved in the supplied_from[] with the correct index.

supplied_from[0] = "dc"
num_supplies = 1;

supplied_from[0] = "usb"
num_supplies = 2;

supplied_from[0] = "battery"
num_supplies = 3;
...

It results in NPE when iterating the supplied_from[] with num_supplies on
__power_supply_is_supplied_by()

Bug: 63785418
Change-Id: Ifd14ca7c6e2df247e1090e4fa8d8c66bd2912180
Signed-off-by; Devin Kim <dojip.kim@lge.com>
Signed-off-by: Steve Pfetsch <spfetsch@google.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Signed-off-by: rezaadi0105 <rezaadipangestu5@gmail.com>
2025-11-22 18:05:27 +05:30
Pranav Vashi
a364a196c7 power: qpnp-qg: Rectify prop for shutdown threshold
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Signed-off-by: rezaadi0105 <rezaadipangestu5@gmail.com>
2025-11-22 18:05:26 +05:30
Sultan Alsawaf
db9914f7c0 smb5-lib: Fix misleading indentation warning due to missing braces
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Change-Id: Ic0463785333cce49458f402d97b8335d772e277f
Signed-off-by: rezaadi0105 <rezaadipangestu5@gmail.com>
2025-11-22 18:05:21 +05:30
Dakkshesh
2605f426a8 drivers/misc: Introduce KernelSpace Profile Modes
git-subtree-dir: drivers/misc/kprofiles
git-subtree-mainline: eb1715437701a8e9990cd2e35c1d1fd927ee30d5
git-subtree-split: 52df4eb2b8f1202ff74d3b5257c0d06946a165aa
Signed-off-by: pri0818 <priyanshusinghal0818@gmail.com>
Signed-off-by: priiii0818 <priyanshusinghal0818@gmail.com>
Signed-off-by: GenZSouL <priyanshusinghal0818x@gmail.com>
2025-09-21 23:21:16 +05:30
theshaenix
b8a667041f Revert "[SQUASH] drivers: Add KernelSU-Next V1.0.9 and SUSFS V1.5.9"
This reverts commit 6bd2b81a9c.
2025-09-13 21:33:34 +05:30
Alexander Winkowski
8bf50bc72f power: supply: qcom: Increase current for non-QC charging
It's useful for me because I have to use standard chargers frequently.

Change-Id: Iab8e7fb4a416908b8172aa5c41a217af7b96148e
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2025-09-13 21:33:33 +05:30
engstk
48152512a2 drivers: misc: implement usb fast charge mode
echo 0 /sys/kernel/fast_charge/force_fast_charge (disable)
echo 1 /sys/kernel/fast_charge/force_fast_charge (enable)

Enables force charging up to 900mA in usb mode

Signed-off-by: engstk <eng.stk@sapo.pt>
Signed-off-by: AnierinB <anierin@evolution-x.org>
2025-09-13 21:33:32 +05:30
aminfauzi
822db2d415 drivers: enable usb fast charge by default
Signed-off-by: aminfauzi <aremean0107@gmail.com>
2025-09-13 21:33:28 +05:30
Rve27
b45d96444f power: smb5-lib: Implement bypass charging
Signed-off-by: Rve27 <rve27github@gmail.com>
2025-09-13 21:32:55 +05:30
aminfauzi
d016e7a079 configs: Enable EROFS
Signed-off-by: aminfauzi <aremean0107@gmail.com>
2025-09-13 21:32:51 +05:30
Alexandre Frade
9cbac73dba block/mq-deadline: Optimize mq-deadline
* Disable front_merges by default
* Increase write priority to improve responsiveness

Signed-off-by: Alexandre Frade <kernel@xanmod.org>
2025-09-13 21:30:53 +05:30
Bart Van Assche
593ce9a47e loop: Select I/O scheduler 'none' from inside add_disk()
commit 2112f5c1330a671fa852051d85cb9eadc05d7eb7 upstream.

We noticed that the user interface of Android devices becomes very slow
under memory pressure. This is because Android uses the zram driver on top
of the loop driver for swapping, because under memory pressure the swap
code alternates reads and writes quickly, because mq-deadline is the
default scheduler for loop devices and because mq-deadline delays writes by
five seconds for such a workload with default settings. Fix this by making
the kernel select I/O scheduler 'none' from inside add_disk() for loop
devices. This default can be overridden at any time from user space,
e.g. via a udev rule. This approach has an advantage compared to changing
the I/O scheduler from userspace from 'mq-deadline' into 'none', namely
that synchronize_rcu() does not get called.

This patch changes the default I/O scheduler for loop devices from
'mq-deadline' into 'none'.

Additionally, this patch reduces the Android boot time on my test setup
with 0.5 seconds compared to configuring the loop I/O scheduler from user
space.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Martijn Coenen <maco@android.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210805174200.3250718-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-13 21:30:52 +05:30
Edwin Moquete
ba1747c6f4 media/platform/msm: Import 4.4 camera_v2 stack
From LA.UM.9.2.r1-03700-SDMxx0.0
2025-09-13 21:30:51 +05:30
Vijay kumar Tumati
e3b2ac7466 msm: camera: sensor: Add support for front aux sensor
Allow front aux sensor to be connected on device.

Change-Id: I0386c23c77b38200c20581cd85b20c96bf074547
Signed-off-by: Vijay kumar Tumati <vtumati@codeaurora.org>
2025-09-13 21:30:50 +05:30
aminfauzi
9d41c00c5c atoll: configs: CONFIG_HZ_300=y
Signed-off-by: aminfauzi <aremean0107@gmail.com>
2025-09-13 21:30:50 +05:30
sohamxda7
ea6e0c961d qpnp-smb2: Silence dmesg spam while charging
* 'Set prop 16 is not supported in pc_port'

Signed-off-by: sohamxda7 <sensoham135@gmail.com>
Signed-off-by: clarencelol <clarencekuiek@icloud.com>
Signed-off-by: pix106 <sbordenave@gmail.com>
2025-09-13 21:30:49 +05:30
aminfauzi
c6999bb78e power: qcom: Force 900mA charging for USB2.0
Pzqqt: Port to smb-lib, and adapt to usb fast charge mode switch state
Signed-off-by: aminfauzi <aremean0107@gmail.com>
2025-09-13 21:30:48 +05:30
Fiqri Ardyansyah
aa44d2b45b drivers: camera: Fix potential buffer overflows
This fixes the -Wfortify-source warning detected on Clang 18.0.0.

../drivers/media/platform/msm/camera/cam_utils/cam_io_util.c:270:4: warning: 'snprintf' will always be truncated; specified size is 12, but format string expands to at least 13 [-Wfortify-source]
  270 |                         snprintf(p_str, 12, "0x%08x: ",
      |                         ^
../drivers/media/platform/msm/camera/cam_utils/cam_io_util.c:275:3: warning: 'snprintf' will always be truncated; specified size is 10, but format string expands to at least 11 [-Wfortify-source]
  275 |                 snprintf(p_str, 10, "%08x  ", data);
      |                 ^
../drivers/media/platform/msm/camera/cam_sensor_module/cam_csiphy/cam_csiphy_soc.c:46:4: warning: 'snprintf' will always be truncated; specified size is 12, but format string expands to at least 13 [-Wfortify-source]
   46 |                         snprintf(p_str, 12, "0x%08x: ",
      |                         ^
../drivers/media/platform/msm/camera/cam_sensor_module/cam_csiphy/cam_csiphy_soc.c:51:3: warning: 'snprintf' will always be truncated; specified size is 9, but format string expands to at least 10 [-Wfortify-source]
   51 |                 snprintf(p_str, 9, "%08x ", data);
      |                 ^

Signed-off-by: Fiqri Ardyansyah <fiqri0927936@gmail.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2025-09-13 21:30:48 +05:30
Zhongqiu Han
9b4cc7e393 sched: idle: Optimize the generic idle loop by removing needless memory barrier
The memory barrier rmb() in generic idle loop do_idle() function is not
needed, it doesn't order any load instruction, just remove it as needless
rmb() can cause performance impact.

The rmb() was introduced by the tglx/history.git commit f2f1b44c75c4
("[PATCH] Remove RCU abuse in cpu_idle()") to order the loads between
cpu_idle_map and pm_idle. It pairs with wmb() in function cpu_idle_wait().

And then with the removal of cpu_idle_state in function cpu_idle() and
wmb() in function cpu_idle_wait() in commit 783e391b7b ("x86: Simplify
cpu_idle_wait"), rmb() no longer has a reason to exist.

After that, commit d166991234 ("idle: Implement generic idle function")
implemented a generic idle function cpu_idle_loop() which resembles the
functionality found in arch/. And it retained the rmb() in generic idle
loop in file kernel/cpu/idle.c.

And at last, commit cf37b6b484 ("sched/idle: Move cpu/idle.c to
sched/idle.c") moved cpu/idle.c to sched/idle.c. And commit c1de45ca83
("sched/idle: Add support for tasks that inject idle") renamed function
cpu_idle_loop() to do_idle().

History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241009093745.9504-1-quic_zhonhan@quicinc.com
Change-Id: I7d04d05f25b66ab266b66424dfddd58857e5242b
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2025-09-13 21:30:46 +05:30
Fiqri Ardyansyah
7e329b56df cpufreq: Simplify logic for enable powersave governor when battery saver is on
There's really no need to add a logical "if" and
just use the logical "or ( || )" operator.

Signed-off-by: Fiqri Ardyansyah <fiqri15072019@gmail.com>
Signed-off-by: Edwiin Kusuma Jaya <kutemeikito0905@gmail.com>
2025-09-13 21:30:46 +05:30
Fiqri Ardyansyah
57f6eed30c cpufreq: Force enable powersave governor when battery saver is on
This has a significant effect when power saver mode is active.

Signed-off-by: Fiqri Ardyansyah <fiqri15072019@gmail.com>
Signed-off-by: Edwiin Kusuma Jaya <kutemeikito0905@gmail.com>
2025-09-13 21:30:45 +05:30
aminfauzi
c86f938daf mm: reduce swappiness
Swappiness controls how aggressively the kernel swaps memory
from RAM. The default is 60% and a lower number is known to
improve system responsiveness.

Signed-off-by: aminfauzi <aremean0107@gmail.com>
2025-09-13 21:30:44 +05:30
aminfauzi
d10ec40258 block: Add mq-deadline I/O scheduler
The mq-deadline I/O scheduler is more stable than others. Additionally, the boot time is shorter with the mq-deadline I/O scheduler. It only changes the scheduler on new kernel.

Signed-off-by: aminfauzi <aremean0107@gmail.com>
2025-09-13 21:30:43 +05:30
aminfauzi
3bd708e084 fs: Active dynamic fsync by default
Signed-off-by: aminfauzi <aremean0107@gmail.com>
2025-09-13 21:30:43 +05:30
theshaenix
613c232ba4 sched/fair: fix reweight_task() declaration and definition
The CFS scheduler was failing to build due to:

  error: implicit declaration of function 'reweight_task'
  error: conflicting types for 'reweight_task'
  error: too many arguments to function call, expected 3, have 4

This patch fixes it by:
- Adding a forward declaration for reweight_task()
- Defining reweight_task() with the correct signature
- Updating the call to reweight_entity() to pass 3 args instead of 4

With this change, fair.c builds cleanly again without implicit
declaration or argument mismatch errors.
2025-09-12 22:44:27 +05:30
theshaenix
d157bb9f45 bump the version_number to v1.0.2 2025-09-12 22:20:07 +05:30
theshaenix
977dc13522 sched/fair: bore: Implement sched_burst_exclude_kthreads
sched_burst_exclude_kthreads was introduced in the newer version of
sched BORE. This feature is trivial to implement but is a handy one.
2025-09-12 21:08:17 +05:30
theshaenix
930f34e47e sched/fair: bore: sched_burst_fork_atavistic = 0 2025-09-12 21:04:43 +05:30
theshaenix
8647662867 sched/fair: bore: sched_burst_smoothness_short = 1 2025-09-12 21:03:59 +05:30
theshaenix
789c824ffb sched: Implement Burst-Oriented Response Enhancer scheduler
Ref: https://github.com/firelzrd/bore-scheduler
Patch: 0001-linux4.19.y-bore5.1.0.patch
2025-09-12 21:02:33 +05:30
theshaenix
7436d698e2 Revert "of/irq: Refer to actual buffer size in of_irq_parse_one()"
This reverts commit c5599e93ce.
2025-09-05 14:18:27 +05:30
theshaenix
bee6c9a29f Revert "of/irq: Support #msi-cells=<0> in of_msi_get_domain"
This reverts commit 6daa837812.
2025-09-05 14:17:59 +05:30
theshaenix
1b06c79588 Revert "ext4: no need to continue when the number of entries is 1"
This reverts commit 7445c15c98.
2025-09-05 14:14:50 +05:30
theshaenix
73a28f8eaa Revert "ext4: ext4_search_dir should return a proper error"
This reverts commit 989495abd7.
2025-09-05 14:14:10 +05:30
theshaenix
5b3f77fcf4 zram: Use lz4 as default zRAM compression
Faster comression using lz4. zram optimization
2025-09-05 13:56:51 +05:30
theshaenix
ceba8fed0a configs: Enable Sbalance
Exclude the last CPU of each cluster from balancing in order to keep the
maximum amount of single-threaded performance available to the system per
each cluster. That way, when IRQ pressure is high, each cluster will have
at least one CPU which isn't affected by IRQ pressure.
2025-09-05 13:43:48 +05:30
Sultan Alsawaf
8fc3c8270f kernel: Introduce SBalance IRQ balancer
This is a simple IRQ balancer that polls every X number of milliseconds and
moves IRQs from the most interrupt-heavy CPU to the least interrupt-heavy
CPUs until the heaviest CPU is no longer the heaviest. IRQs are only moved
from one source CPU to any number of destination CPUs per balance run.
Balancing is skipped if the gap between the most interrupt-heavy CPU and
the least interrupt-heavy CPU is below the configured threshold of
interrupts.

The heaviest IRQs are targeted for migration in order to reduce the number
of IRQs to migrate. If moving an IRQ would reduce overall balance, then it
won't be migrated.

The most interrupt-heavy CPU is calculated by scaling the number of new
interrupts on that CPU to the CPU's current capacity. This way, interrupt
heaviness takes into account factors such as thermal pressure and time
spent processing interrupts rather than just the sheer number of them. This
also makes SBalance aware of CPU asymmetry, where different CPUs can have
different performance capacities and be proportionally balanced.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
2025-09-05 13:43:27 +05:30
theshaenix
e7bfc757b0 configs: enable wireguard support 2025-09-05 13:41:16 +05:30
Tashfin Shakeer Rhythm
8663061c1e treewide: Drop arm64 V8 ASM lz4 decompression acceleration
It will be reimported after upstreaming lz4 to v1.10.0. Drop for
now to avoid conflicts.

This reverts the following commits:

 - 255efc520cad2 ("lib/lz4: Use ARM64 v8 ASM to accelerate lz4 decompression")
 - fe3b46fc49a5c ("incfs: Use ARM64 v8 ASM to accelerate lz4 decompression")
 - ef84deed4fc6c ("crypto: lz4: Use ARM64 v8 ASM to accelerate decompression")
 - 5ff394e7850e6 ("lz4armv8: Update assembly instructions from Huawei kernel drop")
 - 0e8a6f678ef99 ("lib/lz4: Import arm64 V8 ASM lz4 decompression acceleration")

Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2025-09-05 12:49:31 +05:30
Tashfin Shakeer Rhythm
c1627aadd1 lib/lz4: Use ARM64 v8 ASM to accelerate lz4 decompression
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2025-09-05 12:49:17 +05:30
Dark-Matter7232
d897745bbe lz4armv8: Update assembly instructions from Huawei kernel drop
Signed-off-by: Dark-Matter7232 <me@const.eu.org>
[Tashar02: Fragment from original commit, improve indentations and reword commit message]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2025-09-05 12:49:09 +05:30
阿菌•未霜
d8e0f8c4cd lib/lz4: Import arm64 V8 ASM lz4 decompression acceleration
Change-Id: I3c8dd91df090bb692784a6b7a61c8877b1e1dfba
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2025-09-05 12:49:02 +05:30
Tashfin Shakeer Rhythm
517a6739a5 lz4: Eliminate unused functions
This fixes the following warnings by Clang:

../lib/lz4/lz4_decompress.c:904:12: warning: unused function 'LZ4_decompress_fast' [-Wunused-function]
static int LZ4_decompress_fast(const char *source, char *dest, int originalSize)
           ^
../lib/lz4/lz4_decompress.c:941:12: warning: unused function 'LZ4_decompress_fast_extDict' [-Wunused-function]
static int LZ4_decompress_fast_extDict(const char *source, char *dest,
           ^
../lib/lz4/lz4_decompress.c:1052:12: warning: unused function 'LZ4_decompress_fast_continue' [-Wunused-function]
static int LZ4_decompress_fast_continue(LZ4_streamDecode_t *LZ4_streamDecode,
           ^
../lib/lz4/lz4_decompress.c:1099:12: warning: unused function 'LZ4_decompress_safe_usingDict' [-Wunused-function]
static int LZ4_decompress_safe_usingDict(const char *source, char *dest,
           ^
../lib/lz4/lz4_decompress.c:1118:12: warning: unused function 'LZ4_decompress_fast_usingDict' [-Wunused-function]
static int LZ4_decompress_fast_usingDict(const char *source, char *dest,
           ^

Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2025-09-05 12:48:53 +05:30
Tashfin Shakeer Rhythm
4c9b2110ec lz4: Staticize some functions
This fixes the following warnings by sparse:

../lib/lz4/lz4_compress.c:838:5: warning: symbol 'LZ4_compress_fast_extState' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:141:8: warning: symbol 'read_long_length_no_check' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:904:5: warning: symbol 'LZ4_decompress_fast' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:1052:5: warning: symbol 'LZ4_decompress_fast_continue' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:1099:5: warning: symbol 'LZ4_decompress_safe_usingDict' was not declared. Should it be static?
../lib/lz4/lz4_decompress.c:1118:5: warning: symbol 'LZ4_decompress_fast_usingDict' was not declared. Should it be static?

Since some of the functions have been marked as static now, there is no
need to export them. Remove the redundant export symbols as well.

Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2025-09-05 12:48:43 +05:30
Andrzej Perczak
d5152bdc9d lz4: Update to version 1.9.4
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2025-09-05 12:48:31 +05:30
Andrzej Perczak
ee4cc22096 lib: Update LZ4 module to v1.9.3+
Update lz4 module using official repository from revision [1].

Keep in mind lz4hc wasn't updated thus it is not used. It may not
compile anymore.

[1]: 4ebe313e00

Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2025-09-05 12:48:10 +05:30
Tashfin Shakeer Rhythm
8c3c34a74e lz4: Prepare for upgradation
We will update the lz4 module using its official repository.
The changes can collide with our modifications and backports.
Make some way for new codes and that's why, revert a few patches.

This reverts the following commits:

ad28920b5ab5 ("BACKPORT: lz4: fix LZ4_decompress_safe_partial read out of bound")
7151d98737e1 ("lib/lz4/lz4_decompress.c: document deliberate use of `&'")
5d136a311f8e ("lib/lz4: explicitly support in-place decompression")

Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2025-09-05 12:47:59 +05:30
Tiezhu Yang
2df3576b51 lib: make LZ4_decompress_safe_forceExtDict() static
LZ4_decompress_safe_forceExtDict() is only used in
lib/lz4/lz4_decompress.c, make it static to fix the build warning about
"no previous prototype" [1].

[1] https://lore.kernel.org/lkml/202206260948.akgsho1q-lkp@intel.com/

Link: https://lkml.kernel.org/r/1656298965-8698-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2025-09-05 12:47:49 +05:30
Rajat Asthana
60c57ed4e9 lz4_decompress: declare LZ4_decompress_safe_withPrefix64k static
Declare LZ4_decompress_safe_withPrefix64k as static to fix sparse
warning:

> warning: symbol 'LZ4_decompress_safe_withPrefix64k' was not declared.
> Should it be static?

Link: https://lkml.kernel.org/r/20210511154345.610569-1-thisisrast7@gmail.com
Signed-off-by: Rajat Asthana <thisisrast7@gmail.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Cc: Gao Xiang <hsiangkao@redhat.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2025-09-05 12:47:36 +05:30
Nick Terrell
08b2279396 lz4: fix kernel decompression speed
This patch replaces all memcpy() calls with LZ4_memcpy() which calls
__builtin_memcpy() so the compiler can inline it.

LZ4 relies heavily on memcpy() with a constant size being inlined.  In x86
and i386 pre-boot environments memcpy() cannot be inlined because memcpy()
doesn't get defined as __builtin_memcpy().

An equivalent patch has been applied upstream so that the next import
won't lose this change [1].

I've measured the kernel decompression speed using QEMU before and after
this patch for the x86_64 and i386 architectures.  The speed-up is about
10x as shown below.

Code	Arch	Kernel Size	Time	Speed
v5.8	x86_64	11504832 B	148 ms	 79 MB/s
patch	x86_64	11503872 B	 13 ms	885 MB/s
v5.8	i386	 9621216 B	 91 ms	106 MB/s
patch	i386	 9620224 B	 10 ms	962 MB/s

I also measured the time to decompress the initramfs on x86_64, i386, and
arm.  All three show the same decompression speed before and after, as
expected.

[1] https://github.com/lz4/lz4/pull/890

Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Link: http://lkml.kernel.org/r/20200803194022.2966806-1-nickrterrell@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
Change-Id: Ia1fae7b16e6428f5a2f5d53311b74e1ca71f61f1
Signed-off-by: Divyanshu-Modi <divyan.m05@gmail.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2025-09-05 12:47:25 +05:30
Guo Xuenan
ed5bafe35e lz4: fix LZ4_decompress_safe_partial read out of bound
commit eafc0a02391b7b36617b36c97c4b5d6832cf5e24 upstream.

When partialDecoding, it is EOF if we've either filled the output buffer
or can't proceed with reading an offset for following match.

In some extreme corner cases when compressed data is suitably corrupted,
UAF will occur.  As reported by KASAN [1], LZ4_decompress_safe_partial
may lead to read out of bound problem during decoding.  lz4 upstream has
fixed it [2] and this issue has been disscussed here [3] before.

current decompression routine was ported from lz4 v1.8.3, bumping
lib/lz4 to v1.9.+ is certainly a huge work to be done later, so, we'd
better fix it first.

[1] https://lore.kernel.org/all/000000000000830d1205cf7f0477@google.com/
[2] c5d6f8a8be#
[3] https://lore.kernel.org/all/CC666AE8-4CA4-4951-B6FB-A2EFDE3AC03B@fb.com/

Link: https://lkml.kernel.org/r/20211111105048.2006070-1-guoxuenan@huawei.com
Reported-by: syzbot+63d688f1d899c588fb71@syzkaller.appspotmail.com
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Yann Collet <cyan@fb.com>
Cc: Chengyang Fan <cy.fan@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-05 12:47:13 +05:30
Gao Xiang
ded327a439 lib/lz4: explicitly support in-place decompression
commit 89b158635ad79574bde8e94d45dad33f8cf09549 upstream.

LZ4 final literal copy could be overlapped when doing
in-place decompression, so it's unsafe to just use memcpy()
on an optimized memcpy approach but memmove() instead.

Upstream LZ4 has updated this years ago [1] (and the impact
is non-sensible [2] plus only a few bytes remain), this commit
just synchronizes LZ4 upstream code to the kernel side as well.

It can be observed as EROFS in-place decompression failure
on specific files when X86_FEATURE_ERMS is unsupported,
memcpy() optimization of commit 59daa706fb ("x86, mem:
Optimize memcpy by avoiding memory false dependece") will
be enabled then.

Currently most modern x86-CPUs support ERMS, these CPUs just
use "rep movsb" approach so no problem at all. However, it can
still be verified with forcely disabling ERMS feature...

arch/x86/lib/memcpy_64.S:
        ALTERNATIVE_2 "jmp memcpy_orig", "", X86_FEATURE_REP_GOOD, \
-                     "jmp memcpy_erms", X86_FEATURE_ERMS
+                     "jmp memcpy_orig", X86_FEATURE_ERMS

We didn't observe any strange on arm64/arm/x86 platform before
since most memcpy() would behave in an increasing address order
("copy upwards" [3]) and it's the correct order of in-place
decompression but it really needs an update to memmove() for sure
considering it's an undefined behavior according to the standard
and some unique optimization already exists in the kernel.

[1] 33cb8518ac
[2] https://github.com/lz4/lz4/pull/717#issuecomment-497818921
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=12518

Link: https://lkml.kernel.org/r/20201122030749.2698994-1-hsiangkao@redhat.com
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Li Guifu <bluce.liguifu@huawei.com>
Cc: Guo Xuenan <guoxuenan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-05 12:47:03 +05:30
Linus Torvalds
060b8717b9 lz4: do not export static symbol
Kbuild now complains (rightly) about it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-05 12:46:53 +05:30
Gao Xiang
9784e48adf lib/lz4: update LZ4 decompressor module
Update the LZ4 compression module based on LZ4 v1.8.3 in order for the
erofs file system to use the newest LZ4_decompress_safe_partial() which
can now decode exactly the nb of bytes requested [1] to take place of the
open hacked code in the erofs file system itself.

Currently, apart from the erofs file system, no other users use
LZ4_decompress_safe_partial, so no worry about the interface.

In addition, LZ4 v1.8.x boosts up decompression speed compared to the
current code which is based on LZ4 v1.7.3, mainly due to shortcut
optimization for the specific common LZ4-sequences [2].

lzbench testdata (tested in kirin710, 8 cores, 4 big cores
at 2189Mhz, 2GB DDR RAM at 1622Mhz, with enwik8 testdata [3]):

Compressor name         Compress. Decompress. Compr. size  Ratio Filename
memcpy                   5004 MB/s  4924 MB/s   100000000 100.00 enwik8
lz4hc 1.7.3 -9             12 MB/s   653 MB/s    42203253  42.20 enwik8
lz4hc 1.8.0 -9             12 MB/s   908 MB/s    42203096  42.20 enwik8
lz4hc 1.8.3 -9             11 MB/s   965 MB/s    42203094  42.20 enwik8

[1] https://github.com/lz4/lz4/issues/566
    08d347b5b2

[2] v1.8.1 perf: slightly faster compression and decompression speed
    a31b7058cb
    v1.8.2 perf: slightly faster HC compression and decompression speed
    45f8603aae
    1a191b3f8d

[3] http://mattmahoney.net/dc/textdata.html
    http://mattmahoney.net/dc/enwik8.zip

Link: http://lkml.kernel.org/r/1537181207-21932-1-git-send-email-gaoxiang25@huawei.com
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Tested-by: Guo Xuenan <guoxuenan@huawei.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Fang Wei <fangwei1@huawei.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: <weidu.du@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-05 12:46:41 +05:30
Michael Bestas
28c40e5243 power: qpnp-smb5: Implement charging_enabled node
Change-Id: I0abc9ac4c32a5067e65c2650eb204b9eca437afa
2025-09-05 12:44:33 +05:30
Kuba Wojciechowski
7d6b87d910 power/supply: qpnp-smb5: Report fast charging when a proprietary charger is attached
Android figures out if charging is "rapid" by checking
POWER_SUPPLY_PROP_CURRENT_MAX and when the proprietary charger tech
Xiaomi's using is active that value isn't always reported correctly.
Work around that by reporting an arbitrary value that's high enough to
qualify as "rapid" when a proprietary charger is attached and fully
authenticated (by checking smblib_get_fastcharge_mode() which is a
custom utility function added by Xiaomi). Other charger types (HVDCP/PD)
still use standard smblib_get_prop_input_current_max().

Test: original 33W charger is reported as "rapid" immediately after
plugging it in, a slow charger is still detected as "slow".

Signed-off-by: Kuba Wojciechowski <nullbytepl@gmail.com>
Change-Id: If9247081f2eae8132857be44487f83ba36b4c129
Signed-off-by: Hazama25 <hazamafawkes@gmail.com>
2025-09-05 12:44:12 +05:30
theshaenix
890381f211 Revert "ext4: avoid OOB when system.data xattr changes underneath the filesystem"
This reverts commit 25fb52c992.
2025-09-05 12:14:33 +05:30
theshaenix
a637250586 Revert "ext4: return error on ext4_find_inline_entry"
This reverts commit 8226004cd2.
2025-09-05 12:10:53 +05:30
Forenche
66d2da785a wireguard: Update to version 1.0.20210606
Signed-off-by: Forenche <prahul2003@gmail.com>
2025-09-05 12:08:37 +05:30
theshaenix
a731860a92 Revert "net: add WireGuard from wireguard-linux-compat"
This reverts commit 0a2104ed48.
2025-09-05 12:08:12 +05:30
theshaenix
7aaa3b3e77 Revert "tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe"
This reverts commit e5b4018d59.
2025-09-05 12:06:32 +05:30
theshaenix
1326cde7c8 Revert "CDC-NCM: avoid overflow in sanity checking"
This reverts commit ccb319092c.
2025-09-05 12:04:34 +05:30
theshaenix
c4d394a9a8 The handshake between the driver and the kernel has been updated to the modern, safer standard
the final action—the actual function that the timer is supposed to execute—remains identical

Signed-off-by: aminfauzi <aremean0107@gmail.com>
2025-09-05 12:00:02 +05:30
Vegard Nossum
ffbb95faf0 LTS: Update to 4.14.356
This corresponds to 4.19.323 upstream (v4.19.322..v4.19.323).

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2025-09-05 11:48:09 +05:30
Ryusuke Konishi
08caf4dea7 nilfs2: fix kernel bug due to missing clearing of checked flag
commit 41e192ad2779cae0102879612dfe46726e4396aa upstream.

Syzbot reported that in directory operations after nilfs2 detects
filesystem corruption and degrades to read-only,
__block_write_begin_int(), which is called to prepare block writes, may
fail the BUG_ON check for accesses exceeding the folio/page size,
triggering a kernel bug.

This was found to be because the "checked" flag of a page/folio was not
cleared when it was discarded by nilfs2's own routine, which causes the
sanity check of directory entries to be skipped when the directory
page/folio is reloaded.  So, fix that.

This was necessary when the use of nilfs2's own page discard routine was
applied to more than just metadata files.

Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e269 ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+d6ca2daf692c7a82f959@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 994b2fa13a6c9cf3feca93090a9c337d48e3d60d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:45:09 +05:30
Edward Adam Davis
67bf56b4e6 ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow
[ Upstream commit bc0a2f3a73fcdac651fca64df39306d1e5ebe3b0 ]

Syzbot reported a kernel BUG in ocfs2_truncate_inline.  There are two
reasons for this: first, the parameter value passed is greater than
ocfs2_max_inline_data_with_xattr, second, the start and end parameters of
ocfs2_truncate_inline are "unsigned int".

So, we need to add a sanity check for byte_start and byte_len right before
ocfs2_truncate_inline() in ocfs2_remove_inode_range(), if they are greater
than ocfs2_max_inline_data_with_xattr return -EINVAL.

Link: https://lkml.kernel.org/r/tencent_D48DB5122ADDAEDDD11918CFB68D93258C07@qq.com
Fixes: 1afc32b952 ("ocfs2: Write support for inline data")
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Reported-by: syzbot+81092778aac03460d6b7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=81092778aac03460d6b7
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 27d95867bee806cdc448d122bd99f1d8b0544035)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:45:07 +05:30
Ryusuke Konishi
631772c5a7 nilfs2: fix potential deadlock with newly created symlinks
commit b3a033e3ecd3471248d474ef263aadc0059e516a upstream.

Syzbot reported that page_symlink(), called by nilfs_symlink(), triggers
memory reclamation involving the filesystem layer, which can result in
circular lock dependencies among the reader/writer semaphore
nilfs->ns_segctor_sem, s_writers percpu_rwsem (intwrite) and the
fs_reclaim pseudo lock.

This is because after commit 21fc61c73c ("don't put symlink bodies in
pagecache into highmem"), the gfp flags of the page cache for symbolic
links are overwritten to GFP_KERNEL via inode_nohighmem().

This is not a problem for symlinks read from the backing device, because
the __GFP_FS flag is dropped after inode_nohighmem() is called.  However,
when a new symlink is created with nilfs_symlink(), the gfp flags remain
overwritten to GFP_KERNEL.  Then, memory allocation called from
page_symlink() etc.  triggers memory reclamation including the FS layer,
which may call nilfs_evict_inode() or nilfs_dirty_inode().  And these can
cause a deadlock if they are called while nilfs->ns_segctor_sem is held:

Fix this issue by dropping the __GFP_FS flag from the page cache GFP flags
of newly created symlinks in the same way that nilfs_new_inode() and
__nilfs_read_inode() do, as a workaround until we adopt nofs allocation
scope consistently or improve the locking constraints.

Link: https://lkml.kernel.org/r/20241020050003.4308-1-konishi.ryusuke@gmail.com
Fixes: 21fc61c73c ("don't put symlink bodies in pagecache into highmem")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+9ef37ac20608f4836256@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9ef37ac20608f4836256
Tested-by: syzbot+9ef37ac20608f4836256@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cc38c596e648575ce58bfc31623a6506eda4b94a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:45:06 +05:30
Ville Syrjälä
ebc675ec40 wifi: iwlegacy: Clear stale interrupts before resuming device
commit 07c90acb071b9954e1fecb1e4f4f13d12c544b34 upstream.

iwl4965 fails upon resume from hibernation on my laptop. The reason
seems to be a stale interrupt which isn't being cleared out before
interrupts are enabled. We end up with a race beween the resume
trying to bring things back up, and the restart work (queued form
the interrupt handler) trying to bring things down. Eventually
the whole thing blows up.

Fix the problem by clearing out any stale interrupts before
interrupts get enabled during resume.

Here's a debug log of the indicent:
[   12.042589] ieee80211 phy0: il_isr ISR inta 0x00000080, enabled 0xaa00008b, fh 0x00000000
[   12.042625] ieee80211 phy0: il4965_irq_tasklet inta 0x00000080, enabled 0x00000000, fh 0x00000000
[   12.042651] iwl4965 0000:10:00.0: RF_KILL bit toggled to enable radio.
[   12.042653] iwl4965 0000:10:00.0: On demand firmware reload
[   12.042690] ieee80211 phy0: il4965_irq_tasklet End inta 0x00000000, enabled 0xaa00008b, fh 0x00000000, flags 0x00000282
[   12.052207] ieee80211 phy0: il4965_mac_start enter
[   12.052212] ieee80211 phy0: il_prep_station Add STA to driver ID 31: ff:ff:ff:ff:ff:ff
[   12.052244] ieee80211 phy0: il4965_set_hw_ready hardware  ready
[   12.052324] ieee80211 phy0: il_apm_init Init card's basic functions
[   12.052348] ieee80211 phy0: il_apm_init L1 Enabled; Disabling L0S
[   12.055727] ieee80211 phy0: il4965_load_bsm Begin load bsm
[   12.056140] ieee80211 phy0: il4965_verify_bsm Begin verify bsm
[   12.058642] ieee80211 phy0: il4965_verify_bsm BSM bootstrap uCode image OK
[   12.058721] ieee80211 phy0: il4965_load_bsm BSM write complete, poll 1 iterations
[   12.058734] ieee80211 phy0: __il4965_up iwl4965 is coming up
[   12.058737] ieee80211 phy0: il4965_mac_start Start UP work done.
[   12.058757] ieee80211 phy0: __il4965_down iwl4965 is going down
[   12.058761] ieee80211 phy0: il_scan_cancel_timeout Scan cancel timeout
[   12.058762] ieee80211 phy0: il_do_scan_abort Not performing scan to abort
[   12.058765] ieee80211 phy0: il_clear_ucode_stations Clearing ucode stations in driver
[   12.058767] ieee80211 phy0: il_clear_ucode_stations No active stations found to be cleared
[   12.058819] ieee80211 phy0: _il_apm_stop Stop card, put in low power state
[   12.058827] ieee80211 phy0: _il_apm_stop_master stop master
[   12.058864] ieee80211 phy0: il4965_clear_free_frames 0 frames on pre-allocated heap on clear.
[   12.058869] ieee80211 phy0: Hardware restart was requested
[   16.132299] iwl4965 0000:10:00.0: START_ALIVE timeout after 4000ms.
[   16.132303] ------------[ cut here ]------------
[   16.132304] Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
[   16.132338] WARNING: CPU: 0 PID: 181 at net/mac80211/util.c:1826 ieee80211_reconfig+0x8f/0x14b0 [mac80211]
[   16.132390] Modules linked in: ctr ccm sch_fq_codel xt_tcpudp xt_multiport xt_state iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 ip_tables x_tables binfmt_misc joydev mousedev btusb btrtl btintel btbcm bluetooth ecdh_generic ecc iTCO_wdt i2c_dev iwl4965 iwlegacy coretemp snd_hda_codec_analog pcspkr psmouse mac80211 snd_hda_codec_generic libarc4 sdhci_pci cqhci sha256_generic sdhci libsha256 firewire_ohci snd_hda_intel snd_intel_dspcfg mmc_core snd_hda_codec snd_hwdep firewire_core led_class iosf_mbi snd_hda_core uhci_hcd lpc_ich crc_itu_t cfg80211 ehci_pci ehci_hcd snd_pcm usbcore mfd_core rfkill snd_timer snd usb_common soundcore video parport_pc parport intel_agp wmi intel_gtt backlight e1000e agpgart evdev
[   16.132456] CPU: 0 UID: 0 PID: 181 Comm: kworker/u8:6 Not tainted 6.11.0-cl+ #143
[   16.132460] Hardware name: Hewlett-Packard HP Compaq 6910p/30BE, BIOS 68MCU Ver. F.19 07/06/2010
[   16.132463] Workqueue: async async_run_entry_fn
[   16.132469] RIP: 0010:ieee80211_reconfig+0x8f/0x14b0 [mac80211]
[   16.132501] Code: da 02 00 00 c6 83 ad 05 00 00 00 48 89 df e8 98 1b fc ff 85 c0 41 89 c7 0f 84 e9 02 00 00 48 c7 c7 a0 e6 48 a0 e8 d1 77 c4 e0 <0f> 0b eb 2d 84 c0 0f 85 8b 01 00 00 c6 87 ad 05 00 00 00 e8 69 1b
[   16.132504] RSP: 0018:ffffc9000029fcf0 EFLAGS: 00010282
[   16.132507] RAX: 0000000000000000 RBX: ffff8880072008e0 RCX: 0000000000000001
[   16.132509] RDX: ffffffff81f21a18 RSI: 0000000000000086 RDI: 0000000000000001
[   16.132510] RBP: ffff8880072003c0 R08: 0000000000000000 R09: 0000000000000003
[   16.132512] R10: 0000000000000000 R11: ffff88807e5b0000 R12: 0000000000000001
[   16.132514] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000ffffff92
[   16.132515] FS:  0000000000000000(0000) GS:ffff88807c200000(0000) knlGS:0000000000000000
[   16.132517] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.132519] CR2: 000055dd43786c08 CR3: 000000000978f000 CR4: 00000000000006f0
[   16.132521] Call Trace:
[   16.132525]  <TASK>
[   16.132526]  ? __warn+0x77/0x120
[   16.132532]  ? ieee80211_reconfig+0x8f/0x14b0 [mac80211]
[   16.132564]  ? report_bug+0x15c/0x190
[   16.132568]  ? handle_bug+0x36/0x70
[   16.132571]  ? exc_invalid_op+0x13/0x60
[   16.132573]  ? asm_exc_invalid_op+0x16/0x20
[   16.132579]  ? ieee80211_reconfig+0x8f/0x14b0 [mac80211]
[   16.132611]  ? snd_hdac_bus_init_cmd_io+0x24/0x200 [snd_hda_core]
[   16.132617]  ? pick_eevdf+0x133/0x1c0
[   16.132622]  ? check_preempt_wakeup_fair+0x70/0x90
[   16.132626]  ? wakeup_preempt+0x4a/0x60
[   16.132628]  ? ttwu_do_activate.isra.0+0x5a/0x190
[   16.132632]  wiphy_resume+0x79/0x1a0 [cfg80211]
[   16.132675]  ? wiphy_suspend+0x2a0/0x2a0 [cfg80211]
[   16.132697]  dpm_run_callback+0x75/0x1b0
[   16.132703]  device_resume+0x97/0x200
[   16.132707]  async_resume+0x14/0x20
[   16.132711]  async_run_entry_fn+0x1b/0xa0
[   16.132714]  process_one_work+0x13d/0x350
[   16.132718]  worker_thread+0x2be/0x3d0
[   16.132722]  ? cancel_delayed_work_sync+0x70/0x70
[   16.132725]  kthread+0xc0/0xf0
[   16.132729]  ? kthread_park+0x80/0x80
[   16.132732]  ret_from_fork+0x28/0x40
[   16.132735]  ? kthread_park+0x80/0x80
[   16.132738]  ret_from_fork_asm+0x11/0x20
[   16.132741]  </TASK>
[   16.132742] ---[ end trace 0000000000000000 ]---
[   16.132930] ------------[ cut here ]------------
[   16.132932] WARNING: CPU: 0 PID: 181 at net/mac80211/driver-ops.c:41 drv_stop+0xe7/0xf0 [mac80211]
[   16.132957] Modules linked in: ctr ccm sch_fq_codel xt_tcpudp xt_multiport xt_state iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 ip_tables x_tables binfmt_misc joydev mousedev btusb btrtl btintel btbcm bluetooth ecdh_generic ecc iTCO_wdt i2c_dev iwl4965 iwlegacy coretemp snd_hda_codec_analog pcspkr psmouse mac80211 snd_hda_codec_generic libarc4 sdhci_pci cqhci sha256_generic sdhci libsha256 firewire_ohci snd_hda_intel snd_intel_dspcfg mmc_core snd_hda_codec snd_hwdep firewire_core led_class iosf_mbi snd_hda_core uhci_hcd lpc_ich crc_itu_t cfg80211 ehci_pci ehci_hcd snd_pcm usbcore mfd_core rfkill snd_timer snd usb_common soundcore video parport_pc parport intel_agp wmi intel_gtt backlight e1000e agpgart evdev
[   16.133014] CPU: 0 UID: 0 PID: 181 Comm: kworker/u8:6 Tainted: G        W          6.11.0-cl+ #143
[   16.133018] Tainted: [W]=WARN
[   16.133019] Hardware name: Hewlett-Packard HP Compaq 6910p/30BE, BIOS 68MCU Ver. F.19 07/06/2010
[   16.133021] Workqueue: async async_run_entry_fn
[   16.133025] RIP: 0010:drv_stop+0xe7/0xf0 [mac80211]
[   16.133048] Code: 48 85 c0 74 0e 48 8b 78 08 89 ea 48 89 de e8 e0 87 04 00 65 ff 0d d1 de c4 5f 0f 85 42 ff ff ff e8 be 52 c2 e0 e9 38 ff ff ff <0f> 0b 5b 5d c3 0f 1f 40 00 41 54 49 89 fc 55 53 48 89 f3 2e 2e 2e
[   16.133050] RSP: 0018:ffffc9000029fc50 EFLAGS: 00010246
[   16.133053] RAX: 0000000000000000 RBX: ffff8880072008e0 RCX: ffff88800377f6c0
[   16.133054] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8880072008e0
[   16.133056] RBP: 0000000000000000 R08: ffffffff81f238d8 R09: 0000000000000000
[   16.133058] R10: ffff8880080520f0 R11: 0000000000000000 R12: ffff888008051c60
[   16.133060] R13: ffff8880072008e0 R14: 0000000000000000 R15: ffff8880072011d8
[   16.133061] FS:  0000000000000000(0000) GS:ffff88807c200000(0000) knlGS:0000000000000000
[   16.133063] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.133065] CR2: 000055dd43786c08 CR3: 000000000978f000 CR4: 00000000000006f0
[   16.133067] Call Trace:
[   16.133069]  <TASK>
[   16.133070]  ? __warn+0x77/0x120
[   16.133075]  ? drv_stop+0xe7/0xf0 [mac80211]
[   16.133098]  ? report_bug+0x15c/0x190
[   16.133100]  ? handle_bug+0x36/0x70
[   16.133103]  ? exc_invalid_op+0x13/0x60
[   16.133105]  ? asm_exc_invalid_op+0x16/0x20
[   16.133109]  ? drv_stop+0xe7/0xf0 [mac80211]
[   16.133132]  ieee80211_do_stop+0x55a/0x810 [mac80211]
[   16.133161]  ? fq_codel_reset+0xa5/0xc0 [sch_fq_codel]
[   16.133164]  ieee80211_stop+0x4f/0x180 [mac80211]
[   16.133192]  __dev_close_many+0xa2/0x120
[   16.133195]  dev_close_many+0x90/0x150
[   16.133198]  dev_close+0x5d/0x80
[   16.133200]  cfg80211_shutdown_all_interfaces+0x40/0xe0 [cfg80211]
[   16.133223]  wiphy_resume+0xb2/0x1a0 [cfg80211]
[   16.133247]  ? wiphy_suspend+0x2a0/0x2a0 [cfg80211]
[   16.133269]  dpm_run_callback+0x75/0x1b0
[   16.133273]  device_resume+0x97/0x200
[   16.133277]  async_resume+0x14/0x20
[   16.133280]  async_run_entry_fn+0x1b/0xa0
[   16.133283]  process_one_work+0x13d/0x350
[   16.133287]  worker_thread+0x2be/0x3d0
[   16.133290]  ? cancel_delayed_work_sync+0x70/0x70
[   16.133294]  kthread+0xc0/0xf0
[   16.133296]  ? kthread_park+0x80/0x80
[   16.133299]  ret_from_fork+0x28/0x40
[   16.133302]  ? kthread_park+0x80/0x80
[   16.133304]  ret_from_fork_asm+0x11/0x20
[   16.133307]  </TASK>
[   16.133308] ---[ end trace 0000000000000000 ]---
[   16.133335] ieee80211 phy0: PM: dpm_run_callback(): wiphy_resume [cfg80211] returns -110
[   16.133360] ieee80211 phy0: PM: failed to restore async: error -110

Cc: stable@vger.kernel.org
Cc: Stanislaw Gruszka <stf_xl@wp.pl>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: linux-wireless@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://patch.msgid.link/20241001200745.8276-1-ville.syrjala@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 271d282ecc15d7012e71ca82c89a6c0e13a063dd)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:45:05 +05:30
Felix Fietkau
bf1b197729 wifi: mac80211: do not pass a stopped vif to the driver in .get_txpower
commit 393b6bc174b0dd21bb2a36c13b36e62fc3474a23 upstream.

Avoid potentially crashing in the driver because of uninitialized private data

Fixes: 5b3dc42b1b ("mac80211: add support for driver tx power reporting")
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/20241002095630.22431-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b0b862aa3dbcd16b3c4715259a825f48ca540088)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:45:04 +05:30
Greg Kroah-Hartman
c82ea4a5a4 Revert "driver core: Fix uevent_show() vs driver detach race"
commit 9a71892cbcdb9d1459c84f5a4c722b14354158a5 upstream.

This reverts commit 15fffc6a5624b13b428bb1c6e9088e32a55eb82c.

This commit causes a regression, so revert it for now until it can come
back in a way that works for everyone.

Link: https://lore.kernel.org/all/172790598832.1168608.4519484276671503678.stgit@dwillia2-xfh.jf.intel.com/
Fixes: 15fffc6a5624 ("driver core: Fix uevent_show() vs driver detach race")
Cc: stable <stable@kernel.org>
Cc: Ashish Sangwan <a.sangwan@samsung.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Dirk Behme <dirk.behme@de.bosch.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fe10c8367687c27172a10ba5cc849bd82077bd7d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:45:03 +05:30
Faisal Hassan
f3c8e25f88 xhci: Fix Link TRB DMA in command ring stopped completion event
commit 075919f6df5dd82ad0b1894898b315fbb3c29b84 upstream.

During the aborting of a command, the software receives a command
completion event for the command ring stopped, with the TRB pointing
to the next TRB after the aborted command.

If the command we abort is located just before the Link TRB in the
command ring, then during the 'command ring stopped' completion event,
the xHC gives the Link TRB in the event's cmd DMA, which causes a
mismatch in handling command completion event.

To address this situation, move the 'command ring stopped' completion
event check slightly earlier, since the specific command it stopped
on isn't of significant concern.

Fixes: 7f84eef0da ("USB: xhci: No-op command queueing and irq handler.")
Cc: stable@vger.kernel.org
Signed-off-by: Faisal Hassan <quic_faisalh@quicinc.com>
Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241022155631.1185-1-quic_faisalh@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d55d92597b7143f70e2db6108dac521d231ffa29)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:45:03 +05:30
Zijun Hu
9c952dbb5f usb: phy: Fix API devm_usb_put_phy() can not release the phy
commit fdce49b5da6e0fb6d077986dec3e90ef2b094b50 upstream.

For devm_usb_put_phy(), its comment says it needs to invoke usb_put_phy()
to release the phy, but it does not do that actually, so it can not fully
undo what the API devm_usb_get_phy() does, that is wrong, fixed by using
devres_release() instead of devres_destroy() within the API.

Fixes: cedf860237 ("usb: phy: move bulk of otg/otg.c to phy/phy.c")
Cc: stable@vger.kernel.org
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20241020-usb_phy_fix-v1-1-7f79243b8e1e@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3a5693be9a47d368d39fee08325f5bf6cdd2ebaf)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:45:02 +05:30
Dimitri Sivanich
15e22f6fde misc: sgi-gru: Don't disable preemption in GRU driver
[ Upstream commit b983b271662bd6104d429b0fd97af3333ba760bf ]

Disabling preemption in the GRU driver is unnecessary, and clashes with
sleeping locks in several code paths.  Remove preempt_disable and
preempt_enable from the GRU driver.

Signed-off-by: Dimitri Sivanich <sivanich@hpe.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 88a0888162b375d79872fb1dece834bebea76fe3)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:59 +05:30
Daniel Palmer
0bf7136f8b net: amd: mvme147: Fix probe banner message
[ Upstream commit 82c5b53140faf89c31ea2b3a0985a2f291694169 ]

Currently this driver prints this line with what looks like
a rogue format specifier when the device is probed:
[    2.840000] eth%d: MVME147 at 0xfffe1800, irq 12, Hardware Address xx:xx:xx:xx:xx:xx

Change the printk() for netdev_info() and move it after the
registration has completed so it prints out the name of the
interface properly.

Signed-off-by: Daniel Palmer <daniel@0x0f.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 34f2d9975aff5ddb9e15e4ddd58528c8fd570c4a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:20 +05:30
Pablo Neira Ayuso
a98297711a netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
[ Upstream commit d5953d680f7e96208c29ce4139a0e38de87a57fe ]

If access to offset + length is larger than the skbuff length, then
skb_checksum() triggers BUG_ON().

skb_checksum() internally subtracts the length parameter while iterating
over skbuff, BUG_ON(len) at the end of it checks that the expected
length to be included in the checksum calculation is fully consumed.

Fixes: 7ec3f7b47b ("netfilter: nft_payload: add packet mangling support")
Reported-by: Slavin Liu <slavin-ayu@qq.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a661ed364ae6ae88c2fafa9ddc27df1af2a73701)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:19 +05:30
Benoît Monin
94755e1c58 net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
[ Upstream commit 04c20a9356f283da623903e81e7c6d5df7e4dc3c ]

As documented in skbuff.h, devices with NETIF_F_IPV6_CSUM capability
can only checksum TCP and UDP over IPv6 if the IP header does not
contains extension.

This is enforced for UDP packets emitted from user-space to an IPv6
address as they go through ip6_make_skb(), which calls
__ip6_append_data() where a check is done on the header size before
setting CHECKSUM_PARTIAL.

But the introduction of UDP encapsulation with fou6 added a code-path
where it is possible to get an skb with a partial UDP checksum and an
IPv6 header with extension:
* fou6 adds a UDP header with a partial checksum if the inner packet
does not contains a valid checksum.
* ip6_tunnel adds an IPv6 header with a destination option extension
header if encap_limit is non-zero (the default value is 4).

The thread linked below describes in more details how to reproduce the
problem with GRE-in-UDP tunnel.

Add a check on the network header size in skb_csum_hwoffload_help() to
make sure no IPv6 packet with extension header is handed to a network
device with NETIF_F_IPV6_CSUM capability.

Link: https://lore.kernel.org/netdev/26548921.1r3eYUQgxm@benoit.monin/T/#u
Fixes: aa3463d65e ("fou: Add encap ops for IPv6 tunnels")
Signed-off-by: Benoît Monin <benoit.monin@gmx.fr>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/5fbeecfc311ea182aa1d1c771725ab8b4cac515e.1729778144.git.benoit.monin@gmx.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit bcefc3cd7f592a70fcbbbfd7ad1fbc69172ea78b)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:18 +05:30
Xin Long
f3cd266b32 net: support ip generic csum processing in skb_csum_hwoffload_help
[ Upstream commit 62fafcd63139920eb25b3fbf154177ce3e6f3232 ]

NETIF_F_IP|IPV6_CSUM feature flag indicates UDP and TCP csum offload
while NETIF_F_HW_CSUM feature flag indicates ip generic csum offload
for HW, which includes not only for TCP/UDP csum, but also for other
protocols' csum like GRE's.

However, in skb_csum_hwoffload_help() it only checks features against
NETIF_F_CSUM_MASK(NETIF_F_HW|IP|IPV6_CSUM). So if it's a non TCP/UDP
packet and the features doesn't support NETIF_F_HW_CSUM, but supports
NETIF_F_IP|IPV6_CSUM only, it would still return 0 and leave the HW
to do csum.

This patch is to support ip generic csum processing by checking
NETIF_F_HW_CSUM for all protocols, and check (NETIF_F_IP_CSUM |
NETIF_F_IPV6_CSUM) only for TCP and UDP.

Note that we're using skb->csum_offset to check if it's a TCP/UDP
proctol, this might be fragile. However, as Alex said, for now we
only have a few L4 protocols that are requesting Tx csum offload,
we'd better fix this until a new protocol comes with a same csum
offset.

v1->v2:
  - not extend skb->csum_not_inet, but use skb->csum_offset to tell
    if it's an UDP/TCP csum packet.
v2->v3:
  - add a note in the changelog, as Willem suggested.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 04c20a9356f2 ("net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2c88668d57735d4ff65ce35747c8aa6662cc5013)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:17 +05:30
Pedro Tammela
08099a62d0 net/sched: stop qdisc_tree_reduce_backlog on TC_H_ROOT
[ Upstream commit 2e95c4384438adeaa772caa560244b1a2efef816 ]

In qdisc_tree_reduce_backlog, Qdiscs with major handle ffff: are assumed
to be either root or ingress. This assumption is bogus since it's valid
to create egress qdiscs with major handle ffff:
Budimir Markovic found that for qdiscs like DRR that maintain an active
class list, it will cause a UAF with a dangling class pointer.

In 066a3b5b23, the concern was to avoid iterating over the ingress
qdisc since its parent is itself. The proper fix is to stop when parent
TC_H_ROOT is reached because the only way to retrieve ingress is when a
hierarchy which does not contain a ffff: major handle call into
qdisc_lookup with TC_H_MAJ(TC_H_ROOT).

In the scenario where major ffff: is an egress qdisc in any of the tree
levels, the updates will also propagate to TC_H_ROOT, which then the
iteration must stop.

Fixes: 066a3b5b23 ("[NET_SCHED] sch_api: fix qdisc_tree_decrease_qlen() loop")
Reported-by: Budimir Markovic <markovicbudimir@gmail.com>
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

 net/sched/sch_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Simon Horman <horms@kernel.org>

Link: https://patch.msgid.link/20241024165547.418570-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e7f9a6f97eb067599a74f3bcb6761976b0ed303e)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:16 +05:30
Pablo Neira Ayuso
a42155316f gtp: allow -1 to be specified as file description from userspace
[ Upstream commit 7515e37bce5c428a56a9b04ea7e96b3f53f17150 ]

Existing user space applications maintained by the Osmocom project are
breaking since a recent fix that addresses incorrect error checking.

Restore operation for user space programs that specify -1 as file
descriptor to skip GTPv0 or GTPv1 only sockets.

Fixes: defd8b3c37b0 ("gtp: fix a potential NULL pointer dereference")
Reported-by: Pau Espin Pedrol <pespin@sysmocom.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Oliver Smith <osmith@sysmocom.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241022144825.66740-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 63d8172188c759c44cae7a57eece140e0b90a2e1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:14 +05:30
Christophe JAILLET
622db38088 gtp: simplify error handling code in 'gtp_encap_enable()'
[ Upstream commit b289ba5e07105548b8219695e5443d807a825eb8 ]

'gtp_encap_disable_sock(sk)' handles the case where sk is NULL, so there
is no need to test it before calling the function.

This saves a few line of code.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 7515e37bce5c ("gtp: allow -1 to be specified as file description from userspace")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 66f635f6ae87c35bd1bda16927e9393cacd05ee4)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:13 +05:30
Felix Fietkau
8279b73dd6 wifi: mac80211: skip non-uploaded keys in ieee80211_iter_keys
[ Upstream commit 52009b419355195912a628d0a9847922e90c348c ]

Sync iterator conditions with ieee80211_iter_keys_rcu.

Fixes: 830af02f24 ("mac80211: allow driver to iterate keys")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://patch.msgid.link/20241006153630.87885-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit c9cf9510970e5b33e5bc21377380f1cf61685ed0)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:12 +05:30
Xiu Jianfeng
c917edc9a0 cgroup: Fix potential overflow issue when checking max_depth
[ Upstream commit 3cc4e13bb1617f6a13e5e6882465984148743cf4 ]

cgroup.max.depth is the maximum allowed descent depth below the current
cgroup. If the actual descent depth is equal or larger, an attempt to
create a new child cgroup will fail. However due to the cgroup->max_depth
is of int type and having the default value INT_MAX, the condition
'level > cgroup->max_depth' will never be satisfied, and it will cause
an overflow of the level after it reaches to INT_MAX.

Fix it by starting the level from 0 and using '>=' instead.

It's worth mentioning that this issue is unlikely to occur in reality,
as it's impossible to have a depth of INT_MAX hierarchy, but should be
be avoided logically.

Fixes: 1a926e0bba ("cgroup: implement hierarchy limits")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 339df130db47ae7e89fddce5729b0f0566405d1d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:11 +05:30
Sabrina Dubroca
77a20d4104 xfrm: validate new SA's prefixlen using SA family when sel.family is unset
[ Upstream commit 3f0ab59e6537c6a8f9e1b355b48f9c05a76e8563 ]

This expands the validation introduced in commit 07bf7908950a ("xfrm:
Validate address prefix lengths in the xfrm selector.")

syzbot created an SA with
    usersa.sel.family = AF_UNSPEC
    usersa.sel.prefixlen_s = 128
    usersa.family = AF_INET

Because of the AF_UNSPEC selector, verify_newsa_info doesn't put
limits on prefixlen_{s,d}. But then copy_from_user_state sets
x->sel.family to usersa.family (AF_INET). Do the same conversion in
verify_newsa_info before validating prefixlen_{s,d}, since that's how
prefixlen is going to be used later on.

Reported-by: syzbot+cc39f136925517aed571@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f31398570acf0f0804c644006f7bfa9067106b0a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:10 +05:30
junhua huang
86d7a78dc2 arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning
commit ef08c0fadd8a17ebe429b85e23952dac3263ad34 upstream.

After we fixed the uprobe inst endian in aarch_be, the sparse check report
the following warning info:

sparse warnings: (new ones prefixed by >>)
>> kernel/events/uprobes.c:223:25: sparse: sparse: restricted __le32 degrades to integer
>> kernel/events/uprobes.c:574:56: sparse: sparse: incorrect type in argument 4 (different base types)
@@     expected unsigned int [addressable] [usertype] opcode @@     got restricted __le32 [usertype] @@
   kernel/events/uprobes.c:574:56: sparse:     expected unsigned int [addressable] [usertype] opcode
   kernel/events/uprobes.c:574:56: sparse:     got restricted __le32 [usertype]
>> kernel/events/uprobes.c:1483:32: sparse: sparse: incorrect type in initializer (different base types)
@@     expected unsigned int [usertype] insn @@     got restricted __le32 [usertype] @@
   kernel/events/uprobes.c:1483:32: sparse:     expected unsigned int [usertype] insn
   kernel/events/uprobes.c:1483:32: sparse:     got restricted __le32 [usertype]

use the __le32 to u32 for uprobe_opcode_t, to keep the same.

Fixes: 60f07e22a73d ("arm64:uprobe fix the uprobe SWBP_INSN in big-endian")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: junhua huang <huang.junhua@zte.com.cn>
Link: https://lore.kernel.org/r/202212280954121197626@zte.com.cn
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 974955b61fe226c0d837106738fc0fb5910d67a8)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:09 +05:30
Ryusuke Konishi
926620cdb3 nilfs2: fix kernel bug due to missing clearing of buffer delay flag
commit 6ed469df0bfbef3e4b44fca954a781919db9f7ab upstream.

Syzbot reported that after nilfs2 reads a corrupted file system image
and degrades to read-only, the BUG_ON check for the buffer delay flag
in submit_bh_wbc() may fail, causing a kernel bug.

This is because the buffer delay flag is not cleared when clearing the
buffer state flags to discard a page/folio or a buffer head. So, fix
this.

This became necessary when the use of nilfs2's own page clear routine
was expanded.  This state inconsistency does not occur if the buffer
is written normally by log writing.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Link: https://lore.kernel.org/r/20241015213300.7114-1-konishi.ryusuke@gmail.com
Fixes: 8c26c4e269 ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Reported-by: syzbot+985ada84bf055a575c07@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=985ada84bf055a575c07
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 033bc52f35868c2493a2d95c56ece7fc155d7cb3)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:08 +05:30
Jinjie Ruan
ca75ebb0d2 posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()
[ Upstream commit 6e62807c7fbb3c758d233018caf94dfea9c65dbd ]

If get_clock_desc() succeeds, it calls fget() for the clockid's fd,
and get the clk->rwsem read lock, so the error path should release
the lock to make the lock balance and fput the clockid's fd to make
the refcount balance and release the fd related resource.

However the below commit left the error path locked behind resulting in
unbalanced locking. Check timespec64_valid_strict() before
get_clock_desc() to fix it, because the "ts" is not changed
after that.

Fixes: d8794ac20a29 ("posix-clock: Fix missing timespec64 check in pc_clock_settime()")
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
[pabeni@redhat.com: fixed commit message typo]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d005400262ddaf1ca1666bbcd1acf42fe81d57ce)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:07 +05:30
Oliver Neukum
dce17a7cfb net: usb: usbnet: fix name regression
[ Upstream commit 8a7d12d674ac6f2147c18f36d1e15f1a48060edf ]

The fix for MAC addresses broke detection of the naming convention
because it gave network devices no random MAC before bind()
was called. This means that the check for the local assignment bit
was always negative as the address was zeroed from allocation,
instead of from overwriting the MAC with a unique hardware address.

The correct check for whether bind() has altered the MAC is
done with is_zero_ether_addr

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reported-by: Greg Thelen <gthelen@google.com>
Diagnosed-by: John Sperbeck <jsperbeck@google.com>
Fixes: bab8eb0dd4cb9 ("usbnet: modern method to get random MAC")
Link: https://patch.msgid.link/20241017071849.389636-1-oneukum@suse.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8f83f28d93d380fa4083f6a80fd7793f650e5278)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:44:03 +05:30
Wang Hai
d2068c660d be2net: fix potential memory leak in be_xmit()
[ Upstream commit e4dd8bfe0f6a23acd305f9b892c00899089bd621 ]

The be_xmit() returns NETDEV_TX_OK without freeing skb
in case of be_xmit_enqueue() fails, add dev_kfree_skb_any() to fix it.

Fixes: 760c295e0e ("be2net: Support for OS2BMC.")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Message-ID: <20241015144802.12150-1-wanghai38@huawei.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 941026023c256939943a47d1c66671526befbb26)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:43:42 +05:30
Wang Hai
f5631eb152 net/sun3_82586: fix potential memory leak in sun3_82586_send_packet()
[ Upstream commit 2cb3f56e827abb22c4168ad0c1bbbf401bb2f3b8 ]

The sun3_82586_send_packet() returns NETDEV_TX_OK without freeing skb
in case of skb->len being too long, add dev_kfree_skb() to fix it.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Message-ID: <20241015144148.7918-1-wanghai38@huawei.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 137010d26dc5cd47cd62fef77cbe952d31951b7a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:43:41 +05:30
Dave Kleikamp
27237be1cf jfs: Fix sanity check in dbMount
[ Upstream commit 67373ca8404fe57eb1bb4b57f314cff77ce54932 ]

MAXAG is a legitimate value for bmp->db_numag

Fixes: e63866a47556 ("jfs: fix out-of-bounds in dbNextAG() and diAlloc()")

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ea462ee11dbc4eb779146313d3abf5e5187775e1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:43:40 +05:30
Mark Rutland
bdc13a5966 arm64: probes: Fix uprobes for big-endian kernels
[ Upstream commit 13f8f1e05f1dc36dbba6cba0ae03354c0dafcde7 ]

The arm64 uprobes code is broken for big-endian kernels as it doesn't
convert the in-memory instruction encoding (which is always
little-endian) into the kernel's native endianness before analyzing and
simulating instructions. This may result in a few distinct problems:

* The kernel may may erroneously reject probing an instruction which can
  safely be probed.

* The kernel may erroneously erroneously permit stepping an
  instruction out-of-line when that instruction cannot be stepped
  out-of-line safely.

* The kernel may erroneously simulate instruction incorrectly dur to
  interpretting the byte-swapped encoding.

The endianness mismatch isn't caught by the compiler or sparse because:

* The arch_uprobe::{insn,ixol} fields are encoded as arrays of u8, so
  the compiler and sparse have no idea these contain a little-endian
  32-bit value. The core uprobes code populates these with a memcpy()
  which similarly does not handle endianness.

* While the uprobe_opcode_t type is an alias for __le32, both
  arch_uprobe_analyze_insn() and arch_uprobe_skip_sstep() cast from u8[]
  to the similarly-named probe_opcode_t, which is an alias for u32.
  Hence there is no endianness conversion warning.

Fix this by changing the arch_uprobe::{insn,ixol} fields to __le32 and
adding the appropriate __le32_to_cpu() conversions prior to consuming
the instruction encoding. The core uprobes copies these fields as opaque
ranges of bytes, and so is unaffected by this change.

At the same time, remove MAX_UINSN_BYTES and consistently use
AARCH64_INSN_SIZE for clarity.

Tested with the following:

| #include <stdio.h>
| #include <stdbool.h>
|
| #define noinline __attribute__((noinline))
|
| static noinline void *adrp_self(void)
| {
|         void *addr;
|
|         asm volatile(
|         "       adrp    %x0, adrp_self\n"
|         "       add     %x0, %x0, :lo12:adrp_self\n"
|         : "=r" (addr));
| }
|
|
| int main(int argc, char *argv)
| {
|         void *ptr = adrp_self();
|         bool equal = (ptr == adrp_self);
|
|         printf("adrp_self   => %p\n"
|                "adrp_self() => %p\n"
|                "%s\n",
|                adrp_self, ptr, equal ? "EQUAL" : "NOT EQUAL");
|
|         return 0;
| }

.... where the adrp_self() function was compiled to:

| 00000000004007e0 <adrp_self>:
|   4007e0:       90000000        adrp    x0, 400000 <__ehdr_start>
|   4007e4:       911f8000        add     x0, x0, #0x7e0
|   4007e8:       d65f03c0        ret

Before this patch, the ADRP is not recognized, and is assumed to be
steppable, resulting in corruption of the result:

| # ./adrp-self
| adrp_self   => 0x4007e0
| adrp_self() => 0x4007e0
| EQUAL
| # echo 'p /root/adrp-self:0x007e0' > /sys/kernel/tracing/uprobe_events
| # echo 1 > /sys/kernel/tracing/events/uprobes/enable
| # ./adrp-self
| adrp_self   => 0x4007e0
| adrp_self() => 0xffffffffff7e0
| NOT EQUAL

After this patch, the ADRP is correctly recognized and simulated:

| # ./adrp-self
| adrp_self   => 0x4007e0
| adrp_self() => 0x4007e0
| EQUAL
| #
| # echo 'p /root/adrp-self:0x007e0' > /sys/kernel/tracing/uprobe_events
| # echo 1 > /sys/kernel/tracing/events/uprobes/enable
| # ./adrp-self
| adrp_self   => 0x4007e0
| adrp_self() => 0x4007e0
| EQUAL

Fixes: 9842ceae9f ("arm64: Add uprobe support")
Cc: stable@vger.kernel.org
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241008155851.801546-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b6a638cb600e13f94b5464724eaa6ab7f3349ca2)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:43:40 +05:30
junhua huang
6a3cf06857 arm64:uprobe fix the uprobe SWBP_INSN in big-endian
[ Upstream commit 60f07e22a73d318cddaafa5ef41a10476807cc07 ]

We use uprobe in aarch64_be, which we found the tracee task would exit
due to SIGILL when we enable the uprobe trace.
We can see the replace inst from uprobe is not correct in aarch big-endian.
As in Armv8-A, instruction fetches are always treated as little-endian,
we should treat the UPROBE_SWBP_INSN as little-endian。

The test case is as following。
bash-4.4# ./mqueue_test_aarchbe 1 1 2 1 10 > /dev/null &
bash-4.4# cd /sys/kernel/debug/tracing/
bash-4.4# echo 'p:test /mqueue_test_aarchbe:0xc30 %x0 %x1' > uprobe_events
bash-4.4# echo 1 > events/uprobes/enable
bash-4.4#
bash-4.4# ps
  PID TTY          TIME CMD
  140 ?        00:00:01 bash
  237 ?        00:00:00 ps
[1]+  Illegal instruction     ./mqueue_test_aarchbe 1 1 2 1 100 > /dev/null

which we debug use gdb as following:

bash-4.4# gdb attach 155
(gdb) disassemble send
Dump of assembler code for function send:
   0x0000000000400c30 <+0>:     .inst   0xa00020d4 ; undefined
   0x0000000000400c34 <+4>:     mov     x29, sp
   0x0000000000400c38 <+8>:     str     w0, [sp, #28]
   0x0000000000400c3c <+12>:    strb    w1, [sp, #27]
   0x0000000000400c40 <+16>:    str     xzr, [sp, #40]
   0x0000000000400c44 <+20>:    str     xzr, [sp, #48]
   0x0000000000400c48 <+24>:    add     x0, sp, #0x1b
   0x0000000000400c4c <+28>:    mov     w3, #0x0                 // #0
   0x0000000000400c50 <+32>:    mov     x2, #0x1                 // #1
   0x0000000000400c54 <+36>:    mov     x1, x0
   0x0000000000400c58 <+40>:    ldr     w0, [sp, #28]
   0x0000000000400c5c <+44>:    bl      0x405e10 <mq_send>
   0x0000000000400c60 <+48>:    str     w0, [sp, #60]
   0x0000000000400c64 <+52>:    ldr     w0, [sp, #60]
   0x0000000000400c68 <+56>:    ldp     x29, x30, [sp], #64
   0x0000000000400c6c <+60>:    ret
End of assembler dump.
(gdb) info b
No breakpoints or watchpoints.
(gdb) c
Continuing.

Program received signal SIGILL, Illegal instruction.
0x0000000000400c30 in send ()
(gdb) x/10x 0x400c30
0x400c30 <send>:    0xd42000a0   0xfd030091      0xe01f00b9      0xe16f0039
0x400c40 <send+16>: 0xff1700f9   0xff1b00f9      0xe06f0091      0x03008052
0x400c50 <send+32>: 0x220080d2   0xe10300aa
(gdb) disassemble 0x400c30
Dump of assembler code for function send:
=> 0x0000000000400c30 <+0>:     .inst   0xa00020d4 ; undefined
   0x0000000000400c34 <+4>:     mov     x29, sp
   0x0000000000400c38 <+8>:     str     w0, [sp, #28]
   0x0000000000400c3c <+12>:    strb    w1, [sp, #27]
   0x0000000000400c40 <+16>:    str     xzr, [sp, #40]

Signed-off-by: junhua huang <huang.junhua@zte.com.cn>
Link: https://lore.kernel.org/r/202212021511106844809@zte.com.cn
Signed-off-by: Will Deacon <will@kernel.org>
Stable-dep-of: 13f8f1e05f1d ("arm64: probes: Fix uprobes for big-endian kernels")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8fd414d25465bb666c71b5490fa939411e49228b)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:43:39 +05:30
Ye Bin
bcd1432f2d Bluetooth: bnep: fix wild-memory-access in proto_unregister
[ Upstream commit 64a90991ba8d4e32e3173ddd83d0b24167a5668c ]

There's issue as follows:
  KASAN: maybe wild-memory-access in range [0xdead...108-0xdead...10f]
  CPU: 3 UID: 0 PID: 2805 Comm: rmmod Tainted: G        W
  RIP: 0010:proto_unregister+0xee/0x400
  Call Trace:
   <TASK>
   __do_sys_delete_module+0x318/0x580
   do_syscall_64+0xc1/0x1d0
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

As bnep_init() ignore bnep_sock_init()'s return value, and bnep_sock_init()
will cleanup all resource. Then when remove bnep module will call
bnep_sock_cleanup() to cleanup sock's resource.
To solve above issue just return bnep_sock_init()'s return value in
bnep_exit().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e232728242c4e98fb30e4c6bedb6ba8b482b6301)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:43:38 +05:30
Wang Hai
1a7e5c75ee net: systemport: fix potential memory leak in bcm_sysport_xmit()
[ Upstream commit c401ed1c709948e57945485088413e1bb5e94bd1 ]

The bcm_sysport_xmit() returns NETDEV_TX_OK without freeing skb
in case of dma_map_single() fails, add dev_kfree_skb() to fix it.

Fixes: 80105befdb ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Link: https://patch.msgid.link/20241014145115.44977-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8e81ce7d0166a2249deb6d5e42f28a8b8c9ea72f)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:43:37 +05:30
Wang Hai
24c460e74f net: ethernet: aeroflex: fix potential memory leak in greth_start_xmit_gbit()
[ Upstream commit cf57b5d7a2aad456719152ecd12007fe031628a3 ]

The greth_start_xmit_gbit() returns NETDEV_TX_OK without freeing skb
in case of skb->len being too long, add dev_kfree_skb() to fix it.

Fixes: d4c41139df ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://patch.msgid.link/20241012110434.49265-1-wanghai38@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7517c13ae14dac758e4ec0d881e463a8315bbc7d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:43:36 +05:30
Kalesh AP
fc99da6ca0 RDMA/bnxt_re: Return more meaningful error
[ Upstream commit 98647df0178df215b8239c5c365537283b2852a6 ]

When the HWRM command fails, driver currently returns -EFAULT(Bad
address). This does not look correct.

Modified to return -EIO(I/O error).

Fixes: cc1ec769b8 ("RDMA/bnxt_re: Fixing the Control path command and response handling")
Fixes: 65288a22ddd8 ("RDMA/bnxt_re: use shadow qd while posting non blocking rcfw command")
Link: https://patch.msgid.link/r/1728373302-19530-5-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8fb8f613a904d3ccf61fa824a95f2fa2c3b8f191)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:43:36 +05:30
Anumula Murali Mohan Reddy
65d4dff66e RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP
[ Upstream commit c659b405b82ead335bee6eb33f9691bf718e21e8 ]

ip_dev_find() always returns real net_device address, whether traffic is
running on a vlan or real device, if traffic is over vlan, filling
endpoint struture with real ndev and an attempt to send a connect request
will results in RDMA_CM_EVENT_UNREACHABLE error.  This patch fixes the
issue by using vlan_dev_real_dev().

Fixes: 830662f6f0 ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address")
Link: https://patch.msgid.link/r/20241007132311.70593-1-anumula@chelsio.com
Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 361576c9d34bd16b089864545073db383e372ba8)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:43:35 +05:30
Saravanan Vajravel
838faa9556 RDMA/bnxt_re: Fix incorrect AVID type in WQE structure
[ Upstream commit 9ab20f76ae9fad55ebaf36bdff04aea1c2552374 ]

Driver uses internal data structure to construct WQE frame.
It used avid type as u16 which can accommodate up to 64K AVs.
When outstanding AVID crosses 64K, driver truncates AVID and
hence it uses incorrect AVID to WR. This leads to WR failure
due to invalid AV ID and QP is moved to error state with reason
set to 19 (INVALID AVID). When RDMA CM path is used, this issue
hits QP1 and it is moved to error state

Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Link: https://patch.msgid.link/r/1726715161-18941-3-git-send-email-selvin.xavier@broadcom.com
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 3e98839514a883188710c5467cf3b62a36c7885a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:43:34 +05:30
Ryusuke Konishi
b6de19cd30 nilfs2: propagate directory read errors from nilfs_find_entry()
commit 08cfa12adf888db98879dbd735bc741360a34168 upstream.

Syzbot reported that a task hang occurs in vcs_open() during a fuzzing
test for nilfs2.

The root cause of this problem is that in nilfs_find_entry(), which
searches for directory entries, ignores errors when loading a directory
page/folio via nilfs_get_folio() fails.

If the filesystem images is corrupted, and the i_size of the directory
inode is large, and the directory page/folio is successfully read but
fails the sanity check, for example when it is zero-filled,
nilfs_check_folio() may continue to spit out error messages in bursts.

Fix this issue by propagating the error to the callers when loading a
page/folio fails in nilfs_find_entry().

The current interface of nilfs_find_entry() and its callers is outdated
and cannot propagate error codes such as -EIO and -ENOMEM returned via
nilfs_find_entry(), so fix it together.

Link: https://lkml.kernel.org/r/20241004033640.6841-1-konishi.ryusuke@gmail.com
Fixes: 2ba466d74e ("nilfs2: directory entry operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: Lizhi Xu <lizhi.xu@windriver.com>
Closes: https://lkml.kernel.org/r/20240927013806.3577931-1-lizhi.xu@windriver.com
Reported-by: syzbot+8a192e8d090fa9a31135@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8a192e8d090fa9a31135
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bb857ae1efd3138c653239ed1e7aef14e1242c81)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:43:31 +05:30
Zhang Rui
e1a1b8cddf x86/apic: Always explicitly disarm TSC-deadline timer
commit ffd95846c6ec6cf1f93da411ea10d504036cab42 upstream.

New processors have become pickier about the local APIC timer state
before entering low power modes. These low power modes are used (for
example) when you close your laptop lid and suspend. If you put your
laptop in a bag and it is not in this low power mode, it is likely
to get quite toasty while it quickly sucks the battery dry.

The problem boils down to some CPUs' inability to power down until the
CPU recognizes that the local APIC timer is shut down. The current
kernel code works in one-shot and periodic modes but does not work for
deadline mode. Deadline mode has been the supported and preferred mode
on Intel CPUs for over a decade and uses an MSR to drive the timer
instead of an APIC register.

Disable the TSC Deadline timer in lapic_timer_shutdown() by writing to
MSR_IA32_TSC_DEADLINE when in TSC-deadline mode. Also avoid writing
to the initial-count register (APIC_TMICT) which is ignored in
TSC-deadline mode.

Note: The APIC_LVTT|=APIC_LVT_MASKED operation should theoretically be
enough to tell the hardware that the timer will not fire in any of the
timer modes. But mitigating AMD erratum 411[1] also requires clearing
out APIC_TMICT. Solely setting APIC_LVT_MASKED is also ineffective in
practice on Intel Lunar Lake systems, which is the motivation for this
change.

1. 411 Processor May Exit Message-Triggered C1E State Without an Interrupt if Local APIC Timer Reaches Zero - https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/revision-guides/41322_10h_Rev_Gd.pdf

Fixes: 279f146143 ("x86: apic: Use tsc deadline for oneshot when available")
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241015061522.25288-1-rui.zhang%40intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e75562346cac53c7e933373a004b1829e861123a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:42:49 +05:30
Takashi Iwai
29cf6e5286 parport: Proper fix for array out-of-bounds access
commit 02ac3a9ef3a18b58d8f3ea2b6e46de657bf6c4f9 upstream.

The recent fix for array out-of-bounds accesses replaced sprintf()
calls blindly with snprintf().  However, since snprintf() returns the
would-be-printed size, not the actually output size, the length
calculation can still go over the given limit.

Use scnprintf() instead of snprintf(), which returns the actually
output letters, for addressing the potential out-of-bounds access
properly.

Fixes: ab11dac93d2d ("dev/parport: fix the array out-of-bounds risk")
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20240920103318.19271-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8aadef73ba3b325704ed5cfc4696a25c350182cf)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:42:45 +05:30
Daniele Palmas
df950096cb USB: serial: option: add Telit FN920C04 MBIM compositions
commit 6d951576ee16430822a8dee1e5c54d160e1de87d upstream.

Add the following Telit FN920C04 compositions:

0x10a2: MBIM + tty (AT/NMEA) + tty (AT) + tty (diag)
T:  Bus=03 Lev=01 Prnt=03 Port=06 Cnt=01 Dev#= 17 Spd=480  MxCh= 0
D:  Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=10a2 Rev=05.15
S:  Manufacturer=Telit Cinterion
S:  Product=FN920
S:  SerialNumber=92c4c4d8
C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim
E:  Ad=82(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=84(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=86(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x10a7: MBIM + tty (AT) + tty (AT) + tty (diag)
T:  Bus=03 Lev=01 Prnt=03 Port=06 Cnt=01 Dev#= 18 Spd=480  MxCh= 0
D:  Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=10a7 Rev=05.15
S:  Manufacturer=Telit Cinterion
S:  Product=FN920
S:  SerialNumber=92c4c4d8
C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim
E:  Ad=82(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=84(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=86(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x10aa: MBIM + tty (AT) + tty (diag) + DPL (data packet logging) + adb
T:  Bus=03 Lev=01 Prnt=03 Port=06 Cnt=01 Dev#= 15 Spd=480  MxCh= 0
D:  Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=10aa Rev=05.15
S:  Manufacturer=Telit Cinterion
S:  Product=FN920
S:  SerialNumber=92c4c4d8
C:  #Ifs= 6 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim
E:  Ad=82(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=84(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 4 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=80 Driver=(none)
E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 20cc2b146a8748902a5e4f5aa70457f48174b5c4)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:42:15 +05:30
Benjamin B. Frost
3a18460c70 USB: serial: option: add support for Quectel EG916Q-GL
commit 540eff5d7faf0c9330ec762da49df453263f7676 upstream.

Add Quectel EM916Q-GL with product ID 0x6007

T:  Bus=01 Lev=02 Prnt=02 Port=01 Cnt=01 Dev#=  3 Spd=480  MxCh= 0
D:  Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=2c7c ProdID=6007 Rev= 2.00
S:  Manufacturer=Quectel
S:  Product=EG916Q-GL
C:* #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=200mA
A:  FirstIf#= 4 IfCount= 2 Cls=02(comm.) Sub=06 Prot=00
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E:  Ad=82(I) Atr=03(Int.) MxPS=  16 Ivl=32ms
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E:  Ad=84(I) Atr=03(Int.) MxPS=  16 Ivl=32ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
E:  Ad=86(I) Atr=03(Int.) MxPS=  16 Ivl=32ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=06 Prot=00 Driver=cdc_ether
E:  Ad=88(I) Atr=03(Int.) MxPS=  32 Ivl=32ms
I:  If#= 5 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether
I:* If#= 5 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether
E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

MI_00 Quectel USB Diag Port
MI_01 Quectel USB NMEA Port
MI_02 Quectel USB AT Port
MI_03 Quectel USB Modem Port
MI_04 Quectel USB Net Port

Signed-off-by: Benjamin B. Frost <benjamin@geanix.com>
Reviewed-by: Lars Melin <larsm17@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cdb2c8b31ea3ba692c9ab213369b095e794c8f39)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:41:19 +05:30
Mathias Nyman
8538ef4df4 xhci: Fix incorrect stream context type macro
commit 6599b6a6fa8060145046d0744456b6abdb3122a7 upstream.

The stream contex type (SCT) bitfield is used both in the stream context
data structure,  and in the 'Set TR Dequeue pointer' command TRB.
In both cases it uses bits 3:1

The SCT_FOR_TRB(p) macro used to set the stream context type (SCT) field
for the 'Set TR Dequeue pointer' command TRB incorrectly shifts the value
1 bit left before masking the three bits.

Fix this by first masking and rshifting, just like the similar
SCT_FOR_CTX(p) macro does

This issue has not been visibile as the lost bit 3 is only used with
secondary stream arrays (SSA). Xhci driver currently only supports using
a primary stream array with Linear stream addressing.

Fixes: 95241dbdf8 ("xhci: Set SCT field for Set TR dequeue on streams")
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241016140000.783905-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e76b961d32fd94c7af80bc0ea35e345f1f838c59)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:56 +05:30
Luiz Augusto von Dentz
675badf9e1 Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001
commit 2c1dda2acc4192d826e84008d963b528e24d12bc upstream.

Fake CSR controllers don't seem to handle short-transfer properly which
cause command to time out:

kernel: usb 1-1: new full-speed USB device number 19 using xhci_hcd
kernel: usb 1-1: New USB device found, idVendor=0a12, idProduct=0001, bcdDevice=88.91
kernel: usb 1-1: New USB device strings: Mfr=0, Product=2, SerialNumber=0
kernel: usb 1-1: Product: BT DONGLE10
...
Bluetooth: hci1: Opcode 0x1004 failed: -110
kernel: Bluetooth: hci1: command 0x1004 tx timeout

According to USB Spec 2.0 Section 5.7.3 Interrupt Transfer Packet Size
Constraints a interrupt transfer is considered complete when the size is 0
(ZPL) or < wMaxPacketSize:

 'When an interrupt transfer involves more data than can fit in one
 data payload of the currently established maximum size, all data
 payloads are required to be maximum-sized except for the last data
 payload, which will contain the remaining data. An interrupt transfer
 is complete when the endpoint does one of the following:

 • Has transferred exactly the amount of data expected
 • Transfers a packet with a payload size less than wMaxPacketSize or
 transfers a zero-length packet'

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219365
Fixes: 7b05933340f4 ("Bluetooth: btusb: Fix not handling ZPL/short-transfer")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e32ae4a12628bb2c1046715f47ea7d57fc2b9cbf)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:54 +05:30
Emil Gedenryd
dd8049e770 iio: light: opt3001: add missing full-scale range value
commit 530688e39c644543b71bdd9cb45fdfb458a28eaa upstream.

The opt3001 driver uses predetermined full-scale range values to
determine what exponent to use for event trigger threshold values.
The problem is that one of the values specified in the datasheet is
missing from the implementation. This causes larger values to be
scaled down to an incorrect exponent, effectively reducing the
maximum settable threshold value by a factor of 2.

Add missing full-scale range array value.

Fixes: 94a9b7b180 ("iio: light: add support for TI's opt3001 light sensor")
Signed-off-by: Emil Gedenryd <emil.gedenryd@axis.com>
Cc: <Stable@vger.kernel.org>
Link: https://patch.msgid.link/20240913-add_opt3002-v2-1-69e04f840360@axis.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4401780146a19d65df6f49d5273855f33c9c0a35)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:53 +05:30
Christophe JAILLET
bf9a0b9df6 iio: hid-sensors: Fix an error handling path in _hid_sensor_set_report_latency()
commit 3a29b84cf7fbf912a6ab1b9c886746f02b74ea25 upstream.

If hid_sensor_set_report_latency() fails, the error code should be returned
instead of a value likely to be interpreted as 'success'.

Fixes: 138bc7969c ("iio: hid-sensor-hub: Implement batch mode")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://patch.msgid.link/c50640665f091a04086e5092cf50f73f2055107a.1727980825.git.christophe.jaillet@wanadoo.fr
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 485744b5bd1f15a3ce50f70af52a9d68761c57dd)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:52 +05:30
Javier Carrasco
46e8f48c88 iio: dac: stm32-dac-core: add missing select REGMAP_MMIO in Kconfig
commit 27b6aa68a68105086aef9f0cb541cd688e5edea8 upstream.

This driver makes use of regmap_mmio, but does not select the required
module.
Add the missing 'select REGMAP_MMIO'.

Fixes: 4d4b30526e ("iio: dac: add support for stm32 DAC")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Link: https://patch.msgid.link/20241003-ad2s1210-select-v1-8-4019453f8c33@gmail.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 842911035eb20561218a0742f3e54e7978799c6a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:52 +05:30
Nikolay Kuratov
2415a47bf2 drm/vmwgfx: Handle surface check failure correctly
commit 26498b8d54373d31a621d7dec95c4bd842563b3b upstream.

Currently if condition (!bo and !vmw_kms_srf_ok()) was met
we go to err_out with ret == 0.
err_out dereferences vfb if ret == 0, but in our case vfb is still NULL.

Fix this by assigning sensible error to ret.

Found by Linux Verification Center (linuxtesting.org) with SVACE

Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
Cc: stable@vger.kernel.org
Fixes: 810b3e1683 ("drm/vmwgfx: Support topology greater than texture size")
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241002122429.1981822-1-kniv@yandex-team.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f924af529417292c74c043c627289f56ad95a002)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:51 +05:30
Jim Mattson
b64c468b7d x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET
commit ff898623af2ed564300752bba83a680a1e4fec8d upstream.

AMD's initial implementation of IBPB did not clear the return address
predictor. Beginning with Zen4, AMD's IBPB *does* clear the return address
predictor. This behavior is enumerated by CPUID.80000008H:EBX.IBPB_RET[30].

Define X86_FEATURE_AMD_IBPB_RET for use in KVM_GET_SUPPORTED_CPUID,
when determining cross-vendor capabilities.

Suggested-by: Venkatesh Srinivas <venkateshs@chromium.org>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9e460c6c7c8b72c4c23853627789c812fd2c3cf5)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:46 +05:30
Michael Mueller
9fe3d52739 KVM: s390: Change virtual to physical address access in diag 0x258 handler
commit cad4b3d4ab1f062708fff33f44d246853f51e966 upstream.

The parameters for the diag 0x258 are real addresses, not virtual, but
KVM was using them as virtual addresses. This only happened to work, since
the Linux kernel as a guest used to have a 1:1 mapping for physical vs
virtual addresses.

Fix KVM so that it correctly uses the addresses as real addresses.

Cc: stable@vger.kernel.org
Fixes: 8ae04b8f50 ("KVM: s390: Guest's memory access functions get access registers")
Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20240917151904.74314-3-nrb@linux.ibm.com
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a9dee098c6931dfd75abe015b04c1c66fa1507f6)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:12 +05:30
Thomas Weißschuh
fcedc5de59 s390/sclp_vt220: Convert newlines to CRLF instead of LFCR
commit dee3df68ab4b00fff6bdf9fc39541729af37307c upstream.

According to the VT220 specification the possible character combinations
sent on RETURN are only CR or CRLF [0].

	The Return key sends either a CR character (0/13) or a CR
	character (0/13) and an LF character (0/10), depending on the
	set/reset state of line feed/new line mode (LNM).

The sclp/vt220 driver however uses LFCR. This can confuse tools, for
example the kunit runner.

Link: https://vt100.net/docs/vt220-rm/chapter3.html#S3.2
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://lore.kernel.org/r/20241014-s390-kunit-v1-2-941defa765a6@linutronix.de
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ce6924fdafb09a7231ecfcea119b4e4c83023c97)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:11 +05:30
Breno Leitao
b8cbe57566 KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin()
commit 49f683b41f28918df3e51ddc0d928cb2e934ccdb upstream.

Use {READ,WRITE}_ONCE() to access kvm->last_boosted_vcpu to ensure the
loads and stores are atomic.  In the extremely unlikely scenario the
compiler tears the stores, it's theoretically possible for KVM to attempt
to get a vCPU using an out-of-bounds index, e.g. if the write is split
into multiple 8-bit stores, and is paired with a 32-bit load on a VM with
257 vCPUs:

  CPU0                              CPU1
  last_boosted_vcpu = 0xff;

                                    (last_boosted_vcpu = 0x100)
                                    last_boosted_vcpu[15:8] = 0x01;
  i = (last_boosted_vcpu = 0x1ff)
                                    last_boosted_vcpu[7:0] = 0x00;

  vcpu = kvm->vcpu_array[0x1ff];

As detected by KCSAN:

  BUG: KCSAN: data-race in kvm_vcpu_on_spin [kvm] / kvm_vcpu_on_spin [kvm]

  write to 0xffffc90025a92344 of 4 bytes by task 4340 on cpu 16:
  kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4112) kvm
  handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel
  vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:?
		 arch/x86/kvm/vmx/vmx.c:6606) kvm_intel
  vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm
  kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm
  kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm
  __se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890)
  __x64_sys_ioctl (fs/ioctl.c:890)
  x64_sys_call (arch/x86/entry/syscall_64.c:33)
  do_syscall_64 (arch/x86/entry/common.c:?)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

  read to 0xffffc90025a92344 of 4 bytes by task 4342 on cpu 4:
  kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4069) kvm
  handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel
  vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:?
			arch/x86/kvm/vmx/vmx.c:6606) kvm_intel
  vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm
  kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm
  kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm
  __se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890)
  __x64_sys_ioctl (fs/ioctl.c:890)
  x64_sys_call (arch/x86/entry/syscall_64.c:33)
  do_syscall_64 (arch/x86/entry/common.c:?)
  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

  value changed: 0x00000012 -> 0x00000000

Fixes: 217ece6129 ("KVM: use yield_to instead of sleep in kvm_vcpu_on_spin")
Cc: stable@vger.kernel.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240510092353.2261824-1-leitao@debian.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 11a772d5376aa6d3e2e69b5b5c585f79b60c0e17)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:09 +05:30
OGAWA Hirofumi
be1ff0420a fat: fix uninitialized variable
commit 963a7f4d3b90ee195b895ca06b95757fcba02d1a upstream.

syszbot produced this with a corrupted fs image.  In theory, however an IO
error would trigger this also.

This affects just an error report, so should not be a serious error.

Link: https://lkml.kernel.org/r/87r08wjsnh.fsf@mail.parknet.co.jp
Link: https://lkml.kernel.org/r/66ff2c95.050a0220.49194.03e9.GAE@google.com
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: syzbot+ef0d7bc412553291aa86@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 09b2d2a2267187336b446f4c08e6204c30688bcf)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:09 +05:30
WangYuli
dcb35b343d PCI: Add function 0 DMA alias quirk for Glenfly Arise chip
commit 9246b487ab3c3b5993aae7552b7a4c541cc14a49 upstream.

Add DMA support for audio function of Glenfly Arise chip, which uses
Requester ID of function 0.

Link: https://lore.kernel.org/r/CA2BBD087345B6D1+20240823095708.3237375-1-wangyuli@uniontech.com
Signed-off-by: SiyuLi <siyuli@glenfly.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
[bhelgaas: lower-case hex to match local code, drop unused Device IDs]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 029efe3b57d981b0c239e50f3513838cae121578)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:08 +05:30
Mark Rutland
3e02c3cad4 arm64: probes: Fix simulate_ldr*_literal()
commit 50f813e57601c22b6f26ced3193b9b94d70a2640 upstream.

The simulate_ldr_literal() code always loads a 64-bit quantity, and when
simulating a 32-bit load into a 'W' register, it discards the most
significant 32 bits. For big-endian kernels this means that the relevant
bits are discarded, and the value returned is the the subsequent 32 bits
in memory (i.e. the value at addr + 4).

Additionally, simulate_ldr_literal() and simulate_ldrsw_literal() use a
plain C load, which the compiler may tear or elide (e.g. if the target
is the zero register). Today this doesn't happen to matter, but it may
matter in future if trampoline code uses a LDR (literal) or LDRSW
(literal).

Update simulate_ldr_literal() and simulate_ldrsw_literal() to use an
appropriately-sized READ_ONCE() to perform the access, which avoids
these problems.

Fixes: 39a67d49ba ("arm64: kprobes instruction simulation support")
Cc: stable@vger.kernel.org
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241008155851.801546-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 19f4d3a94c77295ee3a7bbac91e466955f458671)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:07 +05:30
Mark Rutland
747159a8ea arm64: probes: Remove broken LDR (literal) uprobe support
commit acc450aa07099d071b18174c22a1119c57da8227 upstream.

The simulate_ldr_literal() and simulate_ldrsw_literal() functions are
unsafe to use for uprobes. Both functions were originally written for
use with kprobes, and access memory with plain C accesses. When uprobes
was added, these were reused unmodified even though they cannot safely
access user memory.

There are three key problems:

1) The plain C accesses do not have corresponding extable entries, and
   thus if they encounter a fault the kernel will treat these as
   unintentional accesses to user memory, resulting in a BUG() which
   will kill the kernel thread, and likely lead to further issues (e.g.
   lockup or panic()).

2) The plain C accesses are subject to HW PAN and SW PAN, and so when
   either is in use, any attempt to simulate an access to user memory
   will fault. Thus neither simulate_ldr_literal() nor
   simulate_ldrsw_literal() can do anything useful when simulating a
   user instruction on any system with HW PAN or SW PAN.

3) The plain C accesses are privileged, as they run in kernel context,
   and in practice can access a small range of kernel virtual addresses.
   The instructions they simulate have a range of +/-1MiB, and since the
   simulated instructions must itself be a user instructions in the
   TTBR0 address range, these can address the final 1MiB of the TTBR1
   acddress range by wrapping downwards from an address in the first
   1MiB of the TTBR0 address range.

   In contemporary kernels the last 8MiB of TTBR1 address range is
   reserved, and accesses to this will always fault, meaning this is no
   worse than (1).

   Historically, it was theoretically possible for the linear map or
   vmemmap to spill into the final 8MiB of the TTBR1 address range, but
   in practice this is extremely unlikely to occur as this would
   require either:

   * Having enough physical memory to fill the entire linear map all the
     way to the final 1MiB of the TTBR1 address range.

   * Getting unlucky with KASLR randomization of the linear map such
     that the populated region happens to overlap with the last 1MiB of
     the TTBR address range.

   ... and in either case if we were to spill into the final page there
   would be larger problems as the final page would alias with error
   pointers.

Practically speaking, (1) and (2) are the big issues. Given there have
been no reports of problems since the broken code was introduced, it
appears that no-one is relying on probing these instructions with
uprobes.

Avoid these issues by not allowing uprobes on LDR (literal) and LDRSW
(literal), limiting the use of simulate_ldr_literal() and
simulate_ldrsw_literal() to kprobes. Attempts to place uprobes on LDR
(literal) and LDRSW (literal) will be rejected as
arm_probe_decode_insn() will return INSN_REJECTED. In future we can
consider introducing working uprobes support for these instructions, but
this will require more significant work.

Fixes: 9842ceae9f ("arm64: Add uprobe support")
Cc: stable@vger.kernel.org
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20241008155851.801546-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cc86f2e9876c8b5300238cec6bf0bd8c842078ee)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:05 +05:30
Jinjie Ruan
79b9b71c90 posix-clock: Fix missing timespec64 check in pc_clock_settime()
commit d8794ac20a299b647ba9958f6d657051fc51a540 upstream.

As Andrew pointed out, it will make sense that the PTP core
checked timespec64 struct's tv_sec and tv_nsec range before calling
ptp->info->settime64().

As the man manual of clock_settime() said, if tp.tv_sec is negative or
tp.tv_nsec is outside the range [0..999,999,999], it should return EINVAL,
which include dynamic clocks which handles PTP clock, and the condition is
consistent with timespec64_valid(). As Thomas suggested, timespec64_valid()
only check the timespec is valid, but not ensure that the time is
in a valid range, so check it ahead using timespec64_valid_strict()
in pc_clock_settime() and return -EINVAL if not valid.

There are some drivers that use tp->tv_sec and tp->tv_nsec directly to
write registers without validity checks and assume that the higher layer
has checked it, which is dangerous and will benefit from this, such as
hclge_ptp_settime(), igb_ptp_settime_i210(), _rcar_gen4_ptp_settime(),
and some drivers can remove the checks of itself.

Cc: stable@vger.kernel.org
Fixes: 0606f422b4 ("posix clocks: Introduce dynamic clocks")
Acked-by: Richard Cochran <richardcochran@gmail.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20241009072302.1754567-2-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 29f085345cde24566efb751f39e5d367c381c584)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:05 +05:30
Anastasia Kovaleva
36d36def55 net: Fix an unsafe loop on the list
commit 1dae9f1187189bc09ff6d25ca97ead711f7e26f9 upstream.

The kernel may crash when deleting a genetlink family if there are still
listeners for that family:

Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP [c000000000c080bc] netlink_update_socket_mc+0x3c/0xc0
  LR [c000000000c0f764] __netlink_clear_multicast_users+0x74/0xc0
  Call Trace:
__netlink_clear_multicast_users+0x74/0xc0
genl_unregister_family+0xd4/0x2d0

Change the unsafe loop on the list to a safe one, because inside the
loop there is an element removal from this list.

Fixes: b8273570f8 ("genetlink: fix netns vs. netlink table locking (2)")
Cc: stable@vger.kernel.org
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241003104431.12391-1-a.kovaleva@yadro.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 464801a0f6ccb52b21faa33bac6014fd74cc5e10)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:04 +05:30
Icenowy Zheng
c0d9f6bc66 usb: storage: ignore bogus device raised by JieLi BR21 USB sound chip
commit a6555cb1cb69db479d0760e392c175ba32426842 upstream.

JieLi tends to use SCSI via USB Mass Storage to implement their own
proprietary commands instead of implementing another USB interface.
Enumerating it as a generic mass storage device will lead to a Hardware
Error sense key get reported.

Ignore this bogus device to prevent appearing a unusable sdX device
file.

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Cc: stable <stable@kernel.org>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20241001083407.8336-1-uwu@icenowy.me
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7a8df891d679d6627d91e334a734578ca16518eb)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:40:02 +05:30
Jose Alberto Reguero
e090c7fa05 usb: xhci: Fix problem with xhci resume from suspend
commit d44238d8254a36249d576c96473269dbe500f5e4 upstream.

I have a ASUS PN51 S mini pc that has two xhci devices. One from AMD,
and other from ASMEDIA. The one from ASMEDIA have problems when resume
from suspend, and keep broken until unplug the  power cord. I use this
kernel parameter: xhci-hcd.quirks=128 and then it works ok. I make a
path to reset only the ASMEDIA xhci.

Signed-off-by: Jose Alberto Reguero <jose.alberto.reguero@gmail.com>
Cc: stable <stable@kernel.org>
Link: https://lore.kernel.org/r/20240919184202.22249-1-jose.alberto.reguero@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 52e998173cfed7d6953b3185f2da174712ce4a8f)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:39:58 +05:30
Oliver Neukum
74ae75c272 Revert "usb: yurex: Replace snprintf() with the safer scnprintf() variant"
commit 71c717cd8a2e180126932cc6851ff21c1d04d69a upstream.

This reverts commit 86b20af11e84c26ae3fde4dcc4f490948e3f8035.

This patch leads to passing 0 to simple_read_from_buffer()
as a fifth argument, turning the read method into a nop.
The change is fundamentally flawed, as it breaks the driver.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Cc: stable <stable@kernel.org>
Link: https://lore.kernel.org/r/20241007094004.242122-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6f8f23390160355a4a571230986d524fd3929c2a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:39:35 +05:30
Wade Wang
578d2b58f9 HID: plantronics: Workaround for an unexcepted opposite volume key
commit 87b696209007b7c4ef7bdfe39ea0253404a43770 upstream.

Some Plantronics headset as the below send an unexcept opposite
volume key's HID report for each volume key press after 200ms, like
unecepted Volume Up Key following Volume Down key pressed by user.
This patch adds a quirk to hid-plantronics for these devices, which
will ignore the second unexcepted opposite volume key if it happens
within 220ms from the last one that was handled.
    Plantronics EncorePro 500 Series  (047f:431e)
    Plantronics Blackwire_3325 Series (047f:430c)

The patch was tested on the mentioned model, it shouldn't affect
other models, however, this quirk might be needed for them too.
Auto-repeat (when a key is held pressed) is not affected per test
result.

Cc: stable@vger.kernel.org
Signed-off-by: Wade Wang <wade.wang@hp.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b1ce11ce52359eefa7bc33be13e946a7154fd35f)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:39:32 +05:30
Oliver Neukum
ccb319092c CDC-NCM: avoid overflow in sanity checking
commit 8d2b1a1ec9f559d30b724877da4ce592edc41fdc upstream.

A broken device may give an extreme offset like 0xFFF0
and a reasonable length for a fragment. In the sanity
check as formulated now, this will create an integer
overflow, defeating the sanity check. Both offset
and offset + len need to be checked in such a manner
that no overflow can occur.
And those quantities should be unsigned.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Bruno VERNAY <bruno.vernay@se.com>
Signed-off-by: Hugo SIMELIERE <hsimeliere.opensource@witekio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a612395c7631918e0e10ea48b9ce5ab4340f26a6)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:38:45 +05:30
Eric Dumazet
0568c1aa3f ppp: fix ppp_async_encode() illegal access
[ Upstream commit 40dddd4b8bd08a69471efd96107a4e1c73fabefc ]

syzbot reported an issue in ppp_async_encode() [1]

In this case, pppoe_sendmsg() is called with a zero size.
Then ppp_async_encode() is called with an empty skb.

BUG: KMSAN: uninit-value in ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
 BUG: KMSAN: uninit-value in ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
  ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
  ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
  ppp_async_send+0x130/0x1b0 drivers/net/ppp/ppp_async.c:634
  ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2280 [inline]
  ppp_input+0x1f1/0xe60 drivers/net/ppp/ppp_generic.c:2304
  pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
  sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
  __release_sock+0x1da/0x330 net/core/sock.c:3072
  release_sock+0x6b/0x250 net/core/sock.c:3626
  pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was created at:
  slab_post_alloc_hook mm/slub.c:4092 [inline]
  slab_alloc_node mm/slub.c:4135 [inline]
  kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4187
  kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
  __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
  alloc_skb include/linux/skbuff.h:1322 [inline]
  sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
  pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
  sock_sendmsg_nosec net/socket.c:729 [inline]
  __sock_sendmsg+0x30f/0x380 net/socket.c:744
  ____sys_sendmsg+0x903/0xb60 net/socket.c:2602
  ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
  __sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
  __do_sys_sendmmsg net/socket.c:2771 [inline]
  __se_sys_sendmmsg net/socket.c:2768 [inline]
  __x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
  x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

CPU: 1 UID: 0 PID: 5411 Comm: syz.1.14 Not tainted 6.12.0-rc1-syzkaller-00165-g360c1f1f24c6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: syzbot+1d121645899e7692f92a@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009185802.3763282-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4151ec65abd755133ebec687218fadd2d2631167)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:51 +05:30
Rosen Penev
e9a0a69857 net: ibm: emac: mal: fix wrong goto
[ Upstream commit 08c8acc9d8f3f70d62dd928571368d5018206490 ]

dcr_map is called in the previous if and therefore needs to be unmapped.

Fixes: 1ff0fcfcb1 ("ibm_newemac: Fix new MAL feature handling")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241007235711.5714-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4bd7823cacb21e32f3750828148ed5d18d3bf007)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:50 +05:30
Mohamed Khalfella
0db3e6548a igb: Do not bring the device up after non-fatal error
[ Upstream commit 330a699ecbfc9c26ec92c6310686da1230b4e7eb ]

Commit 004d25060c78 ("igb: Fix igb_down hung on surprise removal")
changed igb_io_error_detected() to ignore non-fatal pcie errors in order
to avoid hung task that can happen when igb_down() is called multiple
times. This caused an issue when processing transient non-fatal errors.
igb_io_resume(), which is called after igb_io_error_detected(), assumes
that device is brought down by igb_io_error_detected() if the interface
is up. This resulted in panic with stacktrace below.

[ T3256] igb 0000:09:00.0 haeth0: igb: haeth0 NIC Link is Down
[  T292] pcieport 0000:00:1c.5: AER: Uncorrected (Non-Fatal) error received: 0000:09:00.0
[  T292] igb 0000:09:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[  T292] igb 0000:09:00.0:   device [8086:1537] error status/mask=00004000/00000000
[  T292] igb 0000:09:00.0:    [14] CmpltTO [  200.105524,009][  T292] igb 0000:09:00.0: AER:   TLP Header: 00000000 00000000 00000000 00000000
[  T292] pcieport 0000:00:1c.5: AER: broadcast error_detected message
[  T292] igb 0000:09:00.0: Non-correctable non-fatal error reported.
[  T292] pcieport 0000:00:1c.5: AER: broadcast mmio_enabled message
[  T292] pcieport 0000:00:1c.5: AER: broadcast resume message
[  T292] ------------[ cut here ]------------
[  T292] kernel BUG at net/core/dev.c:6539!
[  T292] invalid opcode: 0000 [#1] PREEMPT SMP
[  T292] RIP: 0010:napi_enable+0x37/0x40
[  T292] Call Trace:
[  T292]  <TASK>
[  T292]  ? die+0x33/0x90
[  T292]  ? do_trap+0xdc/0x110
[  T292]  ? napi_enable+0x37/0x40
[  T292]  ? do_error_trap+0x70/0xb0
[  T292]  ? napi_enable+0x37/0x40
[  T292]  ? napi_enable+0x37/0x40
[  T292]  ? exc_invalid_op+0x4e/0x70
[  T292]  ? napi_enable+0x37/0x40
[  T292]  ? asm_exc_invalid_op+0x16/0x20
[  T292]  ? napi_enable+0x37/0x40
[  T292]  igb_up+0x41/0x150
[  T292]  igb_io_resume+0x25/0x70
[  T292]  report_resume+0x54/0x70
[  T292]  ? report_frozen_detected+0x20/0x20
[  T292]  pci_walk_bus+0x6c/0x90
[  T292]  ? aer_print_port_info+0xa0/0xa0
[  T292]  pcie_do_recovery+0x22f/0x380
[  T292]  aer_process_err_devices+0x110/0x160
[  T292]  aer_isr+0x1c1/0x1e0
[  T292]  ? disable_irq_nosync+0x10/0x10
[  T292]  irq_thread_fn+0x1a/0x60
[  T292]  irq_thread+0xe3/0x1a0
[  T292]  ? irq_set_affinity_notifier+0x120/0x120
[  T292]  ? irq_affinity_notify+0x100/0x100
[  T292]  kthread+0xe2/0x110
[  T292]  ? kthread_complete_and_exit+0x20/0x20
[  T292]  ret_from_fork+0x2d/0x50
[  T292]  ? kthread_complete_and_exit+0x20/0x20
[  T292]  ret_from_fork_asm+0x11/0x20
[  T292]  </TASK>

To fix this issue igb_io_resume() checks if the interface is running and
the device is not down this means igb_io_error_detected() did not bring
the device down and there is no need to bring it up.

Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com>
Fixes: 004d25060c78 ("igb: Fix igb_down hung on surprise removal")
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit dca2ca65a8695d9593e2cf1b40848e073ad75413)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:49 +05:30
Billy Tsai
930f2da139 gpio: aspeed: Add the flush write to ensure the write complete.
[ Upstream commit 1bb5a99e1f3fd27accb804aa0443a789161f843c ]

Performing a dummy read ensures that the register write operation is fully
completed, mitigating any potential bus delays that could otherwise impact
the frequency of bitbang usage. E.g., if the JTAG application uses GPIO to
control the JTAG pins (TCK, TMS, TDI, TDO, and TRST), and the application
sets the TCK clock to 1 MHz, the GPIO's high/low transitions will rely on
a delay function to ensure the clock frequency does not exceed 1 MHz.
However, this can lead to rapid toggling of the GPIO because the write
operation is POSTed and does not wait for a bus acknowledgment.

Fixes: 361b79119a ("gpio: Add Aspeed driver")
Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Signed-off-by: Billy Tsai <billy_tsai@aspeedtech.com>
Link: https://lore.kernel.org/r/20241008081450.1490955-2-billy_tsai@aspeedtech.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8c4d52b80f2d9dcc5053226ddd18a3bb1177c8ed)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:48 +05:30
Luiz Augusto von Dentz
8687978c88 Bluetooth: RFCOMM: FIX possible deadlock in rfcomm_sk_state_change
[ Upstream commit 08d1914293dae38350b8088980e59fbc699a72fe ]

rfcomm_sk_state_change attempts to use sock_lock so it must never be
called with it locked but rfcomm_sock_ioctl always attempt to lock it
causing the following trace:

======================================================
WARNING: possible circular locking dependency detected
6.8.0-syzkaller-08951-gfe46a7dd189e #0 Not tainted
------------------------------------------------------
syz-executor386/5093 is trying to acquire lock:
ffff88807c396258 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1671 [inline]
ffff88807c396258 (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+.}-{0:0}, at: rfcomm_sk_state_change+0x5b/0x310 net/bluetooth/rfcomm/sock.c:73

but task is already holding lock:
ffff88807badfd28 (&d->lock){+.+.}-{3:3}, at: __rfcomm_dlc_close+0x226/0x6a0 net/bluetooth/rfcomm/core.c:491

Reported-by: syzbot+d7ce59b06b3eb14fd218@syzkaller.appspotmail.com
Tested-by: syzbot+d7ce59b06b3eb14fd218@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d7ce59b06b3eb14fd218
Fixes: 3241ad820d ("[Bluetooth] Add timestamp support to L2CAP, RFCOMM and SCO")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b77b3fb12fd483cae7c28648903b1d8a6b275f01)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:47 +05:30
Andy Roulin
228b0d7409 netfilter: br_netfilter: fix panic with metadata_dst skb
[ Upstream commit f9ff7665cd128012868098bbd07e28993e314fdb ]

Fix a kernel panic in the br_netfilter module when sending untagged
traffic via a VxLAN device.
This happens during the check for fragmentation in br_nf_dev_queue_xmit.

It is dependent on:
1) the br_netfilter module being loaded;
2) net.bridge.bridge-nf-call-iptables set to 1;
3) a bridge with a VxLAN (single-vxlan-device) netdevice as a bridge port;
4) untagged frames with size higher than the VxLAN MTU forwarded/flooded

When forwarding the untagged packet to the VxLAN bridge port, before
the netfilter hooks are called, br_handle_egress_vlan_tunnel is called and
changes the skb_dst to the tunnel dst. The tunnel_dst is a metadata type
of dst, i.e., skb_valid_dst(skb) is false, and metadata->dst.dev is NULL.

Then in the br_netfilter hooks, in br_nf_dev_queue_xmit, there's a check
for frames that needs to be fragmented: frames with higher MTU than the
VxLAN device end up calling br_nf_ip_fragment, which in turns call
ip_skb_dst_mtu.

The ip_dst_mtu tries to use the skb_dst(skb) as if it was a valid dst
with valid dst->dev, thus the crash.

This case was never supported in the first place, so drop the packet
instead.

PING 10.0.0.2 (10.0.0.2) from 0.0.0.0 h1-eth0: 2000(2028) bytes of data.
[  176.291791] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000110
[  176.292101] Mem abort info:
[  176.292184]   ESR = 0x0000000096000004
[  176.292322]   EC = 0x25: DABT (current EL), IL = 32 bits
[  176.292530]   SET = 0, FnV = 0
[  176.292709]   EA = 0, S1PTW = 0
[  176.292862]   FSC = 0x04: level 0 translation fault
[  176.293013] Data abort info:
[  176.293104]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  176.293488]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  176.293787]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  176.293995] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043ef5000
[  176.294166] [0000000000000110] pgd=0000000000000000,
p4d=0000000000000000
[  176.294827] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[  176.295252] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel veth
br_netfilter bridge stp llc ipv6 crct10dif_ce
[  176.295923] CPU: 0 PID: 188 Comm: ping Not tainted
6.8.0-rc3-g5b3fbd61b9d1 #2
[  176.296314] Hardware name: linux,dummy-virt (DT)
[  176.296535] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[  176.296808] pc : br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
[  176.297382] lr : br_nf_dev_queue_xmit+0x2ac/0x4ec [br_netfilter]
[  176.297636] sp : ffff800080003630
[  176.297743] x29: ffff800080003630 x28: 0000000000000008 x27:
ffff6828c49ad9f8
[  176.298093] x26: ffff6828c49ad000 x25: 0000000000000000 x24:
00000000000003e8
[  176.298430] x23: 0000000000000000 x22: ffff6828c4960b40 x21:
ffff6828c3b16d28
[  176.298652] x20: ffff6828c3167048 x19: ffff6828c3b16d00 x18:
0000000000000014
[  176.298926] x17: ffffb0476322f000 x16: ffffb7e164023730 x15:
0000000095744632
[  176.299296] x14: ffff6828c3f1c880 x13: 0000000000000002 x12:
ffffb7e137926a70
[  176.299574] x11: 0000000000000001 x10: ffff6828c3f1c898 x9 :
0000000000000000
[  176.300049] x8 : ffff6828c49bf070 x7 : 0008460f18d5f20e x6 :
f20e0100bebafeca
[  176.300302] x5 : ffff6828c7f918fe x4 : ffff6828c49bf070 x3 :
0000000000000000
[  176.300586] x2 : 0000000000000000 x1 : ffff6828c3c7ad00 x0 :
ffff6828c7f918f0
[  176.300889] Call trace:
[  176.301123]  br_nf_dev_queue_xmit+0x390/0x4ec [br_netfilter]
[  176.301411]  br_nf_post_routing+0x2a8/0x3e4 [br_netfilter]
[  176.301703]  nf_hook_slow+0x48/0x124
[  176.302060]  br_forward_finish+0xc8/0xe8 [bridge]
[  176.302371]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
[  176.302605]  br_nf_forward_finish+0x118/0x22c [br_netfilter]
[  176.302824]  br_nf_forward_ip.part.0+0x264/0x290 [br_netfilter]
[  176.303136]  br_nf_forward+0x2b8/0x4e0 [br_netfilter]
[  176.303359]  nf_hook_slow+0x48/0x124
[  176.303803]  __br_forward+0xc4/0x194 [bridge]
[  176.304013]  br_flood+0xd4/0x168 [bridge]
[  176.304300]  br_handle_frame_finish+0x1d4/0x5c4 [bridge]
[  176.304536]  br_nf_hook_thresh+0x124/0x134 [br_netfilter]
[  176.304978]  br_nf_pre_routing_finish+0x29c/0x494 [br_netfilter]
[  176.305188]  br_nf_pre_routing+0x250/0x524 [br_netfilter]
[  176.305428]  br_handle_frame+0x244/0x3cc [bridge]
[  176.305695]  __netif_receive_skb_core.constprop.0+0x33c/0xecc
[  176.306080]  __netif_receive_skb_one_core+0x40/0x8c
[  176.306197]  __netif_receive_skb+0x18/0x64
[  176.306369]  process_backlog+0x80/0x124
[  176.306540]  __napi_poll+0x38/0x17c
[  176.306636]  net_rx_action+0x124/0x26c
[  176.306758]  __do_softirq+0x100/0x26c
[  176.307051]  ____do_softirq+0x10/0x1c
[  176.307162]  call_on_irq_stack+0x24/0x4c
[  176.307289]  do_softirq_own_stack+0x1c/0x2c
[  176.307396]  do_softirq+0x54/0x6c
[  176.307485]  __local_bh_enable_ip+0x8c/0x98
[  176.307637]  __dev_queue_xmit+0x22c/0xd28
[  176.307775]  neigh_resolve_output+0xf4/0x1a0
[  176.308018]  ip_finish_output2+0x1c8/0x628
[  176.308137]  ip_do_fragment+0x5b4/0x658
[  176.308279]  ip_fragment.constprop.0+0x48/0xec
[  176.308420]  __ip_finish_output+0xa4/0x254
[  176.308593]  ip_finish_output+0x34/0x130
[  176.308814]  ip_output+0x6c/0x108
[  176.308929]  ip_send_skb+0x50/0xf0
[  176.309095]  ip_push_pending_frames+0x30/0x54
[  176.309254]  raw_sendmsg+0x758/0xaec
[  176.309568]  inet_sendmsg+0x44/0x70
[  176.309667]  __sys_sendto+0x110/0x178
[  176.309758]  __arm64_sys_sendto+0x28/0x38
[  176.309918]  invoke_syscall+0x48/0x110
[  176.310211]  el0_svc_common.constprop.0+0x40/0xe0
[  176.310353]  do_el0_svc+0x1c/0x28
[  176.310434]  el0_svc+0x34/0xb4
[  176.310551]  el0t_64_sync_handler+0x120/0x12c
[  176.310690]  el0t_64_sync+0x190/0x194
[  176.311066] Code: f9402e61 79402aa2 927ff821 f9400023 (f9408860)
[  176.315743] ---[ end trace 0000000000000000 ]---
[  176.316060] Kernel panic - not syncing: Oops: Fatal exception in
interrupt
[  176.316371] Kernel Offset: 0x37e0e3000000 from 0xffff800080000000
[  176.316564] PHYS_OFFSET: 0xffff97d780000000
[  176.316782] CPU features: 0x0,88000203,3c020000,0100421b
[  176.317210] Memory Limit: none
[  176.317527] ---[ end Kernel panic - not syncing: Oops: Fatal
Exception in interrupt ]---\

Fixes: 11538d039a ("bridge: vlan dst_metadata hooks in ingress and egress paths")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Andy Roulin <aroulin@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20241001154400.22787-2-aroulin@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f07131239a76cc10d5e82c19d91f53cb55727297)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:46 +05:30
Neal Cardwell
e5b4018d59 tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe
[ Upstream commit b41b4cbd9655bcebcce941bef3601db8110335be ]

Fix tcp_enter_recovery() so that if there are no retransmits out then
we zero retrans_stamp when entering fast recovery. This is necessary
to fix two buggy behaviors.

Currently a non-zero retrans_stamp value can persist across multiple
back-to-back loss recovery episodes. This is because we generally only
clears retrans_stamp if we are completely done with loss recoveries,
and get to tcp_try_to_open() and find !tcp_any_retrans_done(sk). This
behavior causes two bugs:

(1) When a loss recovery episode (CA_Loss or CA_Recovery) is followed
immediately by a new CA_Recovery, the retrans_stamp value can persist
and can be a time before this new CA_Recovery episode starts. That
means that timestamp-based undo will be using the wrong retrans_stamp
(a value that is too old) when comparing incoming TS ecr values to
retrans_stamp to see if the current fast recovery episode can be
undone.

(2) If there is a roughly minutes-long sequence of back-to-back fast
recovery episodes, one after another (e.g. in a shallow-buffered or
policed bottleneck), where each fast recovery successfully makes
forward progress and recovers one window of sequence space (but leaves
at least one retransmit in flight at the end of the recovery),
followed by several RTOs, then the ETIMEDOUT check may be using the
wrong retrans_stamp (a value set at the start of the first fast
recovery in the sequence). This can cause a very premature ETIMEDOUT,
killing the connection prematurely.

This commit changes the code to zero retrans_stamp when entering fast
recovery, when this is known to be safe (no retransmits are out in the
network). That ensures that when starting a fast recovery episode, and
it is safe to do so, retrans_stamp is set when we send the fast
retransmit packet. That addresses both bug (1) and bug (2) by ensuring
that (if no retransmits are out when we start a fast recovery) we use
the initial fast retransmit of this fast recovery as the time value
for undo and ETIMEDOUT calculations.

This makes intuitive sense, since the start of a new fast recovery
episode (in a scenario where no lost packets are out in the network)
means that the connection has made forward progress since the last RTO
or fast recovery, and we should thus "restart the clock" used for both
undo and ETIMEDOUT logic.

Note that if when we start fast recovery there *are* retransmits out
in the network, there can still be undesirable (1)/(2) issues. For
example, after this patch we can still have the (1) and (2) problems
in cases like this:

+ round 1: sender sends flight 1

+ round 2: sender receives SACKs and enters fast recovery 1,
  retransmits some packets in flight 1 and then sends some new data as
  flight 2

+ round 3: sender receives some SACKs for flight 2, notes losses, and
  retransmits some packets to fill the holes in flight 2

+ fast recovery has some lost retransmits in flight 1 and continues
  for one or more rounds sending retransmits for flight 1 and flight 2

+ fast recovery 1 completes when snd_una reaches high_seq at end of
  flight 1

+ there are still holes in the SACK scoreboard in flight 2, so we
  enter fast recovery 2, but some retransmits in the flight 2 sequence
  range are still in flight (retrans_out > 0), so we can't execute the
  new retrans_stamp=0 added here to clear retrans_stamp

It's not yet clear how to fix these remaining (1)/(2) issues in an
efficient way without breaking undo behavior, given that retrans_stamp
is currently used for undo and ETIMEDOUT. Perhaps the optimal (but
expensive) strategy would be to set retrans_stamp to the timestamp of
the earliest outstanding retransmit when entering fast recovery. But
at least this commit makes things better.

Note that this does not change the semantics of retrans_stamp; it
simply makes retrans_stamp accurate in some cases where it was not
before:

(1) Some loss recovery, followed by an immediate entry into a fast
recovery, where there are no retransmits out when entering the fast
recovery.

(2) When a TFO server has a SYNACK retransmit that sets retrans_stamp,
and then the ACK that completes the 3-way handshake has SACK blocks
that trigger a fast recovery. In this case when entering fast recovery
we want to zero out the retrans_stamp from the TFO SYNACK retransmit,
and set the retrans_stamp based on the timestamp of the fast recovery.

We introduce a tcp_retrans_stamp_cleanup() helper, because this
two-line sequence already appears in 3 places and is about to appear
in 2 more as a result of this bug fix patch series. Once this bug fix
patches series in the net branch makes it into the net-next branch
we'll update the 3 other call sites to use the new helper.

This is a long-standing issue. The Fixes tag below is chosen to be the
oldest commit at which the patch will apply cleanly, which is from
Linux v3.5 in 2012.

Fixes: 1fbc340514 ("tcp: early retransmit: tcp_enter_recovery()")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241001200517.2756803-3-ncardwell.sw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a58878d7106b229a2d91a647629a0a7bedccaa8a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:42 +05:30
Andrey Shumilin
2209c1cf23 fbdev: sisfb: Fix strbuf array overflow
[ Upstream commit 9cf14f5a2746c19455ce9cb44341b5527b5e19c3 ]

The values of the variables xres and yres are placed in strbuf.
These variables are obtained from strbuf1.
The strbuf1 array contains digit characters
and a space if the array contains non-digit characters.
Then, when executing sprintf(strbuf, "%ux%ux8", xres, yres);
more than 16 bytes will be written to strbuf.
It is suggested to increase the size of the strbuf array to 24.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Andrey Shumilin <shum.sdl@nppct.ru>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 433c84c8495008922534c5cafdae6ff970fb3241)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:15 +05:30
Zijun Hu
6dea1514bc driver core: bus: Return -EIO instead of 0 when show/store invalid bus attribute
[ Upstream commit c0fd973c108cdc22a384854bc4b3e288a9717bb2 ]

Return -EIO instead of 0 for below erroneous bus attribute operations:
 - read a bus attribute without show().
 - write a bus attribute without store().

Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20240724-bus_fix-v2-1-5adbafc698fb@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit aca863154863d0a97305a089399cee1d39e852da)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:14 +05:30
Zhu Jun
691c3e9284 tools/iio: Add memory allocation failure check for trigger_name
[ Upstream commit 3c6b818b097dd6932859bcc3d6722a74ec5931c1 ]

Added a check to handle memory allocation failure for `trigger_name`
and return `-ENOMEM`.

Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com>
Link: https://patch.msgid.link/20240828093129.3040-1-zhujun2@cmss.chinamobile.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e0daff560940b0d370d4328b9ff9294b7f893daa)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:13 +05:30
Xu Yang
36fe41a3c2 usb: chipidea: udc: enable suspend interrupt after usb reset
[ Upstream commit e4fdcc10092fb244218013bfe8ff01c55d54e8e4 ]

Currently, suspend interrupt is enabled before pullup enable operation.
This will cause a suspend interrupt assert right after pullup DP. This
suspend interrupt is meaningless, so this will ignore such interrupt
by enable it after usb reset completed.

Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Acked-by: Peter Chen <peter.chen@kernel.org>
Link: https://lore.kernel.org/r/20240823073832.1702135-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 93233aa73b3ac373ffd4dd9e6fb7217a8051b760)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:12 +05:30
Yunke Cao
6292d9f5d5 media: videobuf2-core: clear memory related fields in __vb2_plane_dmabuf_put()
[ Upstream commit 6a9c97ab6b7e85697e0b74e86062192a5ffffd99 ]

Clear vb2_plane's memory related fields in __vb2_plane_dmabuf_put(),
including bytesused, length, fd and data_offset.

Remove the duplicated code in __prepare_dmabuf().

Signed-off-by: Yunke Cao <yunkec@chromium.org>
Acked-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 940e83f377cb3863bd5a4e483ef1b228fbc86812)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:11 +05:30
Hans de Goede
10d39c01e2 i2c: i801: Use a different adapter-name for IDF adapters
[ Upstream commit 43457ada98c824f310adb7bd96bd5f2fcd9a3279 ]

On chipsets with a second 'Integrated Device Function' SMBus controller use
a different adapter-name for the second IDF adapter.

This allows platform glue code which is looking for the primary i801
adapter to manually instantiate i2c_clients on to differentiate
between the 2.

This allows such code to find the primary i801 adapter by name, without
needing to duplicate the PCI-ids to feature-flags mapping from i2c-i801.c.

Reviewed-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a2eb6e5a03de2ecbba68384c1c8f2a34c89ed7b8)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:10 +05:30
Krzysztof Kozlowski
953d353912 clk: bcm: bcm53573: fix OF node leak in init
[ Upstream commit f92d67e23b8caa81f6322a2bad1d633b00ca000e ]

Driver code is leaking OF node reference from of_get_parent() in
bcm53573_ilp_init().  Usage of of_get_parent() is not needed in the
first place, because the parent node will not be freed while we are
processing given node (triggered by CLK_OF_DECLARE()).  Thus fix the
leak by accessing parent directly, instead of of_get_parent().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240826065801.17081-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8ac316aed34fa1a49ebbaa93465bf8bfe73e9937)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:09 +05:30
Wojciech Gładysz
f0cd507feb ext4: nested locking for xattr inode
[ Upstream commit d1bc560e9a9c78d0b2314692847fc8661e0aeb99 ]

Add nested locking with I_MUTEX_XATTR subclass to avoid lockdep warning
while handling xattr inode on file open syscall at ext4_xattr_inode_iget.

Backtrace
EXT4-fs (loop0): Ignoring removed oldalloc option
======================================================
WARNING: possible circular locking dependency detected
5.10.0-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor543/2794 is trying to acquire lock:
ffff8880215e1a48 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:782 [inline]
ffff8880215e1a48 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}, at: ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425

but task is already holding lock:
ffff8880215e3278 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x136d/0x19c0 fs/ext4/inode.c:5559

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&ei->i_data_sem/3){++++}-{3:3}:
       lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566
       down_write+0x93/0x180 kernel/locking/rwsem.c:1564
       ext4_update_i_disksize fs/ext4/ext4.h:3267 [inline]
       ext4_xattr_inode_write fs/ext4/xattr.c:1390 [inline]
       ext4_xattr_inode_lookup_create fs/ext4/xattr.c:1538 [inline]
       ext4_xattr_set_entry+0x331a/0x3d80 fs/ext4/xattr.c:1662
       ext4_xattr_ibody_set+0x124/0x390 fs/ext4/xattr.c:2228
       ext4_xattr_set_handle+0xc27/0x14e0 fs/ext4/xattr.c:2385
       ext4_xattr_set+0x219/0x390 fs/ext4/xattr.c:2498
       ext4_xattr_user_set+0xc9/0xf0 fs/ext4/xattr_user.c:40
       __vfs_setxattr+0x404/0x450 fs/xattr.c:177
       __vfs_setxattr_noperm+0x11d/0x4f0 fs/xattr.c:208
       __vfs_setxattr_locked+0x1f9/0x210 fs/xattr.c:266
       vfs_setxattr+0x112/0x2c0 fs/xattr.c:283
       setxattr+0x1db/0x3e0 fs/xattr.c:548
       path_setxattr+0x15a/0x240 fs/xattr.c:567
       __do_sys_setxattr fs/xattr.c:582 [inline]
       __se_sys_setxattr fs/xattr.c:578 [inline]
       __x64_sys_setxattr+0xc5/0xe0 fs/xattr.c:578
       do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62
       entry_SYSCALL_64_after_hwframe+0x61/0xcb

-> #0 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:2988 [inline]
       check_prevs_add kernel/locking/lockdep.c:3113 [inline]
       validate_chain+0x1695/0x58f0 kernel/locking/lockdep.c:3729
       __lock_acquire+0x12fd/0x20d0 kernel/locking/lockdep.c:4955
       lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566
       down_write+0x93/0x180 kernel/locking/rwsem.c:1564
       inode_lock include/linux/fs.h:782 [inline]
       ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425
       ext4_xattr_inode_get+0x138/0x410 fs/ext4/xattr.c:485
       ext4_xattr_move_to_block fs/ext4/xattr.c:2580 [inline]
       ext4_xattr_make_inode_space fs/ext4/xattr.c:2682 [inline]
       ext4_expand_extra_isize_ea+0xe70/0x1bb0 fs/ext4/xattr.c:2774
       __ext4_expand_extra_isize+0x304/0x3f0 fs/ext4/inode.c:5898
       ext4_try_to_expand_extra_isize fs/ext4/inode.c:5941 [inline]
       __ext4_mark_inode_dirty+0x591/0x810 fs/ext4/inode.c:6018
       ext4_setattr+0x1400/0x19c0 fs/ext4/inode.c:5562
       notify_change+0xbb6/0xe60 fs/attr.c:435
       do_truncate+0x1de/0x2c0 fs/open.c:64
       handle_truncate fs/namei.c:2970 [inline]
       do_open fs/namei.c:3311 [inline]
       path_openat+0x29f3/0x3290 fs/namei.c:3425
       do_filp_open+0x20b/0x450 fs/namei.c:3452
       do_sys_openat2+0x124/0x460 fs/open.c:1207
       do_sys_open fs/open.c:1223 [inline]
       __do_sys_open fs/open.c:1231 [inline]
       __se_sys_open fs/open.c:1227 [inline]
       __x64_sys_open+0x221/0x270 fs/open.c:1227
       do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62
       entry_SYSCALL_64_after_hwframe+0x61/0xcb

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ei->i_data_sem/3);
                               lock(&ea_inode->i_rwsem#7/1);
                               lock(&ei->i_data_sem/3);
  lock(&ea_inode->i_rwsem#7/1);

 *** DEADLOCK ***

5 locks held by syz-executor543/2794:
 #0: ffff888026fbc448 (sb_writers#4){.+.+}-{0:0}, at: mnt_want_write+0x4a/0x2a0 fs/namespace.c:365
 #1: ffff8880215e3488 (&sb->s_type->i_mutex_key#7){++++}-{3:3}, at: inode_lock include/linux/fs.h:782 [inline]
 #1: ffff8880215e3488 (&sb->s_type->i_mutex_key#7){++++}-{3:3}, at: do_truncate+0x1cf/0x2c0 fs/open.c:62
 #2: ffff8880215e3310 (&ei->i_mmap_sem){++++}-{3:3}, at: ext4_setattr+0xec4/0x19c0 fs/ext4/inode.c:5519
 #3: ffff8880215e3278 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x136d/0x19c0 fs/ext4/inode.c:5559
 #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: ext4_write_trylock_xattr fs/ext4/xattr.h:162 [inline]
 #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: ext4_try_to_expand_extra_isize fs/ext4/inode.c:5938 [inline]
 #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: __ext4_mark_inode_dirty+0x4fb/0x810 fs/ext4/inode.c:6018

stack backtrace:
CPU: 1 PID: 2794 Comm: syz-executor543 Not tainted 5.10.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x177/0x211 lib/dump_stack.c:118
 print_circular_bug+0x146/0x1b0 kernel/locking/lockdep.c:2002
 check_noncircular+0x2cc/0x390 kernel/locking/lockdep.c:2123
 check_prev_add kernel/locking/lockdep.c:2988 [inline]
 check_prevs_add kernel/locking/lockdep.c:3113 [inline]
 validate_chain+0x1695/0x58f0 kernel/locking/lockdep.c:3729
 __lock_acquire+0x12fd/0x20d0 kernel/locking/lockdep.c:4955
 lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566
 down_write+0x93/0x180 kernel/locking/rwsem.c:1564
 inode_lock include/linux/fs.h:782 [inline]
 ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425
 ext4_xattr_inode_get+0x138/0x410 fs/ext4/xattr.c:485
 ext4_xattr_move_to_block fs/ext4/xattr.c:2580 [inline]
 ext4_xattr_make_inode_space fs/ext4/xattr.c:2682 [inline]
 ext4_expand_extra_isize_ea+0xe70/0x1bb0 fs/ext4/xattr.c:2774
 __ext4_expand_extra_isize+0x304/0x3f0 fs/ext4/inode.c:5898
 ext4_try_to_expand_extra_isize fs/ext4/inode.c:5941 [inline]
 __ext4_mark_inode_dirty+0x591/0x810 fs/ext4/inode.c:6018
 ext4_setattr+0x1400/0x19c0 fs/ext4/inode.c:5562
 notify_change+0xbb6/0xe60 fs/attr.c:435
 do_truncate+0x1de/0x2c0 fs/open.c:64
 handle_truncate fs/namei.c:2970 [inline]
 do_open fs/namei.c:3311 [inline]
 path_openat+0x29f3/0x3290 fs/namei.c:3425
 do_filp_open+0x20b/0x450 fs/namei.c:3452
 do_sys_openat2+0x124/0x460 fs/open.c:1207
 do_sys_open fs/open.c:1223 [inline]
 __do_sys_open fs/open.c:1231 [inline]
 __se_sys_open fs/open.c:1227 [inline]
 __x64_sys_open+0x221/0x270 fs/open.c:1227
 do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62
 entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7f0cde4ea229
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 21 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd81d1c978 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: ffffffffffffffda RBX: 0030656c69662f30 RCX: 00007f0cde4ea229
RDX: 0000000000000089 RSI: 00000000000a0a00 RDI: 00000000200001c0
RBP: 2f30656c69662f2e R08: 0000000000208000 R09: 0000000000208000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd81d1c9c0
R13: 00007ffd81d1ca00 R14: 0000000000080000 R15: 0000000000000003
EXT4-fs error (device loop0): ext4_expand_extra_isize_ea:2730: inode #13: comm syz-executor543: corrupted in-inode xattr

Signed-off-by: Wojciech Gładysz <wojciech.gladysz@infogain.com>
Link: https://patch.msgid.link/20240801143827.19135-1-wojciech.gladysz@infogain.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit c0f57dd0f1603ae27ef694bacde66147f9d57d32)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:08 +05:30
Gerald Schaefer
4335755845 s390/mm: Add cond_resched() to cmm_alloc/free_pages()
[ Upstream commit 131b8db78558120f58c5dc745ea9655f6b854162 ]

Adding/removing large amount of pages at once to/from the CMM balloon
can result in rcu_sched stalls or workqueue lockups, because of busy
looping w/o cond_resched().

Prevent this by adding a cond_resched(). cmm_free_pages() holds a
spin_lock while looping, so it cannot be added directly to the existing
loop. Instead, introduce a wrapper function that operates on maximum 256
pages at once, and add it there.

Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a12b82d741350b89b4df55fa8a4e5c0579d919cb)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:07 +05:30
Heiko Carstens
5c21008c9e s390/facility: Disable compile time optimization for decompressor code
[ Upstream commit 0147addc4fb72a39448b8873d8acdf3a0f29aa65 ]

Disable compile time optimizations of test_facility() for the
decompressor. The decompressor should not contain any optimized code
depending on the architecture level set the kernel image is compiled
for to avoid unexpected operation exceptions.

Add a __DECOMPRESSOR check to test_facility() to enforce that
facilities are always checked during runtime for the decompressor.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f559306a168fb92a936beaa1f020f5c45cdedac6)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:06 +05:30
Michael S. Tsirkin
bb6709f54b virtio_console: fix misc probe bugs
[ Upstream commit b9efbe2b8f0177fa97bfab290d60858900aa196b ]

This fixes the following issue discovered by code review:

after vqs have been created, a buggy device can send an interrupt.

A control vq callback will then try to schedule control_work which has
not been initialized yet. Similarly for config interrupt.  Further, in
and out vq callbacks invoke find_port_by_vq which attempts to take
ports_lock which also has not been initialized.

To fix, init all locks and work before creating vqs.

Message-ID: <ad982e975a6160ad110c623c016041311ca15b4f.1726511547.git.mst@redhat.com>
Fixes: 17634ba255 ("virtio: console: Add a new MULTIPORT feature, support for generic ports")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 42a7c0fd6e5b7c5db8af8ab2bab6eff2a723b168)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:37:04 +05:30
zhanchengbin
1345d45a98 ext4: fix inode tree inconsistency caused by ENOMEM
commit 3f5424790d4377839093b68c12b130077a4e4510 upstream.

If ENOMEM fails when the extent is splitting, we need to restore the length
of the split extent.
In the ext4_split_extent_at function, only in ext4_ext_create_new_leaf will
it alloc memory and change the shape of the extent tree,even if an ENOMEM
is returned at this time, the extent tree is still self-consistent, Just
restore the split extent lens in the function ext4_split_extent_at.

ext4_split_extent_at
 ext4_ext_insert_extent
  ext4_ext_create_new_leaf
   1)ext4_ext_split
     ext4_find_extent
   2)ext4_ext_grow_indepth
     ext4_find_extent

Signed-off-by: zhanchengbin <zhanchengbin1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230103022812.130603-1-zhanchengbin1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit eea5a4e7fe4424245aeba77bb0f24a38a1bead16)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:36:56 +05:30
NeilBrown
a44da2111b nfsd: fix delegation_blocked() to block correctly for at least 30 seconds
[ Upstream commit 45bb63ed20e02ae146336412889fe5450316a84f ]

The pair of bloom filtered used by delegation_blocked() was intended to
block delegations on given filehandles for between 30 and 60 seconds.  A
new filehandle would be recorded in the "new" bit set.  That would then
be switch to the "old" bit set between 0 and 30 seconds later, and it
would remain as the "old" bit set for 30 seconds.

Unfortunately the code intended to clear the old bit set once it reached
30 seconds old, preparing it to be the next new bit set, instead cleared
the *new* bit set before switching it to be the old bit set.  This means
that the "old" bit set is always empty and delegations are blocked
between 0 and 30 seconds.

This patch updates bd->new before clearing the set with that index,
instead of afterwards.

Reported-by: Olga Kornievskaia <okorniev@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 6282cd5655 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ccbd18223985635b8dbb1393bacac9e1a5fa3f2f)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:36:36 +05:30
Arnd Bergmann
e5411c5a95 nfsd: use ktime_get_seconds() for timestamps
[ Upstream commit b3f255ef6bffc18a28c3b6295357f2a3380c033f ]

The delegation logic in nfsd uses the somewhat inefficient
seconds_since_boot() function to record time intervals.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Stable-dep-of: 45bb63ed20e0 ("nfsd: fix delegation_blocked() to block correctly for at least 30 seconds")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f81fcf39509d30cb5f1c659099c1d8f0c2a9a57a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:36:35 +05:30
Oleg Nesterov
699d7f5ed5 uprobes: fix kernel info leak via "[uprobes]" vma
commit 34820304cc2cd1804ee1f8f3504ec77813d29c8e upstream.

xol_add_vma() maps the uninitialized page allocated by __create_xol_area()
into userspace. On some architectures (x86) this memory is readable even
without VM_READ, VM_EXEC results in the same pgprot_t as VM_EXEC|VM_READ,
although this doesn't really matter, debugger can read this memory anyway.

Link: https://lore.kernel.org/all/20240929162047.GA12611@redhat.com/

Reported-by: Will Deacon <will@kernel.org>
Fixes: d4b3b6384f ("uprobes/core: Allocate XOL slots for uprobes use")
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f31f92107e5a8ecc8902705122c594e979a351fe)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:36:34 +05:30
Emanuele Ghidoli
66f7a6eae4 gpio: davinci: fix lazy disable
commit 3360d41f4ac490282fddc3ccc0b58679aa5c065d upstream.

On a few platforms such as TI's AM69 device, disable_irq() fails to keep
track of the interrupts that happen between disable_irq() and
enable_irq() and those interrupts are missed. Use the ->irq_unmask() and
->irq_mask() methods instead of ->irq_enable() and ->irq_disable() to
correctly keep track of edges when disable_irq is called.

This solves the issue of disable_irq() not working as expected on such
platforms.

Fixes: 23265442b0 ("ARM: davinci: irq_data conversion.")
Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com>
Signed-off-by: Parth Pancholi <parth.pancholi@toradex.com>
Acked-by: Keerthy <j-keerthy@ti.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240828133207.493961-1-parth105105@gmail.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e9b751c0d7abde1837ee1510cbdc705570107ef1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:36:01 +05:30
Filipe Manana
3248072d06 btrfs: wait for fixup workers before stopping cleaner kthread during umount
commit 41fd1e94066a815a7ab0a7025359e9b40e4b3576 upstream.

During unmount, at close_ctree(), we have the following steps in this order:

1) Park the cleaner kthread - this doesn't destroy the kthread, it basically
   halts its execution (wake ups against it work but do nothing);

2) We stop the cleaner kthread - this results in freeing the respective
   struct task_struct;

3) We call btrfs_stop_all_workers() which waits for any jobs running in all
   the work queues and then free the work queues.

Syzbot reported a case where a fixup worker resulted in a crash when doing
a delayed iput on its inode while attempting to wake up the cleaner at
btrfs_add_delayed_iput(), because the task_struct of the cleaner kthread
was already freed. This can happen during unmount because we don't wait
for any fixup workers still running before we call kthread_stop() against
the cleaner kthread, which stops and free all its resources.

Fix this by waiting for any fixup workers at close_ctree() before we call
kthread_stop() against the cleaner and run pending delayed iputs.

The stack traces reported by syzbot were the following:

  BUG: KASAN: slab-use-after-free in __lock_acquire+0x77/0x2050 kernel/locking/lockdep.c:5065
  Read of size 8 at addr ffff8880272a8a18 by task kworker/u8:3/52

  CPU: 1 UID: 0 PID: 52 Comm: kworker/u8:3 Not tainted 6.12.0-rc1-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
  Workqueue: btrfs-fixup btrfs_work_helper
  Call Trace:
   <TASK>
   __dump_stack lib/dump_stack.c:94 [inline]
   dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
   print_address_description mm/kasan/report.c:377 [inline]
   print_report+0x169/0x550 mm/kasan/report.c:488
   kasan_report+0x143/0x180 mm/kasan/report.c:601
   __lock_acquire+0x77/0x2050 kernel/locking/lockdep.c:5065
   lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
   _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
   class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:551 [inline]
   try_to_wake_up+0xb0/0x1480 kernel/sched/core.c:4154
   btrfs_writepage_fixup_worker+0xc16/0xdf0 fs/btrfs/inode.c:2842
   btrfs_work_helper+0x390/0xc50 fs/btrfs/async-thread.c:314
   process_one_work kernel/workqueue.c:3229 [inline]
   process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
   worker_thread+0x870/0xd30 kernel/workqueue.c:3391
   kthread+0x2f0/0x390 kernel/kthread.c:389
   ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
   </TASK>

  Allocated by task 2:
   kasan_save_stack mm/kasan/common.c:47 [inline]
   kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
   unpoison_slab_object mm/kasan/common.c:319 [inline]
   __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:345
   kasan_slab_alloc include/linux/kasan.h:247 [inline]
   slab_post_alloc_hook mm/slub.c:4086 [inline]
   slab_alloc_node mm/slub.c:4135 [inline]
   kmem_cache_alloc_node_noprof+0x16b/0x320 mm/slub.c:4187
   alloc_task_struct_node kernel/fork.c:180 [inline]
   dup_task_struct+0x57/0x8c0 kernel/fork.c:1107
   copy_process+0x5d1/0x3d50 kernel/fork.c:2206
   kernel_clone+0x223/0x880 kernel/fork.c:2787
   kernel_thread+0x1bc/0x240 kernel/fork.c:2849
   create_kthread kernel/kthread.c:412 [inline]
   kthreadd+0x60d/0x810 kernel/kthread.c:765
   ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

  Freed by task 61:
   kasan_save_stack mm/kasan/common.c:47 [inline]
   kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
   kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
   poison_slab_object mm/kasan/common.c:247 [inline]
   __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
   kasan_slab_free include/linux/kasan.h:230 [inline]
   slab_free_hook mm/slub.c:2343 [inline]
   slab_free mm/slub.c:4580 [inline]
   kmem_cache_free+0x1a2/0x420 mm/slub.c:4682
   put_task_struct include/linux/sched/task.h:144 [inline]
   delayed_put_task_struct+0x125/0x300 kernel/exit.c:228
   rcu_do_batch kernel/rcu/tree.c:2567 [inline]
   rcu_core+0xaaa/0x17a0 kernel/rcu/tree.c:2823
   handle_softirqs+0x2c5/0x980 kernel/softirq.c:554
   __do_softirq kernel/softirq.c:588 [inline]
   invoke_softirq kernel/softirq.c:428 [inline]
   __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
   irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
   instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1037 [inline]
   sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1037
   asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702

  Last potentially related work creation:
   kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47
   __kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541
   __call_rcu_common kernel/rcu/tree.c:3086 [inline]
   call_rcu+0x167/0xa70 kernel/rcu/tree.c:3190
   context_switch kernel/sched/core.c:5318 [inline]
   __schedule+0x184b/0x4ae0 kernel/sched/core.c:6675
   schedule_idle+0x56/0x90 kernel/sched/core.c:6793
   do_idle+0x56a/0x5d0 kernel/sched/idle.c:354
   cpu_startup_entry+0x42/0x60 kernel/sched/idle.c:424
   start_secondary+0x102/0x110 arch/x86/kernel/smpboot.c:314
   common_startup_64+0x13e/0x147

  The buggy address belongs to the object at ffff8880272a8000
   which belongs to the cache task_struct of size 7424
  The buggy address is located 2584 bytes inside of
   freed 7424-byte region [ffff8880272a8000, ffff8880272a9d00)

  The buggy address belongs to the physical page:
  page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x272a8
  head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
  flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
  page_type: f5(slab)
  raw: 00fff00000000040 ffff88801bafa500 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000080040004 00000001f5000000 0000000000000000
  head: 00fff00000000040 ffff88801bafa500 dead000000000122 0000000000000000
  head: 0000000000000000 0000000080040004 00000001f5000000 0000000000000000
  head: 00fff00000000003 ffffea00009caa01 ffffffffffffffff 0000000000000000
  head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: kasan: bad access detected
  page_owner tracks the page as allocated
  page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 2, tgid 2 (kthreadd), ts 71247381401, free_ts 71214998153
   set_page_owner include/linux/page_owner.h:32 [inline]
   post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
   prep_new_page mm/page_alloc.c:1545 [inline]
   get_page_from_freelist+0x3039/0x3180 mm/page_alloc.c:3457
   __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4733
   alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
   alloc_slab_page+0x6a/0x120 mm/slub.c:2413
   allocate_slab+0x5a/0x2f0 mm/slub.c:2579
   new_slab mm/slub.c:2632 [inline]
   ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3819
   __slab_alloc+0x58/0xa0 mm/slub.c:3909
   __slab_alloc_node mm/slub.c:3962 [inline]
   slab_alloc_node mm/slub.c:4123 [inline]
   kmem_cache_alloc_node_noprof+0x1fe/0x320 mm/slub.c:4187
   alloc_task_struct_node kernel/fork.c:180 [inline]
   dup_task_struct+0x57/0x8c0 kernel/fork.c:1107
   copy_process+0x5d1/0x3d50 kernel/fork.c:2206
   kernel_clone+0x223/0x880 kernel/fork.c:2787
   kernel_thread+0x1bc/0x240 kernel/fork.c:2849
   create_kthread kernel/kthread.c:412 [inline]
   kthreadd+0x60d/0x810 kernel/kthread.c:765
   ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
  page last free pid 5230 tgid 5230 stack trace:
   reset_page_owner include/linux/page_owner.h:25 [inline]
   free_pages_prepare mm/page_alloc.c:1108 [inline]
   free_unref_page+0xcd0/0xf00 mm/page_alloc.c:2638
   discard_slab mm/slub.c:2678 [inline]
   __put_partials+0xeb/0x130 mm/slub.c:3146
   put_cpu_partial+0x17c/0x250 mm/slub.c:3221
   __slab_free+0x2ea/0x3d0 mm/slub.c:4450
   qlink_free mm/kasan/quarantine.c:163 [inline]
   qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179
   kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
   __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329
   kasan_slab_alloc include/linux/kasan.h:247 [inline]
   slab_post_alloc_hook mm/slub.c:4086 [inline]
   slab_alloc_node mm/slub.c:4135 [inline]
   kmem_cache_alloc_noprof+0x135/0x2a0 mm/slub.c:4142
   getname_flags+0xb7/0x540 fs/namei.c:139
   do_sys_openat2+0xd2/0x1d0 fs/open.c:1409
   do_sys_open fs/open.c:1430 [inline]
   __do_sys_openat fs/open.c:1446 [inline]
   __se_sys_openat fs/open.c:1441 [inline]
   __x64_sys_openat+0x247/0x2a0 fs/open.c:1441
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Memory state around the buggy address:
   ffff8880272a8900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff8880272a8980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  >ffff8880272a8a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                              ^
   ffff8880272a8a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff8880272a8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ==================================================================

Reported-by: syzbot+8aaf2df2ef0164ffe1fb@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/66fb36b1.050a0220.aab67.003b.GAE@google.com/
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cd686dfff63f27d712877aef5b962fbf6b8bc264)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:36:00 +05:30
Nuno Sa
f86cf9f162 Input: adp5589-keys - fix adp5589_gpio_get_value()
commit c684771630e64bc39bddffeb65dd8a6612a6b249 upstream.

The adp5589 seems to have the same behavior as similar devices as
explained in commit 910a9f5636 ("Input: adp5588-keys - get value from
data out when dir is out").

Basically, when the gpio is set as output we need to get the value from
ADP5589_GPO_DATA_OUT_A register instead of ADP5589_GPI_STATUS_A.

Fixes: 9d2e173644 ("Input: ADP5589 - new driver for I2C Keypad Decoder and I/O Expander")
Signed-off-by: Nuno Sa <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20241001-b4-dev-adp5589-fw-conversion-v1-2-fca0149dfc47@analog.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9ff7ae486d51c0da706a29b116d7fa399db677f5)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:59 +05:30
Tetsuo Handa
fbd92b0de9 tomoyo: fallback to realpath if symlink's pathname does not exist
commit ada1986d07976d60bed5017aa38b7f7cf27883f7 upstream.

Alfred Agrell found that TOMOYO cannot handle execveat(AT_EMPTY_PATH)
inside chroot environment where /dev and /proc are not mounted, for
commit 51f39a1f0c ("syscalls: implement execveat() system call") missed
that TOMOYO tries to canonicalize argv[0] when the filename fed to the
executed program as argv[0] is supplied using potentially nonexistent
pathname.

Since "/dev/fd/<fd>" already lost symlink information used for obtaining
that <fd>, it is too late to reconstruct symlink's pathname. Although
<filename> part of "/dev/fd/<fd>/<filename>" might not be canonicalized,
TOMOYO cannot use tomoyo_realpath_nofollow() when /dev or /proc is not
mounted. Therefore, fallback to tomoyo_realpath_from_path() when
tomoyo_realpath_nofollow() failed.

Reported-by: Alfred Agrell <blubban@gmail.com>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1082001
Fixes: 51f39a1f0c ("syscalls: implement execveat() system call")
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 455246846468503ac739924d5b63af32c6261b31)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:58 +05:30
Barnabás Czémán
5cb7996e83 iio: magnetometer: ak8975: Fix reading for ak099xx sensors
commit 129464e86c7445a858b790ac2d28d35f58256bbe upstream.

Move ST2 reading with overflow handling after measurement data
reading.
ST2 register read have to be read after read measurment data,
because it means end of the reading and realease the lock on the data.
Remove ST2 read skip on interrupt based waiting because ST2 required to
be read out at and of the axis read.

Fixes: 57e73a423b ("iio: ak8975: add ak09911 and ak09912 support")
Signed-off-by: Barnabás Czémán <barnabas.czeman@mainlining.org>
Link: https://patch.msgid.link/20240819-ak09918-v4-2-f0734d14cfb9@mainlining.org
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2e78095a0cc35d6210de051accb2fe45649087cd)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:57 +05:30
Zheng Wang
6f292cfada media: venus: fix use after free bug in venus_remove due to race condition
commit c5a85ed88e043474161bbfe54002c89c1cb50ee2 upstream.

in venus_probe, core->work is bound with venus_sys_error_handler, which is
used to handle error. The code use core->sys_err_done to make sync work.
The core->work is started in venus_event_notify.

If we call venus_remove, there might be an unfished work. The possible
sequence is as follows:

CPU0                  CPU1

                     |venus_sys_error_handler
venus_remove         |
hfi_destroy	 		 |
venus_hfi_destroy	 |
kfree(hdev);	     |
                     |hfi_reinit
					 |venus_hfi_queues_reinit
                     |//use hdev

Fix it by canceling the work in venus_remove.

Cc: stable@vger.kernel.org
Fixes: af2c3834c8 ("[media] media: venus: adding core part and helper functions")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: Dikshita Agarwal <quic_dikshita@quicinc.com>
Signed-off-by: Stanimir Varbanov <stanimir.k.varbanov@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5098b9e6377577fe13d03e1d8914930f014a3314)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:56 +05:30
Hans Verkuil
c436795548 media: uapi/linux/cec.h: cec_msg_set_reply_to: zero flags
commit 599f6899051cb70c4e0aa9fd591b9ee220cb6f14 upstream.

The cec_msg_set_reply_to() helper function never zeroed the
struct cec_msg flags field, this can cause unexpected behavior
if flags was uninitialized to begin with.

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 0dbacebede ("[media] cec: move the CEC framework out of staging and to media")
Cc: <stable@vger.kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4afab2197e530b480c4cc099255d12a08c6a1f93)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:55 +05:30
Sebastian Reichel
b1d2b056ba clk: rockchip: fix error for unknown clocks
commit 12fd64babaca4dc09d072f63eda76ba44119816a upstream.

There is a clk == NULL check after the switch to check for
unsupported clk types. Since clk is re-assigned in a loop,
this check is useless right now for anything but the first
round. Let's fix this up by assigning clk = NULL in the
loop before the switch statement.

Fixes: a245fecbb8 ("clk: rockchip: add basic infrastructure for clock branches")
Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
[added fixes + stable-cc]
Link: https://lore.kernel.org/r/20240325193609.237182-6-sebastian.reichel@collabora.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2f1e1a9047b1644d05284fc0da1d6ab9c4434cf6)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:54 +05:30
Lizhi Xu
3200b2d416 ocfs2: fix possible null-ptr-deref in ocfs2_set_buffer_uptodate
commit 33b525cef4cff49e216e4133cc48452e11c0391e upstream.

When doing cleanup, if flags without OCFS2_BH_READAHEAD, it may trigger
NULL pointer dereference in the following ocfs2_set_buffer_uptodate() if
bh is NULL.

Link: https://lkml.kernel.org/r/20240902023636.1843422-3-joseph.qi@linux.alibaba.com
Fixes: cf76c78595ca ("ocfs2: don't put and assigning null to bh allocated outside")
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Heming Zhao <heming.zhao@suse.com>
Suggested-by: Heming Zhao <heming.zhao@suse.com>
Cc: <stable@vger.kernel.org>	[4.20+]
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 190d98bcd61117a78fe185222d162180f061a6ca)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:53 +05:30
Julian Sun
3c59d3ccdb ocfs2: fix null-ptr-deref when journal load failed.
commit 5784d9fcfd43bd853654bb80c87ef293b9e8e80a upstream.

During the mounting process, if journal_reset() fails because of too short
journal, then lead to jbd2_journal_load() fails with NULL j_sb_buffer.
Subsequently, ocfs2_journal_shutdown() calls
jbd2_journal_flush()->jbd2_cleanup_journal_tail()->
__jbd2_update_log_tail()->jbd2_journal_update_sb_log_tail()
->lock_buffer(journal->j_sb_buffer), resulting in a null-pointer
dereference error.

To resolve this issue, we should check the JBD2_LOADED flag to ensure the
journal was properly loaded.  Additionally, use journal instead of
osb->journal directly to simplify the code.

Link: https://syzkaller.appspot.com/bug?extid=05b9b39d8bdfe1a0861f
Link: https://lkml.kernel.org/r/20240902030844.422725-1-sunjunchao2870@gmail.com
Fixes: f6f50e28f0 ("jbd2: Fail to load a journal if it is too short")
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Reported-by: syzbot+05b9b39d8bdfe1a0861f@syzkaller.appspotmail.com
Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fd89d92c1140cee8f59de336cb37fa65e359c123)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:52 +05:30
Lizhi Xu
4e85375195 ocfs2: remove unreasonable unlock in ocfs2_read_blocks
commit c03a82b4a0c935774afa01fd6d128b444fd930a1 upstream.

Patch series "Misc fixes for ocfs2_read_blocks", v5.

This series contains 2 fixes for ocfs2_read_blocks().  The first patch fix
the issue reported by syzbot, which detects bad unlock balance in
ocfs2_read_blocks().  The second patch fixes an issue reported by Heming
Zhao when reviewing above fix.

This patch (of 2):

There was a lock release before exiting, so remove the unreasonable unlock.

Link: https://lkml.kernel.org/r/20240902023636.1843422-1-joseph.qi@linux.alibaba.com
Link: https://lkml.kernel.org/r/20240902023636.1843422-2-joseph.qi@linux.alibaba.com
Fixes: cf76c78595ca ("ocfs2: don't put and assigning null to bh allocated outside")
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: syzbot+ab134185af9ef88dfed5@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ab134185af9ef88dfed5
Tested-by: syzbot+ab134185af9ef88dfed5@syzkaller.appspotmail.com
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>	[4.20+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5245f109b4afb6595360d4c180d483a6d2009a59)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:51 +05:30
Joseph Qi
d0e792cbb4 ocfs2: cancel dqi_sync_work before freeing oinfo
commit 35fccce29feb3706f649726d410122dd81b92c18 upstream.

ocfs2_global_read_info() will initialize and schedule dqi_sync_work at the
end, if error occurs after successfully reading global quota, it will
trigger the following warning with CONFIG_DEBUG_OBJECTS_* enabled:

ODEBUG: free active (active state 0) object: 00000000d8b0ce28 object type: timer_list hint: qsync_work_fn+0x0/0x16c

This reports that there is an active delayed work when freeing oinfo in
error handling, so cancel dqi_sync_work first.  BTW, return status instead
of -1 when .read_file_info fails.

Link: https://syzkaller.appspot.com/bug?extid=f7af59df5d6b25f0febd
Link: https://lkml.kernel.org/r/20240904071004.2067695-1-joseph.qi@linux.alibaba.com
Fixes: 171bf93ce1 ("ocfs2: Periodic quota syncing")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Reported-by: syzbot+f7af59df5d6b25f0febd@syzkaller.appspotmail.com
Tested-by: syzbot+f7af59df5d6b25f0febd@syzkaller.appspotmail.com
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fc5cc716dfbdc5fd5f373ff3b51358174cf88bfc)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:50 +05:30
Gautham Ananthakrishna
1e7c6cb05a ocfs2: reserve space for inline xattr before attaching reflink tree
commit 5ca60b86f57a4d9648f68418a725b3a7de2816b0 upstream.

One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption.  Upon troubleshooting,
the fsck -fn output showed the below corruption

[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227.  Clamp the next record value? n

The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.

        Inode: 33080590   Mode: 0640   Generation: 2619713622 (0x9c25a856)
        FS Generation: 904309833 (0x35e6ac49)
        CRC32: 00000000   ECC: 0000
        Type: Regular   Attr: 0x0   Flags: Valid
        Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
        Extended Attributes Block: 0  Extended Attributes Inline Size: 256
        User: 0 (root)   Group: 0 (root)   Size: 281320357888
        Links: 1   Clusters: 141738
        ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
        atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
        mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
        dtime: 0x0 -- Wed Dec 31 17:00:00 1969
        Refcount Block: 2777346
        Last Extblk: 2886943   Orphan Slot: 0
        Sub Alloc Slot: 0   Sub Alloc Bit: 14
        Tree Depth: 1   Count: 227   Next Free Rec: 230
        ## Offset        Clusters       Block#
        0  0             2310           2776351
        1  2310          2139           2777375
        2  4449          1221           2778399
        3  5670          731            2779423
        4  6401          566            2780447
        .......          ....           .......
        .......          ....           .......

The issue was in the reflink workfow while reserving space for inline
xattr.  The problematic function is ocfs2_reflink_xattr_inline().  By the
time this function is called the reflink tree is already recreated at the
destination inode from the source inode.  At this point, this function
reserves space for inline xattrs at the destination inode without even
checking if there is space at the root metadata block.  It simply reduces
the l_count from 243 to 227 thereby making space of 256 bytes for inline
xattr whereas the inode already has extents beyond this index (in this
case up to 230), thereby causing corruption.

The fix for this is to reserve space for inline metadata at the destination
inode before the reflink tree gets recreated. The customer has verified the
fix.

Link: https://lkml.kernel.org/r/20240918063844.1830332-1-gautham.ananthakrishna@oracle.com
Fixes: ef962df057 ("ocfs2: xattr: fix inlined xattr reflink")
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5c9807c523b4fca81d3e8e864dabc8c806402121)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:49 +05:30
Joseph Qi
10069c65e5 ocfs2: fix uninit-value in ocfs2_get_block()
commit 2af148ef8549a12f8025286b8825c2833ee6bcb8 upstream.

syzbot reported an uninit-value BUG:

BUG: KMSAN: uninit-value in ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
do_mpage_readpage+0xc45/0x2780 fs/mpage.c:225
mpage_readahead+0x43f/0x840 fs/mpage.c:374
ocfs2_readahead+0x269/0x320 fs/ocfs2/aops.c:381
read_pages+0x193/0x1110 mm/readahead.c:160
page_cache_ra_unbounded+0x901/0x9f0 mm/readahead.c:273
do_page_cache_ra mm/readahead.c:303 [inline]
force_page_cache_ra+0x3b1/0x4b0 mm/readahead.c:332
force_page_cache_readahead mm/internal.h:347 [inline]
generic_fadvise+0x6b0/0xa90 mm/fadvise.c:106
vfs_fadvise mm/fadvise.c:185 [inline]
ksys_fadvise64_64 mm/fadvise.c:199 [inline]
__do_sys_fadvise64 mm/fadvise.c:214 [inline]
__se_sys_fadvise64 mm/fadvise.c:212 [inline]
__x64_sys_fadvise64+0x1fb/0x3a0 mm/fadvise.c:212
x64_sys_call+0xe11/0x3ba0
arch/x86/include/generated/asm/syscalls_64.h:222
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f

This is because when ocfs2_extent_map_get_blocks() fails, p_blkno is
uninitialized.  So the error log will trigger the above uninit-value
access.

The error log is out-of-date since get_blocks() was removed long time ago.
And the error code will be logged in ocfs2_extent_map_get_blocks() once
ocfs2_get_cluster() fails, so fix this by only logging inode and block.

Link: https://syzkaller.appspot.com/bug?extid=9709e73bae885b05314b
Link: https://lkml.kernel.org/r/20240925090600.3643376-1-joseph.qi@linux.alibaba.com
Fixes: ccd979bdbc ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: syzbot+9709e73bae885b05314b@syzkaller.appspotmail.com
Tested-by: syzbot+9709e73bae885b05314b@syzkaller.appspotmail.com
Cc: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e95da10e6fcac684895c334eca9d95e2fd10b0fe)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:48 +05:30
Heming Zhao
4ad8609a9a ocfs2: fix the la space leak when unmounting an ocfs2 volume
commit dfe6c5692fb525e5e90cefe306ee0dffae13d35f upstream.

This bug has existed since the initial OCFS2 code.  The code logic in
ocfs2_sync_local_to_main() is wrong, as it ignores the last contiguous
free bits, which causes an OCFS2 volume to lose the last free clusters of
LA window on each umount command.

Link: https://lkml.kernel.org/r/20240719114310.14245-1-heming.zhao@suse.com
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Su Yue <glass.su@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Heming Zhao <heming.zhao@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5a074861ae1b6262b50fa9780957db7d17b86672)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:47 +05:30
Baokun Li
0e51b84568 jbd2: stop waiting for space when jbd2_cleanup_journal_tail() returns error
commit f5cacdc6f2bb2a9bf214469dd7112b43dd2dd68a upstream.

In __jbd2_log_wait_for_space(), we might call jbd2_cleanup_journal_tail()
to recover some journal space. But if an error occurs while executing
jbd2_cleanup_journal_tail() (e.g., an EIO), we don't stop waiting for free
space right away, we try other branches, and if j_committing_transaction
is NULL (i.e., the tid is 0), we will get the following complain:

============================================
JBD2: I/O error when updating journal superblock for sdd-8.
__jbd2_log_wait_for_space: needed 256 blocks and only had 217 space available
__jbd2_log_wait_for_space: no way to get more journal space in sdd-8
------------[ cut here ]------------
WARNING: CPU: 2 PID: 139804 at fs/jbd2/checkpoint.c:109 __jbd2_log_wait_for_space+0x251/0x2e0
Modules linked in:
CPU: 2 PID: 139804 Comm: kworker/u8:3 Not tainted 6.6.0+ #1
RIP: 0010:__jbd2_log_wait_for_space+0x251/0x2e0
Call Trace:
 <TASK>
 add_transaction_credits+0x5d1/0x5e0
 start_this_handle+0x1ef/0x6a0
 jbd2__journal_start+0x18b/0x340
 ext4_dirty_inode+0x5d/0xb0
 __mark_inode_dirty+0xe4/0x5d0
 generic_update_time+0x60/0x70
[...]
============================================

So only if jbd2_cleanup_journal_tail() returns 1, i.e., there is nothing to
clean up at the moment, continue to try to reclaim free space in other ways.

Note that this fix relies on commit 6f6a6fda29 ("jbd2: fix ocfs2 corrupt
when updating journal superblock fails") to make jbd2_cleanup_journal_tail
return the correct error code.

Fixes: 8c3f25d895 ("jbd2: don't give up looking for space so easily in __jbd2_log_wait_for_space")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240718115336.2554501-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 801a35dfef6996f3d5eaa96a59caf00440d9165e)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:46 +05:30
Andrew Jones
6daa837812 of/irq: Support #msi-cells=<0> in of_msi_get_domain
commit db8e81132cf051843c9a59b46fa5a071c45baeb3 upstream.

An 'msi-parent' property with a single entry and no accompanying
'#msi-cells' property is considered the legacy definition as opposed
to its definition after being expanded with commit 126b16e2ad
("Docs: dt: add generic MSI bindings"). However, the legacy
definition is completely compatible with the current definition and,
since of_phandle_iterator_next() tolerates missing and present-but-
zero *cells properties since commit e42ee61017f5 ("of: Let
of_for_each_phandle fallback to non-negative cell_count"), there's no
need anymore to special case the legacy definition in
of_msi_get_domain().

Indeed, special casing has turned out to be harmful, because, as of
commit 7c025238b47a ("dt-bindings: irqchip: Describe the IMX MU block
as a MSI controller"), MSI controller DT bindings have started
specifying '#msi-cells' as a required property (even when the value
must be zero) as an effort to make the bindings more explicit. But,
since the special casing of 'msi-parent' only uses the existence of
'#msi-cells' for its heuristic, and not whether or not it's also
nonzero, the legacy path is not taken. Furthermore, the path to
support the new, broader definition isn't taken either since that
path has been restricted to the platform-msi bus.

But, neither the definition of 'msi-parent' nor the definition of
'#msi-cells' is platform-msi-specific (the platform-msi bus was just
the first bus that needed '#msi-cells'), so remove both the special
casing and the restriction. The code removal also requires changing
to of_parse_phandle_with_optional_args() in order to ensure the
legacy (but compatible) use of 'msi-parent' remains supported. This
not only simplifies the code but also resolves an issue with PCI
devices finding their MSI controllers on riscv, as the riscv,imsics
binding requires '#msi-cells=<0>'.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240817074107.31153-2-ajones@ventanamicro.com
Cc: stable@vger.kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 030de6c36c48a40f42d7d59732ee69990340e0a1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:45 +05:30
Luis Henriques (SUSE)
76e097b0d2 ext4: fix incorrect tid assumption in ext4_wait_for_tail_page_commit()
commit dd589b0f1445e1ea1085b98edca6e4d5dedb98d0 upstream.

Function ext4_wait_for_tail_page_commit() assumes that '0' is not a valid
value for transaction IDs, which is incorrect.  Don't assume that and invoke
jbd2_log_wait_commit() if the journal had a committing transaction instead.

Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240724161119.13448-2-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 93fd249f197eeca81bb1c744ac8aec2804afd219)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:44 +05:30
Baokun Li
e346b1e26e ext4: fix double brelse() the buffer of the extents path
commit dcaa6c31134c0f515600111c38ed7750003e1b9c upstream.

In ext4_ext_try_to_merge_up(), set path[1].p_bh to NULL after it has been
released, otherwise it may be released twice. An example of what triggers
this is as follows:

  split2    map    split1
|--------|-------|--------|

ext4_ext_map_blocks
 ext4_ext_handle_unwritten_extents
  ext4_split_convert_extents
   // path->p_depth == 0
   ext4_split_extent
     // 1. do split1
     ext4_split_extent_at
       |ext4_ext_insert_extent
       |  ext4_ext_create_new_leaf
       |    ext4_ext_grow_indepth
       |      le16_add_cpu(&neh->eh_depth, 1)
       |    ext4_find_extent
       |      // return -ENOMEM
       |// get error and try zeroout
       |path = ext4_find_extent
       |  path->p_depth = 1
       |ext4_ext_try_to_merge
       |  ext4_ext_try_to_merge_up
       |    path->p_depth = 0
       |    brelse(path[1].p_bh)  ---> not set to NULL here
       |// zeroout success
     // 2. update path
     ext4_find_extent
     // 3. do split2
     ext4_split_extent_at
       ext4_ext_insert_extent
         ext4_ext_create_new_leaf
           ext4_ext_grow_indepth
             le16_add_cpu(&neh->eh_depth, 1)
           ext4_find_extent
             path[0].p_bh = NULL;
             path->p_depth = 1
             read_extent_tree_block  ---> return err
             // path[1].p_bh is still the old value
             ext4_free_ext_path
               ext4_ext_drop_refs
                 // path->p_depth == 1
                 brelse(path[1].p_bh)  ---> brelse a buffer twice

Finally got the following WARRNING when removing the buffer from lru:

============================================
VFS: brelse: Trying to free free buffer
WARNING: CPU: 2 PID: 72 at fs/buffer.c:1241 __brelse+0x58/0x90
CPU: 2 PID: 72 Comm: kworker/u19:1 Not tainted 6.9.0-dirty #716
RIP: 0010:__brelse+0x58/0x90
Call Trace:
 <TASK>
 __find_get_block+0x6e7/0x810
 bdev_getblk+0x2b/0x480
 __ext4_get_inode_loc+0x48a/0x1240
 ext4_get_inode_loc+0xb2/0x150
 ext4_reserve_inode_write+0xb7/0x230
 __ext4_mark_inode_dirty+0x144/0x6a0
 ext4_ext_insert_extent+0x9c8/0x3230
 ext4_ext_map_blocks+0xf45/0x2dc0
 ext4_map_blocks+0x724/0x1700
 ext4_do_writepages+0x12d6/0x2a70
[...]
============================================

Fixes: ecb94f5fdf ("ext4: collapse a single extent tree block into the inode if possible")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-9-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d4574bda63906bf69660e001470bfe1a0ac524ae)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:43 +05:30
Baokun Li
c368899dfa ext4: aovid use-after-free in ext4_ext_insert_extent()
commit a164f3a432aae62ca23d03e6d926b122ee5b860d upstream.

As Ojaswin mentioned in Link, in ext4_ext_insert_extent(), if the path is
reallocated in ext4_ext_create_new_leaf(), we'll use the stale path and
cause UAF. Below is a sample trace with dummy values:

ext4_ext_insert_extent
  path = *ppath = 2000
  ext4_ext_create_new_leaf(ppath)
    ext4_find_extent(ppath)
      path = *ppath = 2000
      if (depth > path[0].p_maxdepth)
            kfree(path = 2000);
            *ppath = path = NULL;
      path = kcalloc() = 3000
      *ppath = 3000;
      return path;
  /* here path is still 2000, UAF! */
  eh = path[depth].p_hdr

==================================================================
BUG: KASAN: slab-use-after-free in ext4_ext_insert_extent+0x26d4/0x3330
Read of size 8 at addr ffff8881027bf7d0 by task kworker/u36:1/179
CPU: 3 UID: 0 PID: 179 Comm: kworker/u6:1 Not tainted 6.11.0-rc2-dirty #866
Call Trace:
 <TASK>
 ext4_ext_insert_extent+0x26d4/0x3330
 ext4_ext_map_blocks+0xe22/0x2d40
 ext4_map_blocks+0x71e/0x1700
 ext4_do_writepages+0x1290/0x2800
[...]

Allocated by task 179:
 ext4_find_extent+0x81c/0x1f70
 ext4_ext_map_blocks+0x146/0x2d40
 ext4_map_blocks+0x71e/0x1700
 ext4_do_writepages+0x1290/0x2800
 ext4_writepages+0x26d/0x4e0
 do_writepages+0x175/0x700
[...]

Freed by task 179:
 kfree+0xcb/0x240
 ext4_find_extent+0x7c0/0x1f70
 ext4_ext_insert_extent+0xa26/0x3330
 ext4_ext_map_blocks+0xe22/0x2d40
 ext4_map_blocks+0x71e/0x1700
 ext4_do_writepages+0x1290/0x2800
 ext4_writepages+0x26d/0x4e0
 do_writepages+0x175/0x700
[...]
==================================================================

So use *ppath to update the path to avoid the above problem.

Reported-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Closes: https://lore.kernel.org/r/ZqyL6rmtwl6N4MWR@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com
Fixes: 10809df84a ("ext4: teach ext4_ext_find_extent() to realloc path if necessary")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240822023545.1994557-7-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e17ebe4fdd7665c93ae9459ba40fcdfb76769ac1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:42 +05:30
Luis Henriques (SUSE)
62d3ea7364 ext4: fix incorrect tid assumption in __jbd2_log_wait_for_space()
commit 972090651ee15e51abfb2160e986fa050cfc7a40 upstream.

Function __jbd2_log_wait_for_space() assumes that '0' is not a valid value
for transaction IDs, which is incorrect.  Don't assume that and invoke
jbd2_log_wait_commit() if the journal had a committing transaction instead.

Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240724161119.13448-3-luis.henriques@linux.dev
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 330ecdae721e62cd7ee287fb3cd7f88afa26e85a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:41 +05:30
Baokun Li
5ee7b73dae ext4: propagate errors from ext4_find_extent() in ext4_insert_range()
commit 369c944ed1d7c3fb7b35f24e4735761153afe7b3 upstream.

Even though ext4_find_extent() returns an error, ext4_insert_range() still
returns 0. This may confuse the user as to why fallocate returns success,
but the contents of the file are not as expected. So propagate the error
returned by ext4_find_extent() to avoid inconsistencies.

Fixes: 331573febb ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20240822023545.1994557-11-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d38a882fadb0431747342637ad3a9166663e8a86)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:40 +05:30
Edward Adam Davis
7445c15c98 ext4: no need to continue when the number of entries is 1
commit 1a00a393d6a7fb1e745a41edd09019bd6a0ad64c upstream.

Fixes: ac27a0ec11 ("[PATCH] ext4: initial copy of files from ext3")
Reported-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Reported-and-tested-by: syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com
Link: https://patch.msgid.link/tencent_BE7AEE6C7C2D216CB8949CE8E6EE7ECC2C0A@qq.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 64c8c484242b141998f7408596ddb2dc6da4b1d3)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:39 +05:30
Jaroslav Kysela
9980aabb78 ALSA: core: add isascii() check to card ID generator
commit d278a9de5e1837edbe57b2f1f95a104ff6c84846 upstream.

The card identifier should contain only safe ASCII characters. The isalnum()
returns true also for characters for non-ASCII characters.

Link: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4135
Link: https://lore.kernel.org/linux-sound/yk3WTvKkwheOon_LzZlJ43PPInz6byYfBzpKkbasww1yzuiMRqn7n6Y8vZcXB-xwFCu_vb8hoNjv7DTNwH5TWjpEuiVsyn9HPCEXqwF4120=@protonmail.com/
Cc: stable@vger.kernel.org
Reported-by: Barnabás Pőcze <pobrn@protonmail.com>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Link: https://patch.msgid.link/20241002194649.1944696-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3b9b0efb330f9d2ab082b7f426993d7bac3f2c66)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:38 +05:30
Luo Gengkun
5119d8804f perf/core: Fix small negative period being ignored
commit 62c0b1061593d7012292f781f11145b2d46f43ab upstream.

In perf_adjust_period, we will first calculate period, and then use
this period to calculate delta. However, when delta is less than 0,
there will be a deviation compared to when delta is greater than or
equal to 0. For example, when delta is in the range of [-14,-1], the
range of delta = delta + 7 is between [-7,6], so the final value of
delta/8 is 0. Therefore, the impact of -1 and -2 will be ignored.
This is unacceptable when the target period is very short, because
we will lose a lot of samples.

Here are some tests and analyzes:
before:
  # perf record -e cs -F 1000  ./a.out
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.022 MB perf.data (518 samples) ]

  # perf script
  ...
  a.out     396   257.956048:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.957891:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.959730:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.961545:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.963355:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.965163:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.966973:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.968785:         23 cs:  ffffffff81f4eeec schedul>
  a.out     396   257.970593:         23 cs:  ffffffff81f4eeec schedul>
  ...

after:
  # perf record -e cs -F 1000  ./a.out
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.058 MB perf.data (1466 samples) ]

  # perf script
  ...
  a.out     395    59.338813:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.339707:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.340682:         13 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.341751:         13 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.342799:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.343765:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.344651:         11 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.345539:         12 cs:  ffffffff81f4eeec schedul>
  a.out     395    59.346502:         13 cs:  ffffffff81f4eeec schedul>
  ...

test.c

int main() {
        for (int i = 0; i < 20000; i++)
                usleep(10);

        return 0;
}

  # time ./a.out
  real    0m1.583s
  user    0m0.040s
  sys     0m0.298s

The above results were tested on x86-64 qemu with KVM enabled using
test.c as test program. Ideally, we should have around 1500 samples,
but the previous algorithm had only about 500, whereas the modified
algorithm now has about 1400. Further more, the new version shows 1
sample per 0.001s, while the previous one is 1 sample per 0.002s.This
indicates that the new algorithm is more sensitive to small negative
values compared to old algorithm.

Fixes: bd2b5b1284 ("perf_counter: More aggressive frequency adjustment")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20240831074316.2106159-2-luogengkun@huaweicloud.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7fddba7b1bb6f1cc35269e510bc832feb3c54b11)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:37 +05:30
Jinjie Ruan
c62d641022 spi: bcm63xx: Fix module autoloading
commit 909f34f2462a99bf876f64c5c61c653213e32fce upstream.

Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded
based on the alias from platform_device_id table.

Fixes: 44d8fb3094 ("spi/bcm63xx: move register definitions into the driver")
Cc: stable@vger.kernel.org
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Jonas Gorski <jonas.gorski@gmail.com>
Link: https://patch.msgid.link/20240819123349.4020472-2-ruanjinjie@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 54feac119535e0273730720fe9a4683389f71bff)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:36 +05:30
Robert Hancock
f854ef98bc i2c: xiic: Wait for TX empty to avoid missed TX NAKs
commit 521da1e9225450bd323db5fa5bca942b1dc485b7 upstream.

Frequently an I2C write will be followed by a read, such as a register
address write followed by a read of the register value. In this driver,
when the TX FIFO half empty interrupt was raised and it was determined
that there was enough space in the TX FIFO to send the following read
command, it would do so without waiting for the TX FIFO to actually
empty.

Unfortunately it appears that in some cases this can result in a NAK
that was raised by the target device on the write, such as due to an
unsupported register address, being ignored and the subsequent read
being done anyway. This can potentially put the I2C bus into an
invalid state and/or result in invalid read data being processed.

To avoid this, once a message has been fully written to the TX FIFO,
wait for the TX FIFO empty interrupt before moving on to the next
message, to ensure NAKs are handled properly.

Fixes: e1d5b6598c ("i2c: Add support for Xilinx XPS IIC Bus Interface")
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Cc: <stable@vger.kernel.org> # v2.6.34+
Reviewed-by: Manikanta Guntupalli <manikanta.guntupalli@amd.com>
Acked-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8a6158421b417bb0841c4c7cb7a649707a1089d2)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:36 +05:30
Christophe Leroy
68386ffe64 selftests: vDSO: fix vDSO symbols lookup for powerpc64
[ Upstream commit ba83b3239e657469709d15dcea5f9b65bf9dbf34 ]

On powerpc64, following tests fail locating vDSO functions:

  ~ # ./vdso_test_abi
  TAP version 13
  1..16
  # [vDSO kselftest] VDSO_VERSION: LINUX_2.6.15
  # Couldn't find __kernel_gettimeofday
  ok 1 # SKIP __kernel_gettimeofday
  # clock_id: CLOCK_REALTIME
  # Couldn't find __kernel_clock_gettime
  ok 2 # SKIP __kernel_clock_gettime CLOCK_REALTIME
  # Couldn't find __kernel_clock_getres
  ok 3 # SKIP __kernel_clock_getres CLOCK_REALTIME
  ...
  # Couldn't find __kernel_time
  ok 16 # SKIP __kernel_time
  # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:16 error:0

  ~ # ./vdso_test_getrandom
  __kernel_getrandom is missing!

  ~ # ./vdso_test_gettimeofday
  Could not find __kernel_gettimeofday

  ~ # ./vdso_test_getcpu
  Could not find __kernel_getcpu

On powerpc64, as shown below by readelf, vDSO functions symbols have
type NOTYPE, so also accept that type when looking for symbols.

$ powerpc64-linux-gnu-readelf -a arch/powerpc/kernel/vdso/vdso64.so.dbg
ELF Header:
  Magic:   7f 45 4c 46 02 02 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, big endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           PowerPC64
  Version:                           0x1
...

Symbol table '.dynsym' contains 12 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000524    84 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     2: 00000000000005f0    36 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     3: 0000000000000578    68 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     4: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6.15
     5: 00000000000006c0    48 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     6: 0000000000000614   172 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     7: 00000000000006f0    84 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     8: 000000000000047c    84 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
     9: 0000000000000454    12 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
    10: 00000000000004d0    84 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15
    11: 00000000000005bc    52 NOTYPE  GLOBAL DEFAULT    8 __[...]@@LINUX_2.6.15

Symbol table '.symtab' contains 56 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
...
    45: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6.15
    46: 00000000000006c0    48 NOTYPE  GLOBAL DEFAULT    8 __kernel_getcpu
    47: 0000000000000524    84 NOTYPE  GLOBAL DEFAULT    8 __kernel_clock_getres
    48: 00000000000005f0    36 NOTYPE  GLOBAL DEFAULT    8 __kernel_get_tbfreq
    49: 000000000000047c    84 NOTYPE  GLOBAL DEFAULT    8 __kernel_gettimeofday
    50: 0000000000000614   172 NOTYPE  GLOBAL DEFAULT    8 __kernel_sync_dicache
    51: 00000000000006f0    84 NOTYPE  GLOBAL DEFAULT    8 __kernel_getrandom
    52: 0000000000000454    12 NOTYPE  GLOBAL DEFAULT    8 __kernel_sigtram[...]
    53: 0000000000000578    68 NOTYPE  GLOBAL DEFAULT    8 __kernel_time
    54: 00000000000004d0    84 NOTYPE  GLOBAL DEFAULT    8 __kernel_clock_g[...]
    55: 00000000000005bc    52 NOTYPE  GLOBAL DEFAULT    8 __kernel_get_sys[...]

Fixes: 98eedc3a9d ("Document the vDSO and add a reference parser")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 058d587e7f1520934823bae8f41db3c0b1097b59)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:35 +05:30
Yifei Liu
58f3dba206 selftests: breakpoints: use remaining time to check if suspend succeed
[ Upstream commit c66be905cda24fb782b91053b196bd2e966f95b7 ]

step_after_suspend_test fails with device busy error while
writing to /sys/power/state to start suspend. The test believes
it failed to enter suspend state with

$ sudo ./step_after_suspend_test
TAP version 13
Bail out! Failed to enter Suspend state

However, in the kernel message, I indeed see the system get
suspended and then wake up later.

[611172.033108] PM: suspend entry (s2idle)
[611172.044940] Filesystems sync: 0.006 seconds
[611172.052254] Freezing user space processes
[611172.059319] Freezing user space processes completed (elapsed 0.001 seconds)
[611172.067920] OOM killer disabled.
[611172.072465] Freezing remaining freezable tasks
[611172.080332] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[611172.089724] printk: Suspending console(s) (use no_console_suspend to debug)
[611172.117126] serial 00:03: disabled
some other hardware get reconnected
[611203.136277] OOM killer enabled.
[611203.140637] Restarting tasks ...
[611203.141135] usb 1-8.1: USB disconnect, device number 7
[611203.141755] done.
[611203.155268] random: crng reseeded on system resumption
[611203.162059] PM: suspend exit

After investigation, I noticed that for the code block
if (write(power_state_fd, "mem", strlen("mem")) != strlen("mem"))
	ksft_exit_fail_msg("Failed to enter Suspend state\n");

The write will return -1 and errno is set to 16 (device busy).
It should be caused by the write function is not successfully returned
before the system suspend and the return value get messed when waking up.
As a result, It may be better to check the time passed of those few
instructions to determine whether the suspend is executed correctly for
it is pretty hard to execute those few lines for 5 seconds.

The timer to wake up the system is set to expire after 5 seconds and
no re-arm. If the timer remaining time is 0 second and 0 nano secomd,
it means the timer expired and wake the system up. Otherwise, the system
could be considered to enter the suspend state failed if there is any
remaining time.

After appling this patch, the test would not fail for it believes the
system does not go to suspend by mistake. It now could continue to the
rest part of the test after suspend.

Fixes: bfd092b8c2 ("selftests: breakpoint: add step_after_suspend_test")
Reported-by: Sinadin Shan <sinadin.shan@oracle.com>
Signed-off-by: Yifei Liu <yifei.l.liu@oracle.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8dea5ffbd147f6708e2f70f04406d8b711873433)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:34 +05:30
Ben Dooks
db8306c6ac spi: s3c64xx: fix timeout counters in flush_fifo
[ Upstream commit 68a16708d2503b6303d67abd43801e2ca40c208d ]

In the s3c64xx_flush_fifo() code, the loops counter is post-decremented
in the do { } while(test && loops--) condition. This means the loops is
left at the unsigned equivalent of -1 if the loop times out. The test
after will never pass as if tests for loops == 0.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Fixes: 230d42d422 ("spi: Add s3c64xx SPI Controller driver")
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://patch.msgid.link/20240924134009.116247-2-ben.dooks@codethink.co.uk
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 12f47fdd4fb4c4592c9cfad6c21b3855a6bdadb8)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:34 +05:30
Artem Sadovnikov
5b9aa67fce ext4: fix i_data_sem unlock order in ext4_ind_migrate()
[ Upstream commit cc749e61c011c255d81b192a822db650c68b313f ]

Fuzzing reports a possible deadlock in jbd2_log_wait_commit.

This issue is triggered when an EXT4_IOC_MIGRATE ioctl is set to require
synchronous updates because the file descriptor is opened with O_SYNC.
This can lead to the jbd2_journal_stop() function calling
jbd2_might_wait_for_commit(), potentially causing a deadlock if the
EXT4_IOC_MIGRATE call races with a write(2) system call.

This problem only arises when CONFIG_PROVE_LOCKING is enabled. In this
case, the jbd2_might_wait_for_commit macro locks jbd2_handle in the
jbd2_journal_stop function while i_data_sem is locked. This triggers
lockdep because the jbd2_journal_start function might also lock the same
jbd2_handle simultaneously.

Found by Linux Verification Center (linuxtesting.org) with syzkaller.

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Co-developed-by: Mikhail Ukhin <mish.uxin2012@yandex.ru>
Signed-off-by: Mikhail Ukhin <mish.uxin2012@yandex.ru>
Signed-off-by: Artem Sadovnikov <ancowi69@gmail.com>
Rule: add
Link: https://lore.kernel.org/stable/20240404095000.5872-1-mish.uxin2012%40yandex.ru
Link: https://patch.msgid.link/20240829152210.2754-1-ancowi69@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4192adefc9c570698821c5eb9873320eac2fcbf1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:33 +05:30
Thadeu Lima de Souza Cascardo
989495abd7 ext4: ext4_search_dir should return a proper error
[ Upstream commit cd69f8f9de280e331c9e6ff689ced0a688a9ce8f ]

ext4_search_dir currently returns -1 in case of a failure, while it returns
0 when the name is not found. In such failure cases, it should return an
error code instead.

This becomes even more important when ext4_find_inline_entry returns an
error code as well in the next commit.

-EFSCORRUPTED seems appropriate as such error code as these failures would
be caused by unexpected record lengths and is in line with other instances
of ext4_check_dir_entry failures.

In the case of ext4_dx_find_entry, the current use of ERR_BAD_DX_DIR was
left as is to reduce the risk of regressions.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Link: https://patch.msgid.link/20240821152324.3621860-2-cascardo@igalia.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a15514ec9f080fe24ee71edf8b97b49ab9b8fc80)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:35:28 +05:30
Geert Uytterhoeven
c5599e93ce of/irq: Refer to actual buffer size in of_irq_parse_one()
[ Upstream commit 39ab331ab5d377a18fbf5a0e0b228205edfcc7f4 ]

Replace two open-coded calculations of the buffer size by invocations of
sizeof() on the buffer itself, to make sure the code will always use the
actual buffer size.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/817c0b9626fd30790fc488c472a3398324cfcc0c.1724156125.git.geert+renesas@glider.be
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 64bf240f2dfc242d507c7f8404cd9938d61db7cc)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:34:19 +05:30
Geert Uytterhoeven
aedf633163 drm/radeon/r100: Handle unknown family in r100_cp_init_microcode()
[ Upstream commit c6dbab46324b1742b50dc2fb5c1fee2c28129439 ]

With -Werror:

    In function ‘r100_cp_init_microcode’,
	inlined from ‘r100_cp_init’ at drivers/gpu/drm/radeon/r100.c:1136:7:
    include/linux/printk.h:465:44: error: ‘%s’ directive argument is null [-Werror=format-overflow=]
      465 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
	  |                                            ^
    include/linux/printk.h:437:17: note: in definition of macro ‘printk_index_wrap’
      437 |                 _p_func(_fmt, ##__VA_ARGS__);                           \
	  |                 ^~~~~~~
    include/linux/printk.h:508:9: note: in expansion of macro ‘printk’
      508 |         printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
	  |         ^~~~~~
    drivers/gpu/drm/radeon/r100.c:1062:17: note: in expansion of macro ‘pr_err’
     1062 |                 pr_err("radeon_cp: Failed to load firmware \"%s\"\n", fw_name);
	  |                 ^~~~~~

Fix this by converting the if/else if/... construct into a proper
switch() statement with a default to handle the error case.

As a bonus, the generated code is ca. 100 bytes smaller (with gcc 11.4.0
targeting arm32).

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7d91358e819a2761a5feff67d902456aaf4e567a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:32:22 +05:30
Zhao Mengmeng
a10e441c5a jfs: Fix uninit-value access of new_ea in ea_buffer
[ Upstream commit 2b59ffad47db1c46af25ccad157bb3b25147c35c ]

syzbot reports that lzo1x_1_do_compress is using uninit-value:

=====================================================
BUG: KMSAN: uninit-value in lzo1x_1_do_compress+0x19f9/0x2510 lib/lzo/lzo1x_compress.c:178

...

Uninit was stored to memory at:
 ea_put fs/jfs/xattr.c:639 [inline]

...

Local variable ea_buf created at:
 __jfs_setxattr+0x5d/0x1ae0 fs/jfs/xattr.c:662
 __jfs_xattr_set+0xe6/0x1f0 fs/jfs/xattr.c:934

=====================================================

The reason is ea_buf->new_ea is not initialized properly.

Fix this by using memset to empty its content at the beginning
in ea_get().

Reported-by: syzbot+02341e0daa42a15ce130@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=02341e0daa42a15ce130
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7b24d41d47a6805c45378debf8bd115675d41da8)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:32:21 +05:30
Edward Adam Davis
d69ee9fa73 jfs: check if leafidx greater than num leaves per dmap tree
[ Upstream commit d64ff0d2306713ff084d4b09f84ed1a8c75ecc32 ]

syzbot report a out of bounds in dbSplit, it because dmt_leafidx greater
than num leaves per dmap tree, add a checking for dmt_leafidx in dbFindLeaf.

Shaggy:
Modified sanity check to apply to control pages as well as leaf pages.

Reported-and-tested-by: syzbot+dca05492eff41f604890@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=dca05492eff41f604890
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d76b9a4c283c7535ae7c7c9b14984e75402951e1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:32:18 +05:30
Edward Adam Davis
9a1253d2f4 jfs: Fix uaf in dbFreeBits
[ Upstream commit d6c1b3599b2feb5c7291f5ac3a36e5fa7cedb234 ]

[syzbot reported]
==================================================================
BUG: KASAN: slab-use-after-free in __mutex_lock_common kernel/locking/mutex.c:587 [inline]
BUG: KASAN: slab-use-after-free in __mutex_lock+0xfe/0xd70 kernel/locking/mutex.c:752
Read of size 8 at addr ffff8880229254b0 by task syz-executor357/5216

CPU: 0 UID: 0 PID: 5216 Comm: syz-executor357 Not tainted 6.11.0-rc3-syzkaller-00156-gd7a5aa4b3c00 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:93 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 __mutex_lock_common kernel/locking/mutex.c:587 [inline]
 __mutex_lock+0xfe/0xd70 kernel/locking/mutex.c:752
 dbFreeBits+0x7ea/0xd90 fs/jfs/jfs_dmap.c:2390
 dbFreeDmap fs/jfs/jfs_dmap.c:2089 [inline]
 dbFree+0x35b/0x680 fs/jfs/jfs_dmap.c:409
 dbDiscardAG+0x8a9/0xa20 fs/jfs/jfs_dmap.c:1650
 jfs_ioc_trim+0x433/0x670 fs/jfs/jfs_discard.c:100
 jfs_ioctl+0x2d0/0x3e0 fs/jfs/ioctl.c:131
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:907 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83

Freed by task 5218:
 kasan_save_stack mm/kasan/common.c:47 [inline]
 kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
 kasan_slab_free include/linux/kasan.h:184 [inline]
 slab_free_hook mm/slub.c:2252 [inline]
 slab_free mm/slub.c:4473 [inline]
 kfree+0x149/0x360 mm/slub.c:4594
 dbUnmount+0x11d/0x190 fs/jfs/jfs_dmap.c:278
 jfs_mount_rw+0x4ac/0x6a0 fs/jfs/jfs_mount.c:247
 jfs_remount+0x3d1/0x6b0 fs/jfs/super.c:454
 reconfigure_super+0x445/0x880 fs/super.c:1083
 vfs_cmd_reconfigure fs/fsopen.c:263 [inline]
 vfs_fsconfig_locked fs/fsopen.c:292 [inline]
 __do_sys_fsconfig fs/fsopen.c:473 [inline]
 __se_sys_fsconfig+0xb6e/0xf80 fs/fsopen.c:345
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

[Analysis]
There are two paths (dbUnmount and jfs_ioc_trim) that generate race
condition when accessing bmap, which leads to the occurrence of uaf.

Use the lock s_umount to synchronize them, in order to avoid uaf caused
by race condition.

Reported-and-tested-by: syzbot+3c010e21296f33a5dc16@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4ac58f7734937f3249da734ede946dfb3b1af5e4)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:59 +05:30
Remington Brasga
8c4b8d7a31 jfs: UBSAN: shift-out-of-bounds in dbFindBits
[ Upstream commit b0b2fc815e514221f01384f39fbfbff65d897e1c ]

Fix issue with UBSAN throwing shift-out-of-bounds warning.

Reported-by: syzbot+e38d703eeb410b17b473@syzkaller.appspotmail.com
Signed-off-by: Remington Brasga <rbrasga@uci.edu>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 830d908130d88745f0fd3ed9912cc381edf11ff1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:58 +05:30
Damien Le Moal
120d039873 ata: sata_sil: Rename sil_blacklist to sil_quirks
[ Upstream commit 93b0f9e11ce511353c65b7f924cf5f95bd9c3aba ]

Rename the array sil_blacklist to sil_quirks as this name is more
neutral and is also consistent with how this driver define quirks with
the SIL_QUIRK_XXX flags.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a57a97bb79d5123442068f887e5f1614ed4c752c)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:57 +05:30
Andrew Davis
872ead470a power: reset: brcmstb: Do not go into infinite loop if reset fails
[ Upstream commit cf8c39b00e982fa506b16f9d76657838c09150cb ]

There may be other backup reset methods available, do not halt
here so that other reset methods can be tried.

Signed-off-by: Andrew Davis <afd@ti.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20240610142836.168603-5-afd@ti.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 61a6d482734804e0a81c3951b8a0d3852085a2cc)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:56 +05:30
Kaixin Wang
8ef1410b11 fbdev: pxafb: Fix possible use after free in pxafb_task()
[ Upstream commit 4a6921095eb04a900e0000da83d9475eb958e61e ]

In the pxafb_probe function, it calls the pxafb_init_fbinfo function,
after which &fbi->task is associated with pxafb_task. Moreover,
within this pxafb_init_fbinfo function, the pxafb_blank function
within the &pxafb_ops struct is capable of scheduling work.

If we remove the module which will call pxafb_remove to make cleanup,
it will call unregister_framebuffer function which can call
do_unregister_framebuffer to free fbi->fb through
put_fb_info(fb_info), while the work mentioned above will be used.
The sequence of operations that may lead to a UAF bug is as follows:

CPU0                                                CPU1

                                   | pxafb_task
pxafb_remove                       |
unregister_framebuffer(info)       |
do_unregister_framebuffer(fb_info) |
put_fb_info(fb_info)               |
// free fbi->fb                    | set_ctrlr_state(fbi, state)
                                   | __pxafb_lcd_power(fbi, 0)
                                   | fbi->lcd_power(on, &fbi->fb.var)
                                   | //use fbi->fb

Fix it by ensuring that the work is canceled before proceeding
with the cleanup in pxafb_remove.

Note that only root user can remove the driver at runtime.

Signed-off-by: Kaixin Wang <kxwang23@m.fudan.edu.cn>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e657fa2df4429f3805a9b3e47fb1a4a1b02a72bd)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:55 +05:30
Takashi Iwai
4ed9e05726 ALSA: hdsp: Break infinite MIDI input flush loop
[ Upstream commit c01f3815453e2d5f699ccd8c8c1f93a5b8669e59 ]

The current MIDI input flush on HDSP and HDSPM drivers relies on the
hardware reporting the right value.  If the hardware doesn't give the
proper value but returns -1, it may be stuck at an infinite loop.

Add a counter and break if the loop is unexpectedly too long.

Link: https://patch.msgid.link/20240808091513.31380-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit dc0c68e2e6e2c544b1361baa1ca230569ab6279d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:54 +05:30
Takashi Iwai
63e19c51d4 ALSA: asihpi: Fix potential OOB array access
[ Upstream commit 7b986c7430a6bb68d523dac7bfc74cbd5b44ef96 ]

ASIHPI driver stores some values in the static array upon a response
from the driver, and its index depends on the firmware.  We shouldn't
trust it blindly.

This patch adds a sanity check of the array index to fit in the array
size.

Link: https://patch.msgid.link/20240808091454.30846-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a6bdb691cf7b66dcd929de1a253c5c42edd2e522)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:52 +05:30
Thomas Gleixner
4225d03b83 signal: Replace BUG_ON()s
[ Upstream commit 7f8af7bac5380f2d95a63a6f19964e22437166e1 ]

These really can be handled gracefully without killing the machine.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 0f9c27fbb8a52c50ff7d2659386f1f43e7fbddee)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:51 +05:30
Gustavo A. R. Silva
be64d82131 wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_cmd_802_11_scan_ext()
[ Upstream commit 498365e52bebcbc36a93279fe7e9d6aec8479cee ]

Replace one-element array with a flexible-array member in
`struct host_cmd_ds_802_11_scan_ext`.

With this, fix the following warning:

elo 16 17:51:58 surfacebook kernel: ------------[ cut here ]------------
elo 16 17:51:58 surfacebook kernel: memcpy: detected field-spanning write (size 243) of single field "ext_scan->tlv_buffer" at drivers/net/wireless/marvell/mwifiex/scan.c:2239 (size 1)
elo 16 17:51:58 surfacebook kernel: WARNING: CPU: 0 PID: 498 at drivers/net/wireless/marvell/mwifiex/scan.c:2239 mwifiex_cmd_802_11_scan_ext+0x83/0x90 [mwifiex]

Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Closes: https://lore.kernel.org/linux-hardening/ZsZNgfnEwOcPdCly@black.fi.intel.com/
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://patch.msgid.link/ZsZa5xRcsLq9D+RX@elsanto
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b55c8848fdc81514ec047b2a0ec782ffe9ab5323)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:48 +05:30
Aleksandrs Vinarskis
b1de585b30 ACPICA: iasl: handle empty connection_node
[ Upstream commit a0a2459b79414584af6c46dd8c6f866d8f1aa421 ]

ACPICA commit 6c551e2c9487067d4b085333e7fe97e965a11625

Link: https://github.com/acpica/acpica/commit/6c551e2c
Signed-off-by: Aleksandrs Vinarskis <alex.vinarskis@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ea69502703bd3c38c3f016f8b6614ef0de2b94c2)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:47 +05:30
Ido Schimmel
41cb1a361e ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family
[ Upstream commit 8fed54758cd248cd311a2b5c1e180abef1866237 ]

The NETLINK_FIB_LOOKUP netlink family can be used to perform a FIB
lookup according to user provided parameters and communicate the result
back to user space.

However, unlike other users of the FIB lookup API, the upper DSCP bits
and the ECN bits of the DS field are not masked, which can result in the
wrong result being returned.

Solve this by masking the upper DSCP bits and the ECN bits using
IPTOS_RT_MASK.

The structure that communicates the request and the response is not
exported to user space, so it is unlikely that this netlink family is
actually in use [1].

[1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 05905659e2591368b50eaa79d94c75aeb18c46ef)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:46 +05:30
Kuniyuki Iwashima
be1c66714c ipv4: Check !in_dev earlier for ioctl(SIOCSIFADDR).
[ Upstream commit e3af3d3c5b26c33a7950e34e137584f6056c4319 ]

dev->ip_ptr could be NULL if we set an invalid MTU.

Even then, if we issue ioctl(SIOCSIFADDR) for a new IPv4 address,
devinet_ioctl() allocates struct in_ifaddr and fails later in
inet_set_ifa() because in_dev is NULL.

Let's move the check earlier.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240809235406.50187-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 098a9b686df8c560f5f7683a1a388646aae0f023)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:45 +05:30
Simon Horman
e16865760f tipc: guard against string buffer overrun
[ Upstream commit 6555a2a9212be6983d2319d65276484f7c5f431a ]

Smatch reports that copying media_name and if_name to name_parts may
overwrite the destination.

 .../bearer.c:166 bearer_name_validate() error: strcpy() 'media_name' too large for 'name_parts->media_name' (32 vs 16)
 .../bearer.c:167 bearer_name_validate() error: strcpy() 'if_name' too large for 'name_parts->if_name' (1010102 vs 16)

This does seem to be the case so guard against this possibility by using
strscpy() and failing if truncation occurs.

Introduced by commit b97bf3fd8f ("[TIPC] Initial merge")

Compile tested only.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240801-tipic-overrun-v2-1-c5b869d1f074@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8298b6e45fb4d8944f356b08e4ea3e54df5e0488)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:44 +05:30
Pei Xiao
f4de22fe10 ACPICA: check null return of ACPI_ALLOCATE_ZEROED() in acpi_db_convert_to_package()
[ Upstream commit a5242874488eba2b9062985bf13743c029821330 ]

ACPICA commit 4d4547cf13cca820ff7e0f859ba83e1a610b9fd0

ACPI_ALLOCATE_ZEROED() may fail, elements might be NULL and will cause
NULL pointer dereference later.

Link: https://github.com/acpica/acpica/commit/4d4547cf
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Link: https://patch.msgid.link/tencent_4A21A2865B8B0A0D12CAEBEB84708EDDB505@qq.com
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4669da66ebc5b09881487f30669b0fcdb462188e)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:43 +05:30
Rafael J. Wysocki
912e6b49d4 ACPI: EC: Do not release locks during operation region accesses
[ Upstream commit dc171114926ec390ab90f46534545420ec03e458 ]

It is not particularly useful to release locks (the EC mutex and the
ACPI global lock, if present) and re-acquire them immediately thereafter
during EC address space accesses in acpi_ec_space_handler().

First, releasing them for a while before grabbing them again does not
really help anyone because there may not be enough time for another
thread to acquire them.

Second, if another thread successfully acquires them and carries out
a new EC write or read in the middle if an operation region access in
progress, it may confuse the EC firmware, especially after the burst
mode has been enabled.

Finally, manipulating the locks after writing or reading every single
byte of data is overhead that it is better to avoid.

Accordingly, modify the code to carry out EC address space accesses
entirely without releasing the locks.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patch.msgid.link/12473338.O9o76ZdvQC@rjwysocki.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8d5dd2d2ef6cc87799b4ff915e561814d3c35d2c)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:39 +05:30
Armin Wolf
0495b656ee ACPICA: Fix memory leak if acpi_ps_get_next_field() fails
[ Upstream commit e6169a8ffee8a012badd8c703716e761ce851b15 ]

ACPICA commit 1280045754264841b119a5ede96cd005bc09b5a7

If acpi_ps_get_next_field() fails, the previously created field list
needs to be properly disposed before returning the status code.

Link: https://github.com/acpica/acpica/commit/12800457
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
[ rjw: Rename local variable to avoid compiler confusion ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 40fa60e0bf406ced3dfd857015dafdcd677a4929)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:15 +05:30
Armin Wolf
74c1a9c14f ACPICA: Fix memory leak if acpi_ps_get_next_namepath() fails
[ Upstream commit 5accb265f7a1b23e52b0ec42313d1e12895552f4 ]

ACPICA commit 2802af722bbde7bf1a7ac68df68e179e2555d361

If acpi_ps_get_next_namepath() fails, the previously allocated
union acpi_parse_object needs to be freed before returning the
status code.

The issue was first being reported on the Linux ACPI mailing list:

Link: https://lore.kernel.org/linux-acpi/56f94776-484f-48c0-8855-dba8e6a7793b@yandex.ru/T/
Link: https://github.com/acpica/acpica/commit/2802af72
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b017675cfbd126954d3b45afbdd6ee345a0ce368)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:13 +05:30
Krzysztof Kozlowski
62d7863689 net: hisilicon: hns_mdio: fix OF node leak in probe()
[ Upstream commit e62beddc45f487b9969821fad3a0913d9bc18a2f ]

Driver is leaking OF node reference from
of_parse_phandle_with_fixed_args() in probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240827144421.52852-4-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 963174dad7d4993ff3a4e1b43cefd296df0296b4)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:12 +05:30
Krzysztof Kozlowski
3cd0286bac net: hisilicon: hns_dsaf_mac: fix OF node leak in hns_mac_get_info()
[ Upstream commit 5680cf8d34e1552df987e2f4bb1bff0b2a8c8b11 ]

Driver is leaking OF node reference from
of_parse_phandle_with_fixed_args() in hns_mac_get_info().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240827144421.52852-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7df217a21b74e730db216984218bde434dffc34b)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:11 +05:30
Krzysztof Kozlowski
5852389ede net: hisilicon: hip04: fix OF node leak in probe()
[ Upstream commit 17555297dbd5bccc93a01516117547e26a61caf1 ]

Driver is leaking OF node reference from
of_parse_phandle_with_fixed_args() in probe().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240827144421.52852-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8c354ddfec8126ef58cdcde82dccc5cbb2c34e45)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:10 +05:30
Toke Høiland-Jørgensen
8de3a9a48a wifi: ath9k_htc: Use __skb_set_length() for resetting urb before resubmit
[ Upstream commit 94745807f3ebd379f23865e6dab196f220664179 ]

Syzbot points out that skb_trim() has a sanity check on the existing length of
the skb, which can be uninitialised in some error paths. The intent here is
clearly just to reset the length to zero before resubmitting, so switch to
calling __skb_set_length(skb, 0) directly. In addition, __skb_set_length()
already contains a call to skb_reset_tail_pointer(), so remove the redundant
call.

The syzbot report came from ath9k_hif_usb_reg_in_cb(), but there's a similar
usage of skb_trim() in ath9k_hif_usb_rx_cb(), change both while we're at it.

Reported-by: syzbot+98afa303be379af6cdb2@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://patch.msgid.link/20240812142447.12328-1-toke@toke.dk
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e6b9bf32e0695e4f374674002de0527d2a6768eb)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:09 +05:30
Dmitry Kandybka
f7b064d80a wifi: ath9k: fix possible integer overflow in ath9k_get_et_stats()
[ Upstream commit 3f66f26703093886db81f0610b97a6794511917c ]

In 'ath9k_get_et_stats()', promote TX stats counters to 'u64'
to avoid possible integer overflow. Compile tested only.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Dmitry Kandybka <d.kandybka@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://patch.msgid.link/20240725111743.14422-1-d.kandybka@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 600f668453be81b25dcc2f20096eac2243aebdaa)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:08 +05:30
Jann Horn
0d8b733aa6 f2fs: Require FMODE_WRITE for atomic write ioctls
commit 4f5a100f87f32cb65d4bb1ad282a08c92f6f591e upstream.

The F2FS ioctls for starting and committing atomic writes check for
inode_owner_or_capable(), but this does not give LSMs like SELinux or
Landlock an opportunity to deny the write access - if the caller's FSUID
matches the inode's UID, inode_owner_or_capable() immediately returns true.

There are scenarios where LSMs want to deny a process the ability to write
particular files, even files that the FSUID of the process owns; but this
can currently partially be bypassed using atomic write ioctls in two ways:

 - F2FS_IOC_START_ATOMIC_REPLACE + F2FS_IOC_COMMIT_ATOMIC_WRITE can
   truncate an inode to size 0
 - F2FS_IOC_START_ATOMIC_WRITE + F2FS_IOC_ABORT_ATOMIC_WRITE can revert
   changes another process concurrently made to a file

Fix it by requiring FMODE_WRITE for these operations, just like for
F2FS_IOC_MOVE_RANGE. Since any legitimate caller should only be using these
ioctls when intending to write into the file, that seems unlikely to break
anything.

Fixes: 88b88a6679 ("f2fs: support atomic writes")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 700f3a7c7fa5764c9f24bbf7c78e0b6e479fa653)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:07 +05:30
Takashi Iwai
74fe49cbff ALSA: hda/conexant: Fix conflicting quirk for System76 Pangolin
[ Upstream commit b3ebb007060f89d5a45c9b99f06a55e36a1945b5 ]

We received a regression report for System76 Pangolin (pang14) due to
the recent fix for Tuxedo Sirius devices to support the top speaker.
The reason was the conflicting PCI SSID, as often seen.

As a workaround, now the codec SSID is checked and the quirk is
applied conditionally only to Sirius devices.

Fixes: 4178d78cd7a8 ("ALSA: hda/conexant: Add pincfg quirk to enable top speakers on Sirius devices")
Reported-by: Christian Heusel <christian@heusel.eu>
Reported-by: Jerry <jerryluo225@gmail.com>
Closes: https://lore.kernel.org/c930b6a6-64e5-498f-b65a-1cd5e0a1d733@heusel.eu
Link: https://patch.msgid.link/20241004082602.29016-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ba4ec41f6958bd5fc314b98c0ba17f5bb9a11375)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:31:03 +05:30
Takashi Iwai
0b8e878ad9 ALSA: hda/generic: Unconditionally prefer preferred_dacs pairs
[ Upstream commit 1c801e7f77445bc56e5e1fec6191fd4503534787 ]

Some time ago, we introduced the obey_preferred_dacs flag for choosing
the DAC/pin pairs specified by the driver instead of parsing the
paths.  This works as expected, per se, but there have been a few
cases where we forgot to set this flag while preferred_dacs table is
already set up.  It ended up with incorrect wiring and made us
wondering why it doesn't work.

Basically, when the preferred_dacs table is provided, it means that
the driver really wants to wire up to follow that.  That is, the
presence of the preferred_dacs table itself is already a "do-it"
flag.

In this patch, we simply replace the evaluation of obey_preferred_dacs
flag with the presence of preferred_dacs table for fixing the
misbehavior.  Another patch to drop of the obsoleted flag will
follow.

Fixes: 242d990c158d ("ALSA: hda/generic: Add option to enforce preferred_dacs pairs")
Link: https://bugzilla.suse.com/show_bug.cgi?id=1219803
Link: https://patch.msgid.link/20241001121439.26060-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a66828fdf8ba3ccb30204f7e44761007a7437a3a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:38 +05:30
Aleksander Jan Bajkowski
031da3e6ff net: ethernet: lantiq_etop: fix memory disclosure
[ Upstream commit 45c0de18ff2dc9af01236380404bbd6a46502c69 ]

When applying padding, the buffer is not zeroed, which results in memory
disclosure. The mentioned data is observed on the wire. This patch uses
skb_put_padto() to pad Ethernet frames properly. The mentioned function
zeroes the expanded buffer.

In case the packet cannot be padded it is silently dropped. Statistics
are also not incremented. This driver does not support statistics in the
old 32-bit format or the new 64-bit format. These will be added in the
future. In its current form, the patch should be easily backported to
stable versions.

Ethernet MACs on Amazon-SE and Danube cannot do padding of the packets
in hardware, so software padding must be applied.

Fixes: 504d4721ee ("MIPS: Lantiq: Add ethernet driver")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20240923214949.231511-2-olek2@wp.pl
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 905f06a34f960676e7dc77bea00f2f8fe18177ad)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:37 +05:30
Prashant Malani
5323335834 r8152: Factor out OOB link list waits
[ Upstream commit 5f71c84038d39def573744a145c573758f52a949 ]

The same for-loop check for the LINK_LIST_READY bit of an OOB_CTRL
register is used in several places. Factor these out into a single
function to reduce the lines of code.

Change-Id: I20e8f327045a72acc0a83e2d145ae2993ab62915
Signed-off-by: Prashant Malani <pmalani@chromium.org>
Reviewed-by: Grant Grundler <grundler@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 45c0de18ff2d ("net: ethernet: lantiq_etop: fix memory disclosure")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e8bed7c8845878f8c60e76f0a10d61ea2f709580)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:35 +05:30
Eric Dumazet
0a67acb0ac netfilter: nf_tables: prevent nf_skb_duplicated corruption
[ Upstream commit 92ceba94de6fb4cee2bf40b485979c342f44a492 ]

syzbot found that nf_dup_ipv4() or nf_dup_ipv6() could write
per-cpu variable nf_skb_duplicated in an unsafe way [1].

Disabling preemption as hinted by the splat is not enough,
we have to disable soft interrupts as well.

[1]
BUG: using __this_cpu_write() in preemptible [00000000] code: syz.4.282/6316
 caller is nf_dup_ipv4+0x651/0x8f0 net/ipv4/netfilter/nf_dup_ipv4.c:87
CPU: 0 UID: 0 PID: 6316 Comm: syz.4.282 Not tainted 6.11.0-rc7-syzkaller-00104-g7052622fccb1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Call Trace:
 <TASK>
  __dump_stack lib/dump_stack.c:93 [inline]
  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
  check_preemption_disabled+0x10e/0x120 lib/smp_processor_id.c:49
  nf_dup_ipv4+0x651/0x8f0 net/ipv4/netfilter/nf_dup_ipv4.c:87
  nft_dup_ipv4_eval+0x1db/0x300 net/ipv4/netfilter/nft_dup_ipv4.c:30
  expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
  nft_do_chain+0x4ad/0x1da0 net/netfilter/nf_tables_core.c:288
  nft_do_chain_ipv4+0x202/0x320 net/netfilter/nft_chain_filter.c:23
  nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
  nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626
  nf_hook+0x2c4/0x450 include/linux/netfilter.h:269
  NF_HOOK_COND include/linux/netfilter.h:302 [inline]
  ip_output+0x185/0x230 net/ipv4/ip_output.c:433
  ip_local_out net/ipv4/ip_output.c:129 [inline]
  ip_send_skb+0x74/0x100 net/ipv4/ip_output.c:1495
  udp_send_skb+0xacf/0x1650 net/ipv4/udp.c:981
  udp_sendmsg+0x1c21/0x2a60 net/ipv4/udp.c:1269
  sock_sendmsg_nosec net/socket.c:730 [inline]
  __sock_sendmsg+0x1a6/0x270 net/socket.c:745
  ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597
  ___sys_sendmsg net/socket.c:2651 [inline]
  __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737
  __do_sys_sendmmsg net/socket.c:2766 [inline]
  __se_sys_sendmmsg net/socket.c:2763 [inline]
  __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4ce4f7def9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f4ce5d4a038 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f4ce5135f80 RCX: 00007f4ce4f7def9
RDX: 0000000000000001 RSI: 0000000020005d40 RDI: 0000000000000006
RBP: 00007f4ce4ff0b76 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f4ce5135f80 R15: 00007ffd4cbc6d68
 </TASK>

Fixes: d877f07112 ("netfilter: nf_tables: add nft_dup expression")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 50067d8b3f48e4cd4c9e817d3e9a5b5ff3507ca7)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:34 +05:30
Xiubo Li
b43f24e553 ceph: remove the incorrect Fw reference check when dirtying pages
[ Upstream commit c08dfb1b49492c09cf13838c71897493ea3b424e ]

When doing the direct-io reads it will also try to mark pages dirty,
but for the read path it won't hold the Fw caps and there is case
will it get the Fw reference.

Fixes: 5dda377cf0 ("ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit c26c5ec832dd9e9dcd0a0a892a485c99889b68f0)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:33 +05:30
Stefan Wahren
23bb83ac39 mailbox: bcm2835: Fix timeout during suspend mode
[ Upstream commit dc09f007caed3b2f6a3b6bd7e13777557ae22bfd ]

During noirq suspend phase the Raspberry Pi power driver suffer of
firmware property timeouts. The reason is that the IRQ of the underlying
BCM2835 mailbox is disabled and rpi_firmware_property_list() will always
run into a timeout [1].

Since the VideoCore side isn't consider as a wakeup source, set the
IRQF_NO_SUSPEND flag for the mailbox IRQ in order to keep it enabled
during suspend-resume cycle.

[1]
PM: late suspend of devices complete after 1.754 msecs
WARNING: CPU: 0 PID: 438 at drivers/firmware/raspberrypi.c:128
 rpi_firmware_property_list+0x204/0x22c
Firmware transaction 0x00028001 timeout
Modules linked in:
CPU: 0 PID: 438 Comm: bash Tainted: G         C         6.9.3-dirty #17
Hardware name: BCM2835
Call trace:
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x34/0x44
dump_stack_lvl from __warn+0x88/0xec
__warn from warn_slowpath_fmt+0x7c/0xb0
warn_slowpath_fmt from rpi_firmware_property_list+0x204/0x22c
rpi_firmware_property_list from rpi_firmware_property+0x68/0x8c
rpi_firmware_property from rpi_firmware_set_power+0x54/0xc0
rpi_firmware_set_power from _genpd_power_off+0xe4/0x148
_genpd_power_off from genpd_sync_power_off+0x7c/0x11c
genpd_sync_power_off from genpd_finish_suspend+0xcc/0xe0
genpd_finish_suspend from dpm_run_callback+0x78/0xd0
dpm_run_callback from device_suspend_noirq+0xc0/0x238
device_suspend_noirq from dpm_suspend_noirq+0xb0/0x168
dpm_suspend_noirq from suspend_devices_and_enter+0x1b8/0x5ac
suspend_devices_and_enter from pm_suspend+0x254/0x2e4
pm_suspend from state_store+0xa8/0xd4
state_store from kernfs_fop_write_iter+0x154/0x1a0
kernfs_fop_write_iter from vfs_write+0x12c/0x184
vfs_write from ksys_write+0x78/0xc0
ksys_write from ret_fast_syscall+0x0/0x54
Exception stack(0xcc93dfa8 to 0xcc93dff0)
[...]
PM: noirq suspend of devices complete after 3095.584 msecs

Link: https://github.com/raspberrypi/firmware/issues/1894
Fixes: 0bae6af6d7 ("mailbox: Enable BCM2835 mailbox support")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4e1e03760ee7cc4779b6306867fe0fc02921b963)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:32 +05:30
Liao Chen
53a500c812 mailbox: rockchip: fix a typo in module autoloading
[ Upstream commit e92d87c9c5d769e4cb1dd7c90faa38dddd7e52e3 ]

MODULE_DEVICE_TABLE(of, rockchip_mbox_of_match) could let the module
properly autoloaded based on the alias from of_device_id table. It
should be 'rockchip_mbox_of_match' instead of 'rockchp_mbox_of_match',
just fix it.

Fixes: f70ed3b5dc ("mailbox: rockchip: Add Rockchip mailbox driver")
Signed-off-by: Liao Chen <liaochen4@huawei.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ae2d6fdd49669f35ed3a1156a4aab66a37e6a450)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:31 +05:30
Harshit Mogalapalli
23f7ca1c3c usb: yurex: Fix inconsistent locking bug in yurex_read()
commit e7d3b9f28654dbfce7e09f8028210489adaf6a33 upstream.

Unlock before returning on the error path.

Fixes: 86b20af11e84 ("usb: yurex: Replace snprintf() with the safer scnprintf() variant")
Reported-by: Dan Carpenter <error27@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202312170252.3udgrIcP-lkp@intel.com/
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20231219063639.450994-1-harshit.m.mogalapalli@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 709b0b70011b577bc78406e76c4563e10579ddad)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:29 +05:30
Andy Shevchenko
72b17eb620 i2c: isch: Add missed 'else'
commit 1db4da55070d6a2754efeb3743f5312fc32f5961 upstream.

In accordance with the existing comment and code analysis
it is quite likely that there is a missed 'else' when adapter
times out. Add it.

Fixes: 5bc1200852 ("i2c: Add Intel SCH SMBus support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: <stable@vger.kernel.org> # v2.6.27+
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bbe3396e96a2ee857cf2206784f06bc3f49ff240)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:28 +05:30
Tommy Huang
f5d486ef08 i2c: aspeed: Update the stop sw state when the bus recovery occurs
commit 93701d3b84ac5f3ea07259d4ced405c53d757985 upstream.

When the i2c bus recovery occurs, driver will send i2c stop command
in the scl low condition. In this case the sw state will still keep
original situation. Under multi-master usage, i2c bus recovery will
be called when i2c transfer timeout occurs. Update the stop command
calling with aspeed_i2c_do_stop function to update master_state.

Fixes: f327c686d3 ("i2c: aspeed: added driver for Aspeed I2C")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Tommy Huang <tommy_huang@aspeedtech.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 16cfd59341f73157ef319c588e639fc1013d94cf)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:26 +05:30
Oliver Neukum
570c1a5f81 USB: misc: yurex: fix race between read and write
[ Upstream commit 93907620b308609c72ba4b95b09a6aa2658bb553 ]

The write code path touches the bbu member in a non atomic manner
without taking the spinlock. Fix it.

The bug is as old as the driver.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
CC: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240912132126.1034743-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 1250cd9dee69ace62b9eb87230e8274b48bc9460)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:25 +05:30
Lee Jones
a8736e2e1b usb: yurex: Replace snprintf() with the safer scnprintf() variant
[ Upstream commit 86b20af11e84c26ae3fde4dcc4f490948e3f8035 ]

There is a general misunderstanding amongst engineers that {v}snprintf()
returns the length of the data *actually* encoded into the destination
array.  However, as per the C99 standard {v}snprintf() really returns
the length of the data that *would have been* written if there were
enough space for it.  This misunderstanding has led to buffer-overruns
in the past.  It's generally considered safer to use the {v}scnprintf()
variants in their place (or even sprintf() in simple cases).  So let's
do that.

Whilst we're at it, let's define some magic numbers to increase
readability and ease of maintenance.

Link: https://lwn.net/Articles/69419/
Link: https://github.com/KSPP/linux/issues/105
Cc: Tomoki Sekiyama <tomoki.sekiyama@gmail.com>
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20231213164246.1021885-9-lee@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 93907620b308 ("USB: misc: yurex: fix race between read and write")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a2ac6cb8aaa2eb23209ffa641962dd62958522a1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:24 +05:30
Krzysztof Kozlowski
706b45f574 soc: versatile: realview: fix soc_dev leak during device remove
[ Upstream commit c774f2564c0086c23f5269fd4691f233756bf075 ]

If device is unbound, the soc_dev should be unregistered to prevent
memory leak.

Fixes: a2974c9c1f ("soc: add driver for the ARM RealView")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/20240825-soc-dev-fixes-v1-3-ff4b35abed83@linaro.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b05605f5a42b4719918486e2624e44f3fa9e818f)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:22 +05:30
Krzysztof Kozlowski
98d4b39eeb soc: versatile: realview: fix memory leak during device remove
[ Upstream commit 1c4f26a41f9d052f334f6ae629e01f598ed93508 ]

If device is unbound, the memory allocated for soc_dev_attr should be
freed to prevent leaks.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/20240825-soc-dev-fixes-v1-2-ff4b35abed83@linaro.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Stable-dep-of: c774f2564c00 ("soc: versatile: realview: fix soc_dev leak during device remove")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 0accfec683c0a3e31c8ba738be0b0014e316d6a0)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:21 +05:30
Sean Anderson
d06e77da0d PCI: xilinx-nwl: Fix off-by-one in INTx IRQ handler
[ Upstream commit 0199d2f2bd8cd97b310f7ed82a067247d7456029 ]

MSGF_LEG_MASK is laid out with INTA in bit 0, INTB in bit 1, INTC in bit 2,
and INTD in bit 3. Hardware IRQ numbers start at 0, and we register
PCI_NUM_INTX IRQs. So to enable INTA (aka hwirq 0) we should set bit 0.
Remove the subtraction of one.

This bug would cause INTx interrupts not to be delivered, as enabling INTB
would actually enable INTA, and enabling INTA wouldn't enable anything at
all. It is likely that this got overlooked for so long since most PCIe
hardware uses MSIs. This fixes the following UBSAN error:

  UBSAN: shift-out-of-bounds in ../drivers/pci/controller/pcie-xilinx-nwl.c:389:11
  shift exponent 18446744073709551615 is too large for 32-bit type 'int'
  CPU: 1 PID: 61 Comm: kworker/u10:1 Not tainted 6.6.20+ #268
  Hardware name: xlnx,zynqmp (DT)
  Workqueue: events_unbound deferred_probe_work_func
  Call trace:
  dump_backtrace (arch/arm64/kernel/stacktrace.c:235)
  show_stack (arch/arm64/kernel/stacktrace.c:242)
  dump_stack_lvl (lib/dump_stack.c:107)
  dump_stack (lib/dump_stack.c:114)
  __ubsan_handle_shift_out_of_bounds (lib/ubsan.c:218 lib/ubsan.c:387)
  nwl_unmask_leg_irq (drivers/pci/controller/pcie-xilinx-nwl.c:389 (discriminator 1))
  irq_enable (kernel/irq/internals.h:234 kernel/irq/chip.c:170 kernel/irq/chip.c:439 kernel/irq/chip.c:432 kernel/irq/chip.c:345)
  __irq_startup (kernel/irq/internals.h:239 kernel/irq/chip.c:180 kernel/irq/chip.c:250)
  irq_startup (kernel/irq/chip.c:270)
  __setup_irq (kernel/irq/manage.c:1800)
  request_threaded_irq (kernel/irq/manage.c:2206)
  pcie_pme_probe (include/linux/interrupt.h:168 drivers/pci/pcie/pme.c:348)

Fixes: 9a181e1093 ("PCI: xilinx-nwl: Modify IRQ chip for legacy interrupts")
Link: https://lore.kernel.org/r/20240531161337.864994-3-sean.anderson@linux.dev
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ebf6629fcff1e04e43ef75bd2c2dbfb410a95870)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:20 +05:30
Thomas Gleixner
750bab392f PCI: xilinx-nwl: Use irq_data_get_irq_chip_data()
[ Upstream commit e56427068a8d796bb7b8e297f2b6e947380e383f ]

Going through a full irq descriptor lookup instead of just using the proper
helper function which provides direct access is suboptimal.

In fact it _is_ wrong because the chip callback needs to get the chip data
which is relevant for the chip while using the irq descriptor variant
returns the irq chip data of the top level chip of a hierarchy. It does not
matter in this case because the chip is the top level chip, but that
doesn't make it more correct.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20201210194044.364211860@linutronix.de
Stable-dep-of: 0199d2f2bd8c ("PCI: xilinx-nwl: Fix off-by-one in INTx IRQ handler")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d957766954641b4bbd7e359d51206c0b940988a6)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:19 +05:30
Li Lingfeng
541e31a85c nfs: fix memory leak in error path of nfs4_do_reclaim
commit 8f6a7c9467eaf39da4c14e5474e46190ab3fb529 upstream.

Commit c77e22834ae9 ("NFSv4: Fix a potential sleep while atomic in
nfs4_do_reclaim()") separate out the freeing of the state owners from
nfs4_purge_state_owners() and finish it outside the rcu lock.
However, the error path is omitted. As a result, the state owners in
"freeme" will not be released.
Fix it by adding freeing in the error path.

Fixes: c77e22834ae9 ("NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f239240d65807113e565226b8e0a7ea13390bff3)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:18 +05:30
Julian Sun
d04989a693 vfs: fix race between evice_inodes() and find_inode()&iput()
commit 88b1afbf0f6b221f6c5bb66cc80cd3b38d696687 upstream.

Hi, all

Recently I noticed a bug[1] in btrfs, after digged it into
and I believe it'a race in vfs.

Let's assume there's a inode (ie ino 261) with i_count 1 is
called by iput(), and there's a concurrent thread calling
generic_shutdown_super().

cpu0:                              cpu1:
iput() // i_count is 1
  ->spin_lock(inode)
  ->dec i_count to 0
  ->iput_final()                    generic_shutdown_super()
    ->__inode_add_lru()               ->evict_inodes()
      // cause some reason[2]           ->if (atomic_read(inode->i_count)) continue;
      // return before                  // inode 261 passed the above check
      // list_lru_add_obj()             // and then schedule out
   ->spin_unlock()
// note here: the inode 261
// was still at sb list and hash list,
// and I_FREEING|I_WILL_FREE was not been set

btrfs_iget()
  // after some function calls
  ->find_inode()
    // found the above inode 261
    ->spin_lock(inode)
   // check I_FREEING|I_WILL_FREE
   // and passed
      ->__iget()
    ->spin_unlock(inode)                // schedule back
                                        ->spin_lock(inode)
                                        // check (I_NEW|I_FREEING|I_WILL_FREE) flags,
                                        // passed and set I_FREEING
iput()                                  ->spin_unlock(inode)
  ->spin_lock(inode)			  ->evict()
  // dec i_count to 0
  ->iput_final()
    ->spin_unlock()
    ->evict()

Now, we have two threads simultaneously evicting
the same inode, which may trigger the BUG(inode->i_state & I_CLEAR)
statement both within clear_inode() and iput().

To fix the bug, recheck the inode->i_count after holding i_lock.
Because in the most scenarios, the first check is valid, and
the overhead of spin_lock() can be reduced.

If there is any misunderstanding, please let me know, thanks.

[1]: https://lore.kernel.org/linux-btrfs/000000000000eabe1d0619c48986@google.com/
[2]: The reason might be 1. SB_ACTIVE was removed or 2. mapping_shrinkable()
return false when I reproduced the bug.

Reported-by: syzbot+67ba3c42bcbb4665d3ad@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=67ba3c42bcbb4665d3ad
CC: stable@vger.kernel.org
Fixes: 63997e98a3 ("split invalidate_inodes()")
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Link: https://lore.kernel.org/r/20240823130730.658881-1-sunjunchao2870@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6cc13a80a26e6b48f78c725c01b91987d61563ef)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:16 +05:30
Nikita Zhandarovich
e549c1db61 f2fs: avoid potential int overflow in sanity_check_area_boundary()
commit 50438dbc483ca6a133d2bce9d5d6747bcee38371 upstream.

While calculating the end addresses of main area and segment 0, u32
may be not enough to hold the result without the danger of int
overflow.

Just in case, play it safe and cast one of the operands to a
wider type (u64).

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: fd694733d5 ("f2fs: cover large section in sanity check of super")
Cc: stable@vger.kernel.org
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 24dfe070d6d05d62a00c41d5d52af5a448ae7af7)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:13 +05:30
Nikita Zhandarovich
3ae7b756c6 f2fs: prevent possible int overflow in dir_block_index()
commit 47f268f33dff4a5e31541a990dc09f116f80e61c upstream.

The result of multiplication between values derived from functions
dir_buckets() and bucket_blocks() *could* technically reach
2^30 * 2^2 = 2^32.

While unlikely to happen, it is prudent to ensure that it will not
lead to integer overflow. Thus, use mul_u32_u32() as it's more
appropriate to mitigate the issue.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: 3843154598 ("f2fs: introduce large directory support")
Cc: stable@vger.kernel.org
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 60bffc6e6b32fb88e5c1234448de5ccf88b590f5)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:11 +05:30
Thomas Weißschuh
50a3258b3a ACPI: sysfs: validate return type of _STR method
commit 4bb1e7d027413835b086aed35bc3f0713bc0f72b upstream.

Only buffer objects are valid return values of _STR.

If something else is returned description_show() will access invalid
memory.

Fixes: d1efe3c324 ("ACPI: Add new sysfs interface to export device description")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20240709-acpi-sysfs-groups-v2-1-058ab0667fa8@weissschuh.net
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 92fd5209fc014405f63a7db79802ca4b01dc0c05)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:10 +05:30
Mikhail Lobanov
a4b3607bf5 drbd: Add NULL check for net_conf to prevent dereference in state validation
commit a5e61b50c9f44c5edb6e134ede6fee8806ffafa9 upstream.

If the net_conf pointer is NULL and the code attempts to access its
fields without a check, it will lead to a null pointer dereference.
Add a NULL check before dereferencing the pointer.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 44ed167da7 ("drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf")
Cc: stable@vger.kernel.org
Signed-off-by: Mikhail Lobanov <m.lobanov@rosalinux.ru>
Link: https://lore.kernel.org/r/20240909133740.84297-1-m.lobanov@rosalinux.ru
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3b3ed68f695ee000e9c9fa536761a0554bfc1340)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:05 +05:30
Qiu-ji Chen
c0adcfc51e drbd: Fix atomicity violation in drbd_uuid_set_bm()
commit 2f02b5af3a4482b216e6a466edecf6ba8450fa45 upstream.

The violation of atomicity occurs when the drbd_uuid_set_bm function is
executed simultaneously with modifying the value of
device->ldev->md.uuid[UI_BITMAP]. Consider a scenario where, while
device->ldev->md.uuid[UI_BITMAP] passes the validity check when its
value is not zero, the value of device->ldev->md.uuid[UI_BITMAP] is
written to zero. In this case, the check in drbd_uuid_set_bm might refer
to the old value of device->ldev->md.uuid[UI_BITMAP] (before locking),
which allows an invalid value to pass the validity check, resulting in
inconsistency.

To address this issue, it is recommended to include the data validity
check within the locked section of the function. This modification
ensures that the value of device->ldev->md.uuid[UI_BITMAP] does not
change during the validation process, thereby maintaining its integrity.

This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs to extract
function pairs that can be concurrently executed, and then analyzes the
instructions in the paired functions to identify possible concurrency
bugs including data races and atomicity violations.

Fixes: 9f2247bb9b ("drbd: Protect accesses to the uuid set with a spinlock")
Cc: stable@vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666@gmail.com>
Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com>
Link: https://lore.kernel.org/r/20240913083504.10549-1-chenqiuji666@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b674f1b49f9eaec9aac5c64a75e535aa3f359af7)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:04 +05:30
Florian Fainelli
7ab201e455 tty: rp2: Fix reset with non forgiving PCIe host bridges
commit f16dd10ba342c429b1e36ada545fb36d4d1f0e63 upstream.

The write to RP2_GLOBAL_CMD followed by an immediate read of
RP2_GLOBAL_CMD in rp2_reset_asic() is intented to flush out the write,
however by then the device is already in reset and cannot respond to a
memory cycle access.

On platforms such as the Raspberry Pi 4 and others using the
pcie-brcmstb.c driver, any memory access to a device that cannot respond
is met with a fatal system error, rather than being substituted with all
1s as is usually the case on PC platforms.

Swapping the delay and the read ensures that the device has finished
resetting before we attempt to read from it.

Fixes: 7d9f49afa4 ("serial: rp2: New driver for Comtrol RocketPort 2 cards")
Cc: stable <stable@kernel.org>
Suggested-by: Jim Quinlan <james.quinlan@broadcom.com>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20240906225435.707837-1-florian.fainelli@broadcom.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 279994e23d7e6d2a30f2cc7b7437fedccac0834d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:02 +05:30
Oliver Neukum
e0d8f2563c USB: misc: cypress_cy7c63: check for short transfer
commit 49cd2f4d747eeb3050b76245a7f72aa99dbd3310 upstream.

As we process the second byte of a control transfer, transfers
of less than 2 bytes must be discarded.

This bug is as old as the driver.

SIgned-off-by: Oliver Neukum <oneukum@suse.com>
CC: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240912125449.1030536-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 638810fe9c0c15ffaa1b4129e54f1e8affb28afd)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:01 +05:30
Oliver Neukum
41518e1e76 USB: appledisplay: close race between probe and completion handler
commit 8265d06b7794493d82c5c21a12d7ba43eccc30cb upstream.

There is a small window during probing when IO is running
but the backlight is not registered. Processing events
during that time will crash. The completion handler
needs to check for a backlight before scheduling work.

The bug is as old as the driver.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
CC: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240912123317.1026049-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 17720dd1be72e4cf5436883cf9d114d0c3e47d19)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:30:00 +05:30
Krzysztof Kozlowski
7d2b7523bb soc: versatile: integrator: fix OF node leak in probe() error path
commit 874c5b601856adbfda10846b9770a6c66c41e229 upstream.

Driver is leaking OF node reference obtained from
of_find_matching_node().

Fixes: f956a785a2 ("soc: move SoC driver for the ARM Integrator")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/20240825-soc-dev-fixes-v1-1-ff4b35abed83@linaro.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6ab18d4ada166d38046ca8eb9598a3f1fdabd2b7)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:29:58 +05:30
Laurent Pinchart
c7f3f1beaf Remove *.orig pattern from .gitignore
commit 76be4f5a784533c71afbbb1b8f2963ef9e2ee258 upstream.

Commit 3f1b0e1f28 (".gitignore update") added *.orig and *.rej
patterns to .gitignore in v2.6.23. The commit message didn't give a
rationale. Later on, commit 1f5d3a6b65 ("Remove *.rej pattern from
.gitignore") removed the *.rej pattern in v2.6.26, on the rationale that
*.rej files indicated something went really wrong and should not be
ignored.

The *.rej files are now shown by `git status`, which helps located
conflicts when applying patches and lowers the probability that they
will go unnoticed. It is however still easy to overlook the *.orig files
which slowly polute the source tree. That's not as big of a deal as not
noticing a conflict, but it's still not nice.

Drop the *.orig pattern from .gitignore to avoid this and help keep the
source tree clean.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
[masahiroy@kernel.org:
I do not have a strong opinion about this. Perhaps some people may have
a different opinion.

If you are someone who wants to ignore *.orig, it is likely you would
want to do so across all projects. Then, $XDG_CONFIG_HOME/git/ignore
would be more suitable for your needs. gitignore(5) suggests, "Patterns
which a user wants Git to ignore in all situations generally go into a
file specified by core.excludesFile in the user's ~/.gitconfig".

Please note that you cannot do the opposite; if *.orig is ignored by
the project's .gitignore, you cannot override the decision because
$XDG_CONFIG_HOME/git/ignore has a lower priority.

If *.orig is sitting on the fence, I'd leave it to the users. ]
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e19774a171f108433e9fba98a7bfbf65ec2a18de)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:29:57 +05:30
Hailey Mothershead
c6addb32f6 crypto: aead,cipher - zeroize key buffer after use
commit 23e4099bdc3c8381992f9eb975c79196d6755210 upstream.

I.G 9.7.B for FIPS 140-3 specifies that variables temporarily holding
cryptographic information should be zeroized once they are no longer
needed. Accomplish this by using kfree_sensitive for buffers that
previously held the private key.

Signed-off-by: Hailey Mothershead <hailmo@amazon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Hugo SIMELIERE <hsimeliere.opensource@witekio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 89b9b6fa4463daf820e6a5ef65c3b0c2db239513)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:29:56 +05:30
Simon Horman
0e02142b9c netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
[ Upstream commit e1f1ee0e9ad8cbe660f5c104e791c5f1a7cf4c31 ]

Only provide ctnetlink_label_size when it is used,
which is when CONFIG_NF_CONNTRACK_EVENTS is configured.

Flagged by clang-18 W=1 builds as:

.../nf_conntrack_netlink.c:385:19: warning: unused function 'ctnetlink_label_size' [-Wunused-function]
  385 | static inline int ctnetlink_label_size(const struct nf_conn *ct)
      |                   ^~~~~~~~~~~~~~~~~~~~

The condition on CONFIG_NF_CONNTRACK_LABELS being removed by
this patch guards compilation of non-trivial implementations
of ctnetlink_dump_labels() and ctnetlink_label_size().

However, this is not necessary as each of these functions
will always return 0 if CONFIG_NF_CONNTRACK_LABELS is not defined
as each function starts with the equivalent of:

	struct nf_conn_labels *labels = nf_ct_labels_find(ct);

	if (!labels)
		return 0;

And nf_ct_labels_find always returns NULL if CONFIG_NF_CONNTRACK_LABELS
is not enabled.  So I believe that the compiler optimises the code away
in such cases anyway.

Found by inspection.
Compile tested only.

Originally splitted in two patches, Pablo Neira Ayuso collapsed them and
added Fixes: tag.

Fixes: 0ceabd8387 ("netfilter: ctnetlink: deliver labels to userspace")
Link: https://lore.kernel.org/netfilter-devel/20240909151712.GZ2097826@kernel.org/
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b14c58e37050703568ab498404018294807209a5)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:29:55 +05:30
Eric Dumazet
f4b47b313e netfilter: nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put()
[ Upstream commit 9c778fe48d20ef362047e3376dee56d77f8500d4 ]

syzbot reported that nf_reject_ip6_tcphdr_put() was possibly sending
garbage on the four reserved tcp bits (th->res1)

Use skb_put_zero() to clear the whole TCP header,
as done in nf_reject_ip_tcphdr_put()

BUG: KMSAN: uninit-value in nf_reject_ip6_tcphdr_put+0x688/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:255
  nf_reject_ip6_tcphdr_put+0x688/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:255
  nf_send_reset6+0xd84/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:344
  nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48
  expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
  nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288
  nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161
  nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
  nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
  nf_hook include/linux/netfilter.h:269 [inline]
  NF_HOOK include/linux/netfilter.h:312 [inline]
  ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310
  __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
  __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775
  process_backlog+0x4ad/0xa50 net/core/dev.c:6108
  __napi_poll+0xe7/0x980 net/core/dev.c:6772
  napi_poll net/core/dev.c:6841 [inline]
  net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963
  handle_softirqs+0x1ce/0x800 kernel/softirq.c:554
  __do_softirq+0x14/0x1a kernel/softirq.c:588
  do_softirq+0x9a/0x100 kernel/softirq.c:455
  __local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:382
  local_bh_enable include/linux/bottom_half.h:33 [inline]
  rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline]
  __dev_queue_xmit+0x2692/0x5610 net/core/dev.c:4450
  dev_queue_xmit include/linux/netdevice.h:3105 [inline]
  neigh_resolve_output+0x9ca/0xae0 net/core/neighbour.c:1565
  neigh_output include/net/neighbour.h:542 [inline]
  ip6_finish_output2+0x2347/0x2ba0 net/ipv6/ip6_output.c:141
  __ip6_finish_output net/ipv6/ip6_output.c:215 [inline]
  ip6_finish_output+0xbb8/0x14b0 net/ipv6/ip6_output.c:226
  NF_HOOK_COND include/linux/netfilter.h:303 [inline]
  ip6_output+0x356/0x620 net/ipv6/ip6_output.c:247
  dst_output include/net/dst.h:450 [inline]
  NF_HOOK include/linux/netfilter.h:314 [inline]
  ip6_xmit+0x1ba6/0x25d0 net/ipv6/ip6_output.c:366
  inet6_csk_xmit+0x442/0x530 net/ipv6/inet6_connection_sock.c:135
  __tcp_transmit_skb+0x3b07/0x4880 net/ipv4/tcp_output.c:1466
  tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline]
  tcp_connect+0x35b6/0x7130 net/ipv4/tcp_output.c:4143
  tcp_v6_connect+0x1bcc/0x1e40 net/ipv6/tcp_ipv6.c:333
  __inet_stream_connect+0x2ef/0x1730 net/ipv4/af_inet.c:679
  inet_stream_connect+0x6a/0xd0 net/ipv4/af_inet.c:750
  __sys_connect_file net/socket.c:2061 [inline]
  __sys_connect+0x606/0x690 net/socket.c:2078
  __do_sys_connect net/socket.c:2088 [inline]
  __se_sys_connect net/socket.c:2085 [inline]
  __x64_sys_connect+0x91/0xe0 net/socket.c:2085
  x64_sys_call+0x27a5/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:43
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was stored to memory at:
  nf_reject_ip6_tcphdr_put+0x60c/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:249
  nf_send_reset6+0xd84/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:344
  nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48
  expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
  nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288
  nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161
  nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
  nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
  nf_hook include/linux/netfilter.h:269 [inline]
  NF_HOOK include/linux/netfilter.h:312 [inline]
  ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310
  __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
  __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775
  process_backlog+0x4ad/0xa50 net/core/dev.c:6108
  __napi_poll+0xe7/0x980 net/core/dev.c:6772
  napi_poll net/core/dev.c:6841 [inline]
  net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963
  handle_softirqs+0x1ce/0x800 kernel/softirq.c:554
  __do_softirq+0x14/0x1a kernel/softirq.c:588

Uninit was stored to memory at:
  nf_reject_ip6_tcphdr_put+0x2ca/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:231
  nf_send_reset6+0xd84/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:344
  nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48
  expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
  nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288
  nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161
  nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
  nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
  nf_hook include/linux/netfilter.h:269 [inline]
  NF_HOOK include/linux/netfilter.h:312 [inline]
  ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310
  __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
  __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775
  process_backlog+0x4ad/0xa50 net/core/dev.c:6108
  __napi_poll+0xe7/0x980 net/core/dev.c:6772
  napi_poll net/core/dev.c:6841 [inline]
  net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963
  handle_softirqs+0x1ce/0x800 kernel/softirq.c:554
  __do_softirq+0x14/0x1a kernel/softirq.c:588

Uninit was created at:
  slab_post_alloc_hook mm/slub.c:3998 [inline]
  slab_alloc_node mm/slub.c:4041 [inline]
  kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4084
  kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:583
  __alloc_skb+0x363/0x7b0 net/core/skbuff.c:674
  alloc_skb include/linux/skbuff.h:1320 [inline]
  nf_send_reset6+0x98d/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:327
  nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48
  expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
  nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288
  nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161
  nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
  nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
  nf_hook include/linux/netfilter.h:269 [inline]
  NF_HOOK include/linux/netfilter.h:312 [inline]
  ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310
  __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
  __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775
  process_backlog+0x4ad/0xa50 net/core/dev.c:6108
  __napi_poll+0xe7/0x980 net/core/dev.c:6772
  napi_poll net/core/dev.c:6841 [inline]
  net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963
  handle_softirqs+0x1ce/0x800 kernel/softirq.c:554
  __do_softirq+0x14/0x1a kernel/softirq.c:588

Fixes: c8d7b98bec ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://patch.msgid.link/20240913170615.3670897-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 872eca64c3267dbc5836b715716fc6c03a18eda7)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:29:05 +05:30
Guoqing Jiang
8b7c02903c nfsd: call cache_put if xdr_reserve_space returns NULL
[ Upstream commit d078cbf5c38de83bc31f83c47dcd2184c04a50c7 ]

If not enough buffer space available, but idmap_lookup has triggered
lookup_fn which calls cache_get and returns successfully. Then we
missed to call cache_put here which pairs with cache_get.

Fixes: ddd1ea5636 ("nfsd4: use xdr_reserve_space in attribute encoding")
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Reviwed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 3e8081ebff12bec1347deaceb6bce0765cce54df)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:29:03 +05:30
Jinjie Ruan
5d3643d432 ntb: intel: Fix the NULL vs IS_ERR() bug for debugfs_create_dir()
[ Upstream commit e229897d373a87ee09ec5cc4ecd4bb2f895fc16b ]

The debugfs_create_dir() function returns error pointers.
It never returns NULL. So use IS_ERR() to check it.

Fixes: e26a5843f7 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 20cbc281033ef5324f67f2d54bc539968f937255)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:29:02 +05:30
Mikhail Lobanov
a4ccccbef9 RDMA/cxgb4: Added NULL check for lookup_atid
[ Upstream commit e766e6a92410ca269161de059fff0843b8ddd65f ]

The lookup_atid() function can return NULL if the ATID is
invalid or does not exist in the identifier table, which
could lead to dereferencing a null pointer without a
check in the `act_establish()` and `act_open_rpl()` functions.
Add a NULL check to prevent null pointer dereferencing.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: cfdda9d764 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Signed-off-by: Mikhail Lobanov <m.lobanov@rosalinux.ru>
Link: https://patch.msgid.link/20240912145844.77516-1-m.lobanov@rosalinux.ru
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b12e25d91c7f97958341538c7dc63ee49d01548f)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:29:01 +05:30
Wang Jianzheng
2062f7ba72 pinctrl: mvebu: Fix devinit_dove_pinctrl_probe function
[ Upstream commit c25478419f6fd3f74c324a21ec007cf14f2688d7 ]

When an error occurs during the execution of the function
__devinit_dove_pinctrl_probe, the clk is not properly disabled.

Fix this by calling clk_disable_unprepare before return.

Fixes: ba607b6238 ("pinctrl: mvebu: make pdma clock on dove mandatory")
Signed-off-by: Wang Jianzheng <wangjianzheng@vivo.com>
Link: https://lore.kernel.org/20240829064823.19808-1-wangjianzheng@vivo.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 856d3ea97be0dfa5d7369e071c06c9259acfff33)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:29:00 +05:30
David Lechner
481e8a2a16 clk: ti: dra7-atl: Fix leak of of_nodes
[ Upstream commit 9d6e9f10e2e031fb7bfb3030a7d1afc561a28fea ]

This fix leaking the of_node references in of_dra7_atl_clk_probe().

The docs for of_parse_phandle_with_args() say that the caller must call
of_node_put() on the returned node. This adds the missing of_node_put()
to fix the leak.

Fixes: 9ac33b0ce8 ("CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic)")
Signed-off-by: David Lechner <dlechner@baylibre.com>
Link: https://lore.kernel.org/r/20240826-clk-fix-leak-v1-1-f55418a13aa6@baylibre.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d6b680af89ca0bf498d105265bc32061979e87f1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:28:59 +05:30
Yang Yingliang
4c678d9222 pinctrl: single: fix missing error code in pcs_probe()
[ Upstream commit cacd8cf79d7823b07619865e994a7916fcc8ae91 ]

If pinctrl_enable() fails in pcs_probe(), it should return the error code.

Fixes: 8f773bfbdd42 ("pinctrl: single: fix possible memory leak when pinctrl_enable() fails")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/20240819024625.154441-1-yangyingliang@huaweicloud.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4f227c4dc81187fcca9c858b070b9d3f586c9b30)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:28:54 +05:30
Sean Anderson
09cab1b6b3 PCI: xilinx-nwl: Fix register misspelling
[ Upstream commit a437027ae1730b8dc379c75fa0dd7d3036917400 ]

MSIC -> MISC

Fixes: c2a7ff18ed ("PCI: xilinx-nwl: Expand error logging")
Link: https://lore.kernel.org/r/20240531161337.864994-4-sean.anderson@linux.dev
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 43b361ca2c977e593319c8248e549c0863ab1730)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:25:35 +05:30
Junlin Li
9f561736d2 drivers: media: dvb-frontends/rtl2830: fix an out-of-bounds write error
[ Upstream commit 46d7ebfe6a75a454a5fa28604f0ef1491f9d8d14 ]

Ensure index in rtl2830_pid_filter does not exceed 31 to prevent
out-of-bounds access.

dev->filters is a 32-bit value, so set_bit and clear_bit functions should
only operate on indices from 0 to 31. If index is 32, it will attempt to
access a non-existent 33rd bit, leading to out-of-bounds access.
Change the boundary check from index > 32 to index >= 32 to resolve this
issue.

Fixes: df70ddad81 ("[media] rtl2830: implement PID filter")
Signed-off-by: Junlin Li <make24@iscas.ac.cn>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 8ffbe7d07b8e76193b151107878ddc1ccc94deb5)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:25:33 +05:30
Junlin Li
33caae161a drivers: media: dvb-frontends/rtl2832: fix an out-of-bounds write error
[ Upstream commit 8ae06f360cfaca2b88b98ca89144548b3186aab1 ]

Ensure index in rtl2832_pid_filter does not exceed 31 to prevent
out-of-bounds access.

dev->filters is a 32-bit value, so set_bit and clear_bit functions should
only operate on indices from 0 to 31. If index is 32, it will attempt to
access a non-existent 33rd bit, leading to out-of-bounds access.
Change the boundary check from index > 32 to index >= 32 to resolve this
issue.

Signed-off-by: Junlin Li <make24@iscas.ac.cn>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 4b01e01a81 ("[media] rtl2832: implement PID filter")
[hverkuil: added fixes tag, rtl2830_pid_filter -> rtl2832_pid_filter in logmsg]
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7065c05c6d58b9b9a98127aa14e9a5ec68173918)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:25:32 +05:30
Jonas Karlman
9d738a1f8d clk: rockchip: Set parent rate for DCLK_VOP clock on RK3228
[ Upstream commit 1d34b9757523c1ad547bd6d040381f62d74a3189 ]

Similar to DCLK_LCDC on RK3328, the DCLK_VOP on RK3228 is typically
parented by the hdmiphy clk and it is expected that the DCLK_VOP and
hdmiphy clk rate are kept in sync.

Use CLK_SET_RATE_PARENT and CLK_SET_RATE_NO_REPARENT flags, same as used
on RK3328, to make full use of all possible supported display modes.

Fixes: 0a9d4ac08e ("clk: rockchip: set the clock ids for RK3228 VOP")
Fixes: 307a2e9ac5 ("clk: rockchip: add clock controller for rk3228")
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Link: https://lore.kernel.org/r/20240615170417.3134517-3-jonas@kwiboo.se
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7b9e7a258b9f4d68a9425c67bfee1e1e926d1960)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:25:31 +05:30
Ian Rogers
c47e1d2000 perf time-utils: Fix 32-bit nsec parsing
[ Upstream commit 38e2648a81204c9fc5b4c87a8ffce93a6ed91b65 ]

The "time utils" test fails in 32-bit builds:
  ...
  parse_nsec_time("18446744073.709551615")
  Failed. ptime 4294967295709551615 expected 18446744073709551615
  ...

Switch strtoul to strtoull as an unsigned long in 32-bit build isn't
64-bits.

Fixes: c284d669a2 ("perf tools: Move parse_nsec_time to time-utils.c")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Link: https://lore.kernel.org/r/20240831070415.506194-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit c062eebe3b3d98ae2ef61fe8008f2c12bfa31249)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:25:30 +05:30
Yang Jihong
a98ef2f2ae perf sched timehist: Fixed timestamp error when unable to confirm event sched_in time
[ Upstream commit 39c243411bdb8fb35777adf49ee32549633c4e12 ]

If sched_in event for current task is not recorded, sched_in timestamp
will be set to end_time of time window interest, causing an error in
timestamp show. In this case, we choose to ignore this event.

Test scenario:

  perf[1229608] does not record the first sched_in event, run time and sch delay are both 0

  # perf sched timehist
  Samples of sched_switch event do not have callchains.
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------
   2090450.763231 [0000]  perf[1229608]                       0.000      0.000      0.000
   2090450.763235 [0000]  migration/0[15]                     0.000      0.001      0.003
   2090450.763263 [0001]  perf[1229608]                       0.000      0.000      0.000
   2090450.763268 [0001]  migration/1[21]                     0.000      0.001      0.004
   2090450.763302 [0002]  perf[1229608]                       0.000      0.000      0.000
   2090450.763309 [0002]  migration/2[27]                     0.000      0.001      0.007
   2090450.763338 [0003]  perf[1229608]                       0.000      0.000      0.000
   2090450.763343 [0003]  migration/3[33]                     0.000      0.001      0.004

Before:

  arbitrarily specify a time window of interest, timestamp will be set to an incorrect value

  # perf sched timehist --time 100,200
  Samples of sched_switch event do not have callchains.
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------
       200.000000 [0000]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0001]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0002]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0003]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0004]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0005]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0006]  perf[1229608]                       0.000      0.000      0.000
       200.000000 [0007]  perf[1229608]                       0.000      0.000      0.000

 After:

  # perf sched timehist --time 100,200
  Samples of sched_switch event do not have callchains.
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------

Fixes: 853b740711 ("perf sched timehist: Add option to specify time window of interest")
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240819024720.2405244-1-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d825de712b59dfd6e256c0ecad7443da652c2b22)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:25:29 +05:30
Yang Jihong
ccaef2b90f perf sched timehist: Fix missing free of session in perf_sched__timehist()
[ Upstream commit 6bdf5168b6fb19541b0c1862bdaa596d116c7bfb ]

When perf_time__parse_str() fails in perf_sched__timehist(),
need to free session that was previously created, fix it.

Fixes: 853b740711 ("perf sched timehist: Add option to specify time window of interest")
Signed-off-by: Yang Jihong <yangjihong@bytedance.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240806023533.1316348-1-yangjihong@bytedance.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 1d4d7e56c4aa834f359a29aa64f5f5c01e3453eb)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:25:28 +05:30
Ryusuke Konishi
6f66447d1a nilfs2: fix potential oob read in nilfs_btree_check_delete()
[ Upstream commit f9c96351aa6718b42a9f42eaf7adce0356bdb5e8 ]

The function nilfs_btree_check_delete(), which checks whether degeneration
to direct mapping occurs before deleting a b-tree entry, causes memory
access outside the block buffer when retrieving the maximum key if the
root node has no entries.

This does not usually happen because b-tree mappings with 0 child nodes
are never created by mkfs.nilfs2 or nilfs2 itself.  However, it can happen
if the b-tree root node read from a device is configured that way, so fix
this potential issue by adding a check for that case.

Link: https://lkml.kernel.org/r/20240904081401.16682-4-konishi.ryusuke@gmail.com
Fixes: 17c76b0104 ("nilfs2: B-tree based block mapping")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f3a9859767c7aea758976f5523903d247e585129)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:25:26 +05:30
Ryusuke Konishi
5cfe7a8f9a nilfs2: determine empty node blocks as corrupted
[ Upstream commit 111b812d3662f3a1b831d19208f83aa711583fe6 ]

Due to the nature of b-trees, nilfs2 itself and admin tools such as
mkfs.nilfs2 will never create an intermediate b-tree node block with 0
child nodes, nor will they delete (key, pointer)-entries that would result
in such a state.  However, it is possible that a b-tree node block is
corrupted on the backing device and is read with 0 child nodes.

Because operation is not guaranteed if the number of child nodes is 0 for
intermediate node blocks other than the root node, modify
nilfs_btree_node_broken(), which performs sanity checks when reading a
b-tree node block, so that such cases will be judged as metadata
corruption.

Link: https://lkml.kernel.org/r/20240904081401.16682-3-konishi.ryusuke@gmail.com
Fixes: 17c76b0104 ("nilfs2: B-tree based block mapping")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 6d7f4fac707a187882b8c610e8889c097b289082)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:25:25 +05:30
Ryusuke Konishi
65f2a244c3 nilfs2: fix potential null-ptr-deref in nilfs_btree_insert()
[ Upstream commit 9403001ad65ae4f4c5de368bdda3a0636b51d51a ]

Patch series "nilfs2: fix potential issues with empty b-tree nodes".

This series addresses three potential issues with empty b-tree nodes that
can occur with corrupted filesystem images, including one recently
discovered by syzbot.

This patch (of 3):

If a b-tree is broken on the device, and the b-tree height is greater than
2 (the level of the root node is greater than 1) even if the number of
child nodes of the b-tree root is 0, a NULL pointer dereference occurs in
nilfs_btree_prepare_insert(), which is called from nilfs_btree_insert().

This is because, when the number of child nodes of the b-tree root is 0,
nilfs_btree_do_lookup() does not set the block buffer head in any of
path[x].bp_bh, leaving it as the initial value of NULL, but if the level
of the b-tree root node is greater than 1, nilfs_btree_get_nonroot_node(),
which accesses the buffer memory of path[x].bp_bh, is called.

Fix this issue by adding a check to nilfs_btree_root_broken(), which
performs sanity checks when reading the root node from the device, to
detect this inconsistency.

Thanks to Lizhi Xu for trying to solve the bug and clarifying the cause
early on.

Link: https://lkml.kernel.org/r/20240904081401.16682-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240902084101.138971-1-lizhi.xu@windriver.com
Link: https://lkml.kernel.org/r/20240904081401.16682-2-konishi.ryusuke@gmail.com
Fixes: 17c76b0104 ("nilfs2: B-tree based block mapping")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+9bff4c7b992038a7409f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9bff4c7b992038a7409f
Cc: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2b78e9df10fb7f4e9d3d7a18417dd72fbbc1dfd0)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:25:21 +05:30
Thadeu Lima de Souza Cascardo
25fb52c992 ext4: avoid OOB when system.data xattr changes underneath the filesystem
[ Upstream commit c6b72f5d82b1017bad80f9ebf502832fc321d796 ]

When looking up for an entry in an inlined directory, if e_value_offs is
changed underneath the filesystem by some change in the block device, it
will lead to an out-of-bounds access that KASAN detects as an UAF.

EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
loop0: detected capacity change from 2048 to 2047
==================================================================
BUG: KASAN: use-after-free in ext4_search_dir+0xf2/0x1c0 fs/ext4/namei.c:1500
Read of size 1 at addr ffff88803e91130f by task syz-executor269/5103

CPU: 0 UID: 0 PID: 5103 Comm: syz-executor269 Not tainted 6.11.0-rc4-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:93 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 ext4_search_dir+0xf2/0x1c0 fs/ext4/namei.c:1500
 ext4_find_inline_entry+0x4be/0x5e0 fs/ext4/inline.c:1697
 __ext4_find_entry+0x2b4/0x1b30 fs/ext4/namei.c:1573
 ext4_lookup_entry fs/ext4/namei.c:1727 [inline]
 ext4_lookup+0x15f/0x750 fs/ext4/namei.c:1795
 lookup_one_qstr_excl+0x11f/0x260 fs/namei.c:1633
 filename_create+0x297/0x540 fs/namei.c:3980
 do_symlinkat+0xf9/0x3a0 fs/namei.c:4587
 __do_sys_symlinkat fs/namei.c:4610 [inline]
 __se_sys_symlinkat fs/namei.c:4607 [inline]
 __x64_sys_symlinkat+0x95/0xb0 fs/namei.c:4607
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f3e73ced469
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 21 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff4d40c258 EFLAGS: 00000246 ORIG_RAX: 000000000000010a
RAX: ffffffffffffffda RBX: 0032656c69662f2e RCX: 00007f3e73ced469
RDX: 0000000020000200 RSI: 00000000ffffff9c RDI: 00000000200001c0
RBP: 0000000000000000 R08: 00007fff4d40c290 R09: 00007fff4d40c290
R10: 0023706f6f6c2f76 R11: 0000000000000246 R12: 00007fff4d40c27c
R13: 0000000000000003 R14: 431bde82d7b634db R15: 00007fff4d40c2b0
 </TASK>

Calling ext4_xattr_ibody_find right after reading the inode with
ext4_get_inode_loc will lead to a check of the validity of the xattrs,
avoiding this problem.

Reported-by: syzbot+0c2508114d912a54ee79@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0c2508114d912a54ee79
Fixes: e8e948e780 ("ext4: let ext4_find_entry handle inline data")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Link: https://patch.msgid.link/20240821152324.3621860-5-cascardo@igalia.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5b076d37e8d99918e9294bd6b35a8bbb436819b0)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:24:23 +05:30
Thadeu Lima de Souza Cascardo
8226004cd2 ext4: return error on ext4_find_inline_entry
[ Upstream commit 4d231b91a944f3cab355fce65af5871fb5d7735b ]

In case of errors when reading an inode from disk or traversing inline
directory entries, return an error-encoded ERR_PTR instead of returning
NULL. ext4_find_inline_entry only caller, __ext4_find_entry already returns
such encoded errors.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Link: https://patch.msgid.link/20240821152324.3621860-3-cascardo@igalia.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: c6b72f5d82b1 ("ext4: avoid OOB when system.data xattr changes underneath the filesystem")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ce8f41fca0b6bc69753031afea8fc01f97b5e1af)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:24:03 +05:30
Kemeng Shi
fe0a5a2f8c ext4: avoid negative min_clusters in find_group_orlov()
[ Upstream commit bb0a12c3439b10d88412fd3102df5b9a6e3cd6dc ]

min_clusters is signed integer and will be converted to unsigned
integer when compared with unsigned number stats.free_clusters.
If min_clusters is negative, it will be converted to a huge unsigned
value in which case all groups may not meet the actual desired free
clusters.
Set negative min_clusters to 0 to avoid unexpected behavior.

Fixes: ac27a0ec11 ("[PATCH] ext4: initial copy of files from ext3")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://patch.msgid.link/20240820132234.2759926-4-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7b98a77cdad322fa3c7babf15c37659a94aa3593)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:24:01 +05:30
Jiawei Ye
35f5dbc7ac smackfs: Use rcu_assign_pointer() to ensure safe assignment in smk_set_cipso
[ Upstream commit 2749749afa071f8a0e405605de9da615e771a7ce ]

In the `smk_set_cipso` function, the `skp->smk_netlabel.attr.mls.cat`
field is directly assigned to a new value without using the appropriate
RCU pointer assignment functions. According to RCU usage rules, this is
illegal and can lead to unpredictable behavior, including data
inconsistencies and impossible-to-diagnose memory corruption issues.

This possible bug was identified using a static analysis tool developed
by myself, specifically designed to detect RCU-related issues.

To address this, the assignment is now done using rcu_assign_pointer(),
which ensures that the pointer assignment is done safely, with the
necessary memory barriers and synchronization. This change prevents
potential RCU dereference issues by ensuring that the `cat` field is
safely updated while still adhering to RCU's requirements.

Fixes: 0817534ff9ea ("smackfs: Fix use-after-free in netlbl_catmap_walk()")
Signed-off-by: Jiawei Ye <jiawei.ye@foxmail.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 029ebd49aab06dd438c1256876730518aef7da35)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:23:57 +05:30
yangerkun
c3d3c854d9 ext4: clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT even mount with discard
[ Upstream commit 20cee68f5b44fdc2942d20f3172a262ec247b117 ]

Commit 3d56b8d2c7 ("ext4: Speed up FITRIM by recording flags in
ext4_group_info") speed up fstrim by skipping trim trimmed group. We
also has the chance to clear trimmed once there exists some block free
for this group(mount without discard), and the next trim for this group
will work well too.

For mount with discard, we will issue dicard when we free blocks, so
leave trimmed flag keep alive to skip useless trim trigger from
userspace seems reasonable. But for some case like ext4 build on
dm-thinpool(ext4 blocksize 4K, pool blocksize 128K), discard from ext4
maybe unaligned for dm thinpool, and thinpool will just finish this
discard(see process_discard_bio when begein equals to end) without
actually process discard. For this case, trim from userspace can really
help us to free some thinpool block.

So convert to clear trimmed flag for all case no matter mounted with
discard or not.

Fixes: 3d56b8d2c7 ("ext4: Speed up FITRIM by recording flags in ext4_group_info")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20240817085510.2084444-1-yangerkun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 6f44db60f9c42265e1e61596994f457f3c30d432)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:23:11 +05:30
Mauricio Faria de Oliveira
e830611b6a jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers()
[ Upstream commit aa3c0c61f62d682259e3e66cdc01846290f9cd6c ]

Export functions that implement the current behavior done
for an inode in journal_submit|finish_inode_data_buffers().

No functional change.

Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20201006004841.600488-2-mfo@canonical.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 20cee68f5b44 ("ext4: clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT even mount with discard")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 58a48155ce22e8e001308a41a16d8c89ee003b80)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:23:09 +05:30
Chen Yu
4050d4814e kthread: fix task state in kthread worker if being frozen
[ Upstream commit e16c7b07784f3fb03025939c4590b9a7c64970a7 ]

When analyzing a kernel waring message, Peter pointed out that there is a
race condition when the kworker is being frozen and falls into
try_to_freeze() with TASK_INTERRUPTIBLE, which could trigger a
might_sleep() warning in try_to_freeze().  Although the root cause is not
related to freeze()[1], it is still worthy to fix this issue ahead.

One possible race scenario:

        CPU 0                                           CPU 1
        -----                                           -----

        // kthread_worker_fn
        set_current_state(TASK_INTERRUPTIBLE);
                                                       suspend_freeze_processes()
                                                         freeze_processes
                                                           static_branch_inc(&freezer_active);
                                                         freeze_kernel_threads
                                                           pm_nosig_freezing = true;
        if (work) { //false
          __set_current_state(TASK_RUNNING);

        } else if (!freezing(current)) //false, been frozen

                      freezing():
                      if (static_branch_unlikely(&freezer_active))
                        if (pm_nosig_freezing)
                          return true;
          schedule()
	}

        // state is still TASK_INTERRUPTIBLE
        try_to_freeze()
          might_sleep() <--- warning

Fix this by explicitly set the TASK_RUNNING before entering
try_to_freeze().

Link: https://lore.kernel.org/lkml/Zs2ZoAcUsZMX2B%2FI@chenyu5-mobl2/ [1]
Link: https://lkml.kernel.org/r/20240827112308.181081-1-yu.c.chen@intel.com
Fixes: b56c0d8937 ("kthread: implement kthread_worker")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: David Gow <davidgow@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 6430d6a00b0d8d3de663ecc0da248f8f3557b82e)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:23:08 +05:30
Rob Clark
7ad30d1740 kthread: add kthread_work tracepoints
[ Upstream commit f630c7c6f10546ebff15c3a856e7949feb7a2372 ]

While migrating some code from wq to kthread_worker, I found that I missed
the execute_start/end tracepoints.  So add similar tracepoints for
kthread_work.  And for completeness, queue_work tracepoint (although this
one differs slightly from the matching workqueue tracepoint).

Link: https://lkml.kernel.org/r/20201010180323.126634-1-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Phil Auld <pauld@redhat.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Thara Gopinath <thara.gopinath@linaro.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vincent Donnefort <vincent.donnefort@arm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Ilias Stamatis <stamatis.iliass@gmail.com>
Cc: Liang Chen <cl@rock-chips.com>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "J. Bruce Fields" <bfields@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stable-dep-of: e16c7b07784f ("kthread: fix task state in kthread worker if being frozen")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 65c1957181a1e2cd5344e49d4e5b6e9f930092d1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:23:07 +05:30
Tony Ambardar
3d7157e04c selftests/bpf: Fix error compiling test_lru_map.c
[ Upstream commit cacf2a5a78cd1f5f616eae043ebc6f024104b721 ]

Although the post-increment in macro 'CPU_SET(next++, &cpuset)' seems safe,
the sequencing can raise compile errors, so move the increment outside the
macro. This avoids an error seen using gcc 12.3.0 for mips64el/musl-libc:

  In file included from test_lru_map.c:11:
  test_lru_map.c: In function 'sched_next_online':
  test_lru_map.c:129:29: error: operation on 'next' may be undefined [-Werror=sequence-point]
    129 |                 CPU_SET(next++, &cpuset);
        |                             ^
  cc1: all warnings being treated as errors

Fixes: 3fbfadce60 ("bpf: Fix test_lru_sanity5() in test_lru_map.c")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/22993dfb11ccf27925a626b32672fd3324cb76c4.1722244708.git.tony.ambardar@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e5fa35e20078c3f08a249a15e616645a7e7068e2)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:23:05 +05:30
Juergen Gross
412706d710 xen/swiotlb: add alignment check for dma buffers
[ Upstream commit 9f40ec84a7976d95c34e7cc070939deb103652b0 ]

When checking a memory buffer to be consecutive in machine memory,
the alignment needs to be checked, too. Failing to do so might result
in DMA memory not being aligned according to its requested size,
leading to error messages like:

  4xxx 0000:2b:00.0: enabling device (0140 -> 0142)
  4xxx 0000:2b:00.0: Ring address not aligned
  4xxx 0000:2b:00.0: Failed to initialise service qat_crypto
  4xxx 0000:2b:00.0: Resetting device qat_dev0
  4xxx: probe of 0000:2b:00.0 failed with error -14

Fixes: 9435cce879 ("xen/swiotlb: Add support for 64KB page granularity")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 66c845af6613a62f08d1425054526cc294842914)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:23:04 +05:30
Juergen Gross
bfc678ecc8 xen/swiotlb: simplify range_straddles_page_boundary()
[ Upstream commit bf70726668c6116aa4976e0cc87f470be6268a2f ]

range_straddles_page_boundary() is open coding several macros from
include/xen/page.h. Use those instead. Additionally there is no need
to have check_pages_physically_contiguous() as a separate function as
it is used only once, so merge it into range_straddles_page_boundary().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Stable-dep-of: 9f40ec84a797 ("xen/swiotlb: add alignment check for dma buffers")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5937434b2ca4884798571079cc71ad3a58b3c8fd)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:23:04 +05:30
Juergen Gross
2fac1cb312 xen: use correct end address of kernel for conflict checking
[ Upstream commit fac1bceeeb04886fc2ee952672e6e6c85ce41dca ]

When running as a Xen PV dom0 the kernel is loaded by the hypervisor
using a different memory map than that of the host. In order to
minimize the required changes in the kernel, the kernel adapts its
memory map to that of the host. In order to do that it is checking
for conflicts of its load address with the host memory map.

Unfortunately the tested memory range does not include the .brk
area, which might result in crashes or memory corruption when this
area does conflict with the memory map of the host.

Fix the test by using the _end label instead of __bss_stop.

Fixes: 808fdb7193 ("xen: check for kernel memory conflicting with memory layout")

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f38d39918cff054f4bfc466cac1c110d735eda94)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:23:03 +05:30
Sherry Yang
23f377a2f9 drm/msm: fix %s null argument error
[ Upstream commit 25b85075150fe8adddb096db8a4b950353045ee1 ]

The following build error was triggered because of NULL string argument:

BUILDSTDERR: drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c: In function 'mdp5_smp_dump':
BUILDSTDERR: drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c:352:51: error: '%s' directive argument is null [-Werror=format-overflow=]
BUILDSTDERR:   352 |                         drm_printf(p, "%s:%d\t%d\t%s\n",
BUILDSTDERR:       |                                                   ^~
BUILDSTDERR: drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c:352:51: error: '%s' directive argument is null [-Werror=format-overflow=]

This happens from the commit a61ddb4393ad ("drm: enable (most) W=1
warnings by default across the subsystem"). Using "(null)" instead
to fix it.

Fixes: bc5289eed4 ("drm/msm/mdp5: add debugfs to show smp block status")
Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/611071/
Link: https://lore.kernel.org/r/20240827165337.1075904-1-sherry.yang@oracle.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b7a63d4bac70f660d63cba66684bc03f09be50ad)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:23:02 +05:30
Wolfram Sang
918dde4424 ipmi: docs: don't advertise deprecated sysfs entries
[ Upstream commit 64dce81f8c373c681e62d5ffe0397c45a35d48a2 ]

"i2c-adapter" class entries are deprecated since 2009. Switch to the
proper location.

Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Closes: https://lore.kernel.org/r/80c4a898-5867-4162-ac85-bdf7c7c68746@gmail.com
Fixes: 259307074b ("ipmi: Add SMBus interface driver (SSIF)")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Message-Id: <20240901090211.3797-2-wsa+renesas@sang-engineering.com>
Signed-off-by: Corey Minyard <corey@minyard.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e4e81788a8b83f267d25b9f3b68cb4837b71bdd9)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:23:01 +05:30
Jeongjun Park
30893dcebc jfs: fix out-of-bounds in dbNextAG() and diAlloc()
[ Upstream commit e63866a475562810500ea7f784099bfe341e761a ]

In dbNextAG() , there is no check for the case where bmp->db_numag is
greater or same than MAXAG due to a polluted image, which causes an
out-of-bounds. Therefore, a bounds check should be added in dbMount().

And in dbNextAG(), a check for the case where agpref is greater than
bmp->db_numag should be added, so an out-of-bounds exception should be
prevented.

Additionally, a check for the case where agno is greater or same than
MAXAG should be added in diAlloc() to prevent out-of-bounds.

Reported-by: Jeongjun Park <aha310510@gmail.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d1017d2a0f3f16dc1db5120e7ddbe7c6680425b0)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:57 +05:30
Nikita Zhandarovich
b706497e33 drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets
[ Upstream commit 3fbaf475a5b8361ebee7da18964db809e37518b7 ]

Several cs track offsets (such as 'track->db_s_read_offset')
either are initialized with or plainly take big enough values that,
once shifted 8 bits left, may be hit with integer overflow if the
resulting values end up going over u32 limit.

Same goes for a few instances of 'surf.layer_size * mslice'
multiplications that are added to 'offset' variable - they may
potentially overflow as well and need to be validated properly.

While some debug prints in this code section take possible overflow
issues into account, simply casting to (unsigned long) may be
erroneous in its own way, as depending on CPU architecture one is
liable to get different results.

Fix said problems by:
 - casting 'offset' to fixed u64 data type instead of
 ambiguous unsigned long.
 - casting one of the operands in vulnerable to integer
 overflow cases to u64.
 - adjust format specifiers in debug prints to properly
 represent 'offset' values.

Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.

Fixes: 285484e2d5 ("drm/radeon: add support for evergreen/ni tiling informations v11")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ec7cf75b4e2b584e6f2b167ce998428b42522df6)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:18 +05:30
Alex Bee
e8c36d3ca4 drm/rockchip: vop: Allow 4096px width scaling
[ Upstream commit 0ef968d91a20b5da581839f093f98f7a03a804f7 ]

There is no reason to limit VOP scaling to 3840px width, the limit of
RK3288, when there are newer VOP versions that support 4096px width.

Change to enforce a maximum of 4096px width plane scaling, the maximum
supported output width of the VOP versions supported by this driver.

Fixes: 4c156c21c7 ("drm/rockchip: vop: support plane scale")
Signed-off-by: Alex Bee <knaerzche@gmail.com>
Signed-off-by: Jonas Karlman <jonas@kwiboo.se>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240615170417.3134517-4-jonas@kwiboo.se
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 6a512ab02cde62f147351d38ebefa250522336c4)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:17 +05:30
Matteo Croce
583175e0fb drm/amd: fix typo
[ Upstream commit 229f7b1d6344ea35fff0b113e4d91128921f8937 ]

Fix spelling mistake: "lenght" -> "length"

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 8155566a26b8 ("drm/amdgpu: properly handle vbios fake edid sizing")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f4a502c468886ffc54e436279d7f573b4d02bd5b)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:15 +05:30
Christophe JAILLET
eec93da06d fbdev: hpfb: Fix an error handling path in hpfb_dio_probe()
[ Upstream commit aa578e897520f32ae12bec487f2474357d01ca9c ]

If an error occurs after request_mem_region(), a corresponding
release_mem_region() should be called, as already done in the remove
function.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit da77622151181c1d7d8ce99019c14cd5bd6453b5)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:14 +05:30
Artur Weber
bfd3e9f4b8 power: supply: max17042_battery: Fix SOC threshold calc w/ no current sense
[ Upstream commit 3a3acf839b2cedf092bdd1ff65b0e9895df1656b ]

Commit 223a3b82834f ("power: supply: max17042_battery: use VFSOC for
capacity when no rsns") made it so that capacity on systems without
current sensing would be read from VFSOC instead of RepSOC. However,
the SOC threshold calculation still read RepSOC to get the SOC
regardless of the current sensing option state.

Fix this by applying the same conditional to determine which register
should be read.

This also seems to be the intended behavior as per the datasheet - SOC
alert config value in MiscCFG on setups without current sensing is set
to a value of 0b11, indicating SOC alerts being generated based on
VFSOC, instead of 0b00 which indicates SOC alerts being generated based
on RepSOC.

This fixes an issue on the Galaxy S3/Midas boards, where the alert
interrupt would be constantly retriggered, causing high CPU usage
on idle (around ~12%-15%).

Fixes: e5f3872d20 ("max17042: Add support for signalling change in SOC")
Signed-off-by: Artur Weber <aweber.kernel@gmail.com>
Reviewed-by: Henrik Grimler <henrik@grimler.se>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20240817-max17042-soc-threshold-fix-v1-1-72b45899c3cc@gmail.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f9e9ce0f2b420b63c29e96840865640098bbafe7)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:12 +05:30
Yuntao Liu
1e0cb54c80 hwmon: (ntc_thermistor) fix module autoloading
[ Upstream commit b6964d66a07a9003868e428a956949e17ab44d7e ]

Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded
based on the alias from of_device_id table.

Fixes: 9e8269de10 ("hwmon: (ntc_thermistor) Add DT with IIO support to NTC thermistor driver")
Signed-off-by: Yuntao Liu <liuyuntao12@huawei.com>
Message-ID: <20240815083021.756134-1-liuyuntao12@huawei.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 6f91b0464947c4119682731401e11e095d8db06d)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:11 +05:30
Mirsad Todorovac
6cc6f059c2 mtd: slram: insert break after errors in parsing the map
[ Upstream commit 336c218dd7f0588ed8a7345f367975a00a4f003f ]

GCC 12.3.0 compiler on linux-next next-20240709 tree found the execution
path in which, due to lazy evaluation, devlength isn't initialised with the
parsed string:

   289		while (map) {
   290			devname = devstart = devlength = NULL;
   291
   292			if (!(devname = strsep(&map, ","))) {
   293				E("slram: No devicename specified.\n");
   294				break;
   295			}
   296			T("slram: devname = %s\n", devname);
   297			if ((!map) || (!(devstart = strsep(&map, ",")))) {
   298				E("slram: No devicestart specified.\n");
   299			}
   300			T("slram: devstart = %s\n", devstart);
 → 301			if ((!map) || (!(devlength = strsep(&map, ",")))) {
   302				E("slram: No devicelength / -end specified.\n");
   303			}
 → 304			T("slram: devlength = %s\n", devlength);
   305			if (parse_cmdline(devname, devstart, devlength) != 0) {
   306				return(-EINVAL);
   307			}

Parsing should be finished after map == NULL, so a break is best inserted after
each E("slram: ... \n") error message.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20240711234319.637824-1-mtodorovac69@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 6015f85fc8eba1ccf7db8b20a9518388fcb4fbf7)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:10 +05:30
Guenter Roeck
e7aef58c01 hwmon: (max16065) Fix overflows seen when writing limits
[ Upstream commit 744ec4477b11c42e2c8de9eb8364675ae7a0bd81 ]

Writing large limits resulted in overflows as reported by module tests.

in0_lcrit: Suspected overflow: [max=5538, read 0, written 2147483647]
in0_crit: Suspected overflow: [max=5538, read 0, written 2147483647]
in0_min: Suspected overflow: [max=5538, read 0, written 2147483647]

Fix the problem by clamping prior to multiplications and the use of
DIV_ROUND_CLOSEST, and by using consistent variable types.

Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
Fixes: f5bae2642e ("hwmon: Driver for MAX16065 System Manager and compatibles")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b665734d4772df97eaeb4d943dc104dbd9ec1e9a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:08 +05:30
Ankit Agrawal
409bb147f5 clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
[ Upstream commit ca140a0dc0a18acd4653b56db211fec9b2339986 ]

Add the missing iounmap() when clock frequency fails to get read by the
of_property_read_u32() call, or if the call to msm_timer_init() fails.

Fixes: 6e3321631a ("ARM: msm: Add DT support to msm_timer")
Signed-off-by: Ankit Agrawal <agrawal.ag.ankit@gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Link: https://lore.kernel.org/r/20240713095713.GA430091@bnew-VirtualBox
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 24d689791c6dbdb11b4b5208ed746f28fe651715)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:07 +05:30
Krzysztof Kozlowski
20bd919b37 reset: berlin: fix OF node leak in probe() error path
[ Upstream commit 5f58a88cc91075be38cec69b7cb70aaa4ba69e8b ]

Driver is leaking OF node reference on memory allocation failure.
Acquire the OF node reference after memory allocation to fix this and
keep it simple.

Fixes: aed6f3cadc ("reset: berlin: convert to a platform driver")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20240825-reset-cleanup-scoped-v1-1-03f6d834f8c0@linaro.org
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 041b763798bf460307db3bd8144e3732aef52902)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:06 +05:30
Krzysztof Kozlowski
10acc7e5ca ARM: versatile: fix OF node leak in CPUs prepare
[ Upstream commit f2642d97f2105ed17b2ece0c597450f2ff95d704 ]

Machine code is leaking OF node reference from of_find_matching_node()
in realview_smp_prepare_cpus().

Fixes: 5420b4b156 ("ARM: realview: add an DT SMP boot method")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://lore.kernel.org/20240826054934.10724-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 722d698f3e8de32a753ee1148b009406d0b3b829)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:05 +05:30
Andy Shevchenko
866d7d0fc3 spi: ppc4xx: Avoid returning 0 when failed to parse and map IRQ
[ Upstream commit 7781f1d120fec8624fc654eda900fc8748262082 ]

0 is incorrect error code when failed to parse and map IRQ.
Replace OF specific old API for IRQ retrieval with a generic
one to fix this issue.

Fixes: 0f245463b01e ("spi: ppc4xx: handle irq_of_parse_and_map() errors")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20240814144525.2648450-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e546902c4917656203e0e134630a873e9b6d28af)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:03 +05:30
Ma Ke
96a52fa20d spi: ppc4xx: handle irq_of_parse_and_map() errors
[ Upstream commit 0f245463b01ea254ae90e1d0389e90b0e7d8dc75 ]

Zero and negative number is not a valid IRQ for in-kernel code and the
irq_of_parse_and_map() function returns zero on error.  So this check for
valid IRQs should only accept values > 0.

Fixes: 44dab88e7c ("spi: add spi_ppc4xx driver")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Link: https://patch.msgid.link/20240724084047.1506084-1-make24@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f2a73a1f728e6fe765fc07c043a3d1670d854518)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:02 +05:30
Yu Kuai
aca46da5ee block, bfq: don't break merge chain in bfq_split_bfqq()
[ Upstream commit 42c306ed723321af4003b2a41bb73728cab54f85 ]

Consider the following scenario:

    Process 1       Process 2       Process 3       Process 4
     (BIC1)          (BIC2)          (BIC3)          (BIC4)
      Λ               |               |                |
       \-------------\ \-------------\ \--------------\|
                      V               V                V
      bfqq1--------->bfqq2---------->bfqq3----------->bfqq4
ref    0              1               2                4

If Process 1 issue a new IO and bfqq2 is found, and then bfq_init_rq()
decide to spilt bfqq2 by bfq_split_bfqq(). Howerver, procress reference
of bfqq2 is 1 and bfq_split_bfqq() just clear the coop flag, which will
break the merge chain.

Expected result: caller will allocate a new bfqq for BIC1

    Process 1       Process 2       Process 3       Process 4
     (BIC1)          (BIC2)          (BIC3)          (BIC4)
                      |               |                |
                       \-------------\ \--------------\|
                                      V                V
      bfqq1--------->bfqq2---------->bfqq3----------->bfqq4
ref    0              0               1                3

Since the condition is only used for the last bfqq4 when the previous
bfqq2 and bfqq3 are already splited. Fix the problem by checking if
bfqq is the last one in the merge chain as well.

Fixes: 36eca89483 ("block, bfq: add Early Queue Merge (EQM)")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240902130329.3787024-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9e813033594b141f61ff0ef0cfaaef292564b041)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:01 +05:30
Yu Kuai
09dff5aa01 block, bfq: fix possible UAF for bfqq->bic with merge chain
[ Upstream commit 18ad4df091dd5d067d2faa8fce1180b79f7041a7 ]

1) initial state, three tasks:

		Process 1       Process 2	Process 3
		 (BIC1)          (BIC2)		 (BIC3)
		  |  Λ            |  Λ		  |  Λ
		  |  |            |  |		  |  |
		  V  |            V  |		  V  |
		  bfqq1           bfqq2		  bfqq3
process ref:	   1		    1		    1

2) bfqq1 merged to bfqq2:

		Process 1       Process 2	Process 3
		 (BIC1)          (BIC2)		 (BIC3)
		  |               |		  |  Λ
		  \--------------\|		  |  |
		                  V		  V  |
		  bfqq1--------->bfqq2		  bfqq3
process ref:	   0		    2		    1

3) bfqq2 merged to bfqq3:

		Process 1       Process 2	Process 3
		 (BIC1)          (BIC2)		 (BIC3)
	 here -> Λ                |		  |
		  \--------------\ \-------------\|
		                  V		  V
		  bfqq1--------->bfqq2---------->bfqq3
process ref:	   0		    1		    3

In this case, IO from Process 1 will get bfqq2 from BIC1 first, and then
get bfqq3 through merge chain, and finially handle IO by bfqq3.
Howerver, current code will think bfqq2 is owned by BIC1, like initial
state, and set bfqq2->bic to BIC1.

bfq_insert_request
-> by Process 1
 bfqq = bfq_init_rq(rq)
  bfqq = bfq_get_bfqq_handle_split
   bfqq = bic_to_bfqq
   -> get bfqq2 from BIC1
 bfqq->ref++
 rq->elv.priv[0] = bic
 rq->elv.priv[1] = bfqq
 if (bfqq_process_refs(bfqq) == 1)
  bfqq->bic = bic
  -> record BIC1 to bfqq2

  __bfq_insert_request
   new_bfqq = bfq_setup_cooperator
   -> get bfqq3 from bfqq2->new_bfqq
   bfqq_request_freed(bfqq)
   new_bfqq->ref++
   rq->elv.priv[1] = new_bfqq
   -> handle IO by bfqq3

Fix the problem by checking bfqq is from merge chain fist. And this
might fix a following problem reported by our syzkaller(unreproducible):

==================================================================
BUG: KASAN: slab-use-after-free in bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline]
BUG: KASAN: slab-use-after-free in bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline]
BUG: KASAN: slab-use-after-free in bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889
Write of size 1 at addr ffff888123839eb8 by task kworker/0:1H/18595

CPU: 0 PID: 18595 Comm: kworker/0:1H Tainted: G             L     6.6.0-07439-gba2303cacfda #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Workqueue: kblockd blk_mq_requeue_work
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x91/0xf0 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:364 [inline]
 print_report+0x10d/0x610 mm/kasan/report.c:475
 kasan_report+0x8e/0xc0 mm/kasan/report.c:588
 bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline]
 bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline]
 bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889
 bfq_get_bfqq_handle_split+0x169/0x5d0 block/bfq-iosched.c:6757
 bfq_init_rq block/bfq-iosched.c:6876 [inline]
 bfq_insert_request block/bfq-iosched.c:6254 [inline]
 bfq_insert_requests+0x1112/0x5cf0 block/bfq-iosched.c:6304
 blk_mq_insert_request+0x290/0x8d0 block/blk-mq.c:2593
 blk_mq_requeue_work+0x6bc/0xa70 block/blk-mq.c:1502
 process_one_work kernel/workqueue.c:2627 [inline]
 process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700
 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781
 kthread+0x33c/0x440 kernel/kthread.c:388
 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305
 </TASK>

Allocated by task 20776:
 kasan_save_stack+0x20/0x40 mm/kasan/common.c:45
 kasan_set_track+0x25/0x30 mm/kasan/common.c:52
 __kasan_slab_alloc+0x87/0x90 mm/kasan/common.c:328
 kasan_slab_alloc include/linux/kasan.h:188 [inline]
 slab_post_alloc_hook mm/slab.h:763 [inline]
 slab_alloc_node mm/slub.c:3458 [inline]
 kmem_cache_alloc_node+0x1a4/0x6f0 mm/slub.c:3503
 ioc_create_icq block/blk-ioc.c:370 [inline]
 ioc_find_get_icq+0x180/0xaa0 block/blk-ioc.c:436
 bfq_prepare_request+0x39/0xf0 block/bfq-iosched.c:6812
 blk_mq_rq_ctx_init.isra.7+0x6ac/0xa00 block/blk-mq.c:403
 __blk_mq_alloc_requests+0xcc0/0x1070 block/blk-mq.c:517
 blk_mq_get_new_requests block/blk-mq.c:2940 [inline]
 blk_mq_submit_bio+0x624/0x27c0 block/blk-mq.c:3042
 __submit_bio+0x331/0x6f0 block/blk-core.c:624
 __submit_bio_noacct_mq block/blk-core.c:703 [inline]
 submit_bio_noacct_nocheck+0x816/0xb40 block/blk-core.c:732
 submit_bio_noacct+0x7a6/0x1b50 block/blk-core.c:826
 xlog_write_iclog+0x7d5/0xa00 fs/xfs/xfs_log.c:1958
 xlog_state_release_iclog+0x3b8/0x720 fs/xfs/xfs_log.c:619
 xlog_cil_push_work+0x19c5/0x2270 fs/xfs/xfs_log_cil.c:1330
 process_one_work kernel/workqueue.c:2627 [inline]
 process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700
 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781
 kthread+0x33c/0x440 kernel/kthread.c:388
 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305

Freed by task 946:
 kasan_save_stack+0x20/0x40 mm/kasan/common.c:45
 kasan_set_track+0x25/0x30 mm/kasan/common.c:52
 kasan_save_free_info+0x2b/0x50 mm/kasan/generic.c:522
 ____kasan_slab_free mm/kasan/common.c:236 [inline]
 __kasan_slab_free+0x12c/0x1c0 mm/kasan/common.c:244
 kasan_slab_free include/linux/kasan.h:164 [inline]
 slab_free_hook mm/slub.c:1815 [inline]
 slab_free_freelist_hook mm/slub.c:1841 [inline]
 slab_free mm/slub.c:3786 [inline]
 kmem_cache_free+0x118/0x6f0 mm/slub.c:3808
 rcu_do_batch+0x35c/0xe30 kernel/rcu/tree.c:2189
 rcu_core+0x819/0xd90 kernel/rcu/tree.c:2462
 __do_softirq+0x1b0/0x7a2 kernel/softirq.c:553

Last potentially related work creation:
 kasan_save_stack+0x20/0x40 mm/kasan/common.c:45
 __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492
 __call_rcu_common kernel/rcu/tree.c:2712 [inline]
 call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826
 ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105
 ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124
 process_one_work kernel/workqueue.c:2627 [inline]
 process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700
 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781
 kthread+0x33c/0x440 kernel/kthread.c:388
 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305

Second to last potentially related work creation:
 kasan_save_stack+0x20/0x40 mm/kasan/common.c:45
 __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492
 __call_rcu_common kernel/rcu/tree.c:2712 [inline]
 call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826
 ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105
 ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124
 process_one_work kernel/workqueue.c:2627 [inline]
 process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700
 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781
 kthread+0x33c/0x440 kernel/kthread.c:388
 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305

The buggy address belongs to the object at ffff888123839d68
 which belongs to the cache bfq_io_cq of size 1360
The buggy address is located 336 bytes inside of
 freed 1360-byte region [ffff888123839d68, ffff88812383a2b8)

The buggy address belongs to the physical page:
page:ffffea00048e0e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812383f588 pfn:0x123838
head:ffffea00048e0e00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x17ffffc0000a40(workingset|slab|head|node=0|zone=2|lastcpupid=0x1fffff)
page_type: 0xffffffff()
raw: 0017ffffc0000a40 ffff88810588c200 ffffea00048ffa10 ffff888105889488
raw: ffff88812383f588 0000000000150006 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888123839d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888123839e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888123839e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                        ^
 ffff888123839f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888123839f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Fixes: 36eca89483 ("block, bfq: add Early Queue Merge (EQM)")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240902130329.3787024-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a9bdd5b36887d2bacb8bc777fd18317c99fc2587)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:22:00 +05:30
Luiz Augusto von Dentz
89bc9eda33 Bluetooth: btusb: Fix not handling ZPL/short-transfer
[ Upstream commit 7b05933340f4490ef5b09e84d644d12484b05fdf ]

Requesting transfers of the exact same size of wMaxPacketSize may result
in ZPL/short-transfer since the USB stack cannot handle it as we are
limiting the buffer size to be the same as wMaxPacketSize.

Also, in terms of throughput this change has the same effect to
interrupt endpoint as 290ba20081 "Bluetooth: Improve USB driver throughput
by increasing the frame size" had for the bulk endpoint, so users of the
advertisement bearer (e.g. BT Mesh) may benefit from this change.

Fixes: 5e23b923da ("[Bluetooth] Add generic driver for Bluetooth USB devices")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2dfadca5439eca817fbb206c6003e5526d5e73df)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:21:59 +05:30
Kuniyuki Iwashima
4e50fe07c6 can: bcm: Clear bo->bcm_proc_read after remove_proc_entry().
[ Upstream commit 94b0818fa63555a65f6ba107080659ea6bcca63e ]

syzbot reported a warning in bcm_release(). [0]

The blamed change fixed another warning that is triggered when
connect() is issued again for a socket whose connect()ed device has
been unregistered.

However, if the socket is just close()d without the 2nd connect(), the
remaining bo->bcm_proc_read triggers unnecessary remove_proc_entry()
in bcm_release().

Let's clear bo->bcm_proc_read after remove_proc_entry() in bcm_notify().

[0]
name '4986'
WARNING: CPU: 0 PID: 5234 at fs/proc/generic.c:711 remove_proc_entry+0x2e7/0x5d0 fs/proc/generic.c:711
Modules linked in:
CPU: 0 UID: 0 PID: 5234 Comm: syz-executor606 Not tainted 6.11.0-rc5-syzkaller-00178-g5517ae241919 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
RIP: 0010:remove_proc_entry+0x2e7/0x5d0 fs/proc/generic.c:711
Code: ff eb 05 e8 cb 1e 5e ff 48 8b 5c 24 10 48 c7 c7 e0 f7 aa 8e e8 2a 38 8e 09 90 48 c7 c7 60 3a 1b 8c 48 89 de e8 da 42 20 ff 90 <0f> 0b 90 90 48 8b 44 24 18 48 c7 44 24 40 0e 36 e0 45 49 c7 04 07
RSP: 0018:ffffc9000345fa20 EFLAGS: 00010246
RAX: 2a2d0aee2eb64600 RBX: ffff888032f1f548 RCX: ffff888029431e00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc9000345fb08 R08: ffffffff8155b2f2 R09: 1ffff1101710519a
R10: dffffc0000000000 R11: ffffed101710519b R12: ffff888011d38640
R13: 0000000000000004 R14: 0000000000000000 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880b8800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcfb52722f0 CR3: 000000000e734000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 bcm_release+0x250/0x880 net/can/bcm.c:1578
 __sock_release net/socket.c:659 [inline]
 sock_close+0xbc/0x240 net/socket.c:1421
 __fput+0x24a/0x8a0 fs/file_table.c:422
 task_work_run+0x24f/0x310 kernel/task_work.c:228
 exit_task_work include/linux/task_work.h:40 [inline]
 do_exit+0xa2f/0x27f0 kernel/exit.c:882
 do_group_exit+0x207/0x2c0 kernel/exit.c:1031
 __do_sys_exit_group kernel/exit.c:1042 [inline]
 __se_sys_exit_group kernel/exit.c:1040 [inline]
 __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1040
 x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fcfb51ee969
Code: Unable to access opcode bytes at 0x7fcfb51ee93f.
RSP: 002b:00007ffce0109ca8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fcfb51ee969
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
RBP: 00007fcfb526f3b0 R08: ffffffffffffffb8 R09: 0000555500000000
R10: 0000555500000000 R11: 0000000000000246 R12: 00007fcfb526f3b0
R13: 0000000000000000 R14: 00007fcfb5271ee0 R15: 00007fcfb51bf160
 </TASK>

Fixes: 76fe372ccb81 ("can: bcm: Remove proc entry when dev is unregistered.")
Reported-by: syzbot+0532ac7a06fb1a03187e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0532ac7a06fb1a03187e
Tested-by: syzbot+0532ac7a06fb1a03187e@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://patch.msgid.link/20240905012237.79683-1-kuniyu@amazon.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f5059fae5ed518fc56494ce5bdd4f5360de4b3bc)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:21:54 +05:30
Toke Høiland-Jørgensen
b7601eeaf9 wifi: ath9k: Remove error checks when creating debugfs entries
[ Upstream commit f6ffe7f0184792c2f99aca6ae5b916683973d7d3 ]

We should not be checking the return values from debugfs creation at all: the
debugfs functions are designed to handle errors of previously called functions
and just transparently abort the creation of debugfs entries when debugfs is
disabled. If we check the return value and abort driver initialisation, we break
the driver if debugfs is disabled (such as when booting with debugfs=off).

Earlier versions of ath9k accidentally did the right thing by checking the
return value, but only for NULL, not for IS_ERR(). This was "fixed" by the two
commits referenced below, breaking ath9k with debugfs=off starting from the 6.6
kernel (as reported in the Bugzilla linked below).

Restore functionality by just getting rid of the return value check entirely.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219122
Fixes: 1e4134610d93 ("wifi: ath9k: use IS_ERR() with debugfs_create_dir()")
Fixes: 6edb4ba6fb5b ("wifi: ath9k: fix parameter check in ath9k_init_debug()")
Reported-by: Daniel Tobias <dan.g.tob@gmail.com>
Tested-by: Daniel Tobias <dan.g.tob@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://patch.msgid.link/20240805110225.19690-1-toke@toke.dk
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 0c3bbcbce030ca203963c520191ad2c5d89bf862)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:21:18 +05:30
Minjie Du
7afee22b0b wifi: ath9k: fix parameter check in ath9k_init_debug()
[ Upstream commit 6edb4ba6fb5b946d112259f54f4657f82eb71e89 ]

Make IS_ERR() judge the debugfs_create_dir() function return
in ath9k_init_debug()

Signed-off-by: Minjie Du <duminjie@vivo.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20230712114740.13226-1-duminjie@vivo.com
Stable-dep-of: f6ffe7f01847 ("wifi: ath9k: Remove error checks when creating debugfs entries")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ac848aff235efdd903c0c185c1cb44496c5b9bb0)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:20:35 +05:30
Junhao Xie
28005e076f USB: serial: pl2303: add device id for Macrosilicon MS3020
commit 7d47d22444bb7dc1b6d768904a22070ef35e1fc0 upstream.

Add the device id for the Macrosilicon MS3020 which is a
PL2303HXN based device.

Signed-off-by: Junhao Xie <bigfoot@classfun.cn>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 79efd61e1c50d79d89a48e6c01761f8f890a83dd)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:20:31 +05:30
Hagar Hemdan
3eb884fdbf gpio: prevent potential speculation leaks in gpio_device_get_desc()
commit d795848ecce24a75dfd46481aee066ae6fe39775 upstream.

Userspace may trigger a speculative read of an address outside the gpio
descriptor array.
Users can do that by calling gpio_ioctl() with an offset out of range.
Offset is copied from user and then used as an array index to get
the gpio descriptor without sanitization in gpio_device_get_desc().

This change ensures that the offset is sanitized by using
array_index_nospec() to mitigate any possibility of speculative
information leaks.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Signed-off-by: Hagar Hemdan <hagarhem@amazon.com>
Link: https://lore.kernel.org/r/20240523085332.1801-1-hagarhem@amazon.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Hugo SIMELIERE <hsimeliere.opensource@witekio.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 18504710442671b02d00e6db9804a0ad26c5a479)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:19:59 +05:30
Ferry Meng
63a82ca4f0 ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry()
[ Upstream commit af77c4fc1871847b528d58b7fdafb4aa1f6a9262 ]

xattr in ocfs2 maybe 'non-indexed', which saved with additional space
requested.  It's better to check if the memory is out of bound before
memcmp, although this possibility mainly comes from crafted poisonous
images.

Link: https://lkml.kernel.org/r/20240520024024.1976129-2-joseph.qi@linux.alibaba.com
Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: lei lu <llfamsec@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e2b3d7a9d019d4d1a0da6c3ea64a1ff79c99c090)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:19:56 +05:30
Ferry Meng
70c1e59977 ocfs2: add bounds checking to ocfs2_xattr_find_entry()
[ Upstream commit 9e3041fecdc8f78a5900c3aa51d3d756e73264d6 ]

Add a paranoia check to make sure it doesn't stray beyond valid memory
region containing ocfs2 xattr entries when scanning for a match.  It will
prevent out-of-bound access in case of crafted images.

Link: https://lkml.kernel.org/r/20240520024024.1976129-1-joseph.qi@linux.alibaba.com
Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: lei lu <llfamsec@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: af77c4fc1871 ("ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b49a786beb11ff740cb9e0c20b999c2a0e1729c2)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:19:55 +05:30
Michael Kelley
a7cf28a9aa x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency
[ Upstream commit 8fcc514809de41153b43ccbe1a0cdf7f72b78e7e ]

A Linux guest on Hyper-V gets the TSC frequency from a synthetic MSR, if
available. In this case, set X86_FEATURE_TSC_KNOWN_FREQ so that Linux
doesn't unnecessarily do refined TSC calibration when setting up the TSC
clocksource.

With this change, a message such as this is no longer output during boot
when the TSC is used as the clocksource:

[    1.115141] tsc: Refined TSC clocksource calibration: 2918.408 MHz

Furthermore, the guest and host will have exactly the same view of the
TSC frequency, which is important for features such as the TSC deadline
timer that are emulated by the Hyper-V host.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
Link: https://lore.kernel.org/r/20240606025559.1631-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20240606025559.1631-1-mhklinux@outlook.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 1da08d443212eba1f731b3f163c5b23ec1c882c1)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:19:51 +05:30
Liao Chen
dafc7e2287 spi: bcm63xx: Enable module autoloading
[ Upstream commit 709df70a20e990d262c473ad9899314039e8ec82 ]

Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based
on the alias from of_device_id table.

Signed-off-by: Liao Chen <liaochen4@huawei.com>
Link: https://patch.msgid.link/20240831094231.795024-1-liaochen4@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 1cde0480b087bd8f4e12396fcbb133ee9d9876bd)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:19:48 +05:30
Jacky Chou
3acf81d045 net: ftgmac100: Ensure tx descriptor updates are visible
[ Upstream commit 4186c8d9e6af57bab0687b299df10ebd47534a0a ]

The driver must ensure TX descriptor updates are visible
before updating TX pointer and TX clear pointer.

This resolves TX hangs observed on AST2600 when running
iperf3.

Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 46974d97d58a2a91da16b032de0c78c4346bc1c2)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:19:44 +05:30
Mike Rapoport
978747a42e microblaze: don't treat zero reserved memory regions as error
[ Upstream commit 0075df288dd8a7abfe03b3766176c393063591dd ]

Before commit 721f4a6526da ("mm/memblock: remove empty dummy entry") the
check for non-zero of memblock.reserved.cnt in mmu_init() would always
be true either because  memblock.reserved.cnt is initialized to 1 or
because there were memory reservations earlier.

The removal of dummy empty entry in memblock caused this check to fail
because now memblock.reserved.cnt is initialized to 0.

Remove the check for non-zero of memblock.reserved.cnt because it's
perfectly fine to have an empty memblock.reserved array that early in
boot.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Mike Rapoport <rppt@kernel.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240729053327.4091459-1-rppt@kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a5bfdf7e4d956f3035779687eade8da23560f4bb)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:19:37 +05:30
Thomas Blocher
d926e4b95f pinctrl: at91: make it work with current gpiolib
[ Upstream commit 752f387faaae0ae2e84d3f496922524785e77d60 ]

pinctrl-at91 currently does not support the gpio-groups devicetree
property and has no pin-range.
Because of this at91 gpios stopped working since patch
commit 2ab73c6d8323fa1e ("gpio: Support GPIO controllers without pin-ranges")
This was discussed in the patches
commit fc328a7d1fcce263 ("gpio: Revert regression in sysfs-gpio (gpiolib.c)")
commit 56e337f2cf132632 ("Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)"")

As a workaround manually set pin-range via gpiochip_add_pin_range() until
a) pinctrl-at91 is reworked to support devicetree gpio-groups
b) another solution as mentioned in
commit 56e337f2cf132632 ("Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)"")
is found

Signed-off-by: Thomas Blocher <thomas.blocher@ek-dev.de>
Link: https://lore.kernel.org/5b992862-355d-f0de-cd3d-ff99e67a4ff1@ek-dev.de
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 33d615ee40f0651bb3d282a66e6f59eae6ea4ada)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:19:35 +05:30
Hongbo Li
cbe1fcc77b ASoC: allow module autoloading for table db1200_pids
[ Upstream commit 0e9fdab1e8df490354562187cdbb8dec643eae2c ]

Add MODULE_DEVICE_TABLE(), so modules could be properly
autoloaded based on the alias from platform_device_id table.

Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Link: https://patch.msgid.link/20240821061955.2273782-2-lihongbo22@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 71d74f78ae565a64eae3022020a9d4e82dace694)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:19:30 +05:30
Samasth Norway Ananda
215675dc59 selftests/kcmp: remove call to ksft_set_plan()
The function definition for ksft_set_plan() is not present in linux-4.19.y.
kcmp_test selftest fails to compile because of this.

Fixes: 32b0469d13eb ("selftests/kcmp: Make the test output consistent and clear")
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1a136754b12424b99bf4e0bb13554d68605ac642)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 11:19:02 +05:30
Samasth Norway Ananda
e7e913cd6e selftests/vm: remove call to ksft_set_plan()
The function definition for ksft_set_plan() is not present in linux-4.19.y.
compaction_test selftest fails to compile because of this.

Fixes: 9a21701edc41 ("selftests/mm: conform test to TAP format output")
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Reviewed-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 26a7159fdc3683e90998339d5ca5e0ce231a6391)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 10:03:58 +05:30
Sean Anderson
3e191338f2 net: dpaa: Pad packets to ETH_ZLEN
[ Upstream commit cbd7ec083413c6a2e0c326d49e24ec7d12c7a9e0 ]

When sending packets under 60 bytes, up to three bytes of the buffer
following the data may be leaked. Avoid this by extending all packets to
ETH_ZLEN, ensuring nothing is leaked in the padding. This bug can be
reproduced by running

	$ ping -s 11 destination

Fixes: 9ad1a37493 ("dpaa_eth: add support for DPAA Ethernet")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910143144.1439910-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit cd5b9d657ecd44ad5f254c3fea3a6ab1cf0e2ef7)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 10:02:27 +05:30
Jacky Chou
4d2fa13795 net: ftgmac100: Enable TX interrupt to avoid TX timeout
[ Upstream commit fef2843bb49f414d1523ca007d088071dee0e055 ]

Currently, the driver only enables RX interrupt to handle RX
packets and TX resources. Sometimes there is not RX traffic,
so the TX resource needs to wait for RX interrupt to free.
This situation will toggle the TX timeout watchdog when the MAC
TX ring has no more resources to transmit packets.
Therefore, enable TX interrupt to release TX resources at any time.

When I am verifying iperf3 over UDP, the network hangs.
Like the log below.

root# iperf3 -c 192.168.100.100 -i1 -t10 -u -b0
Connecting to host 192.168.100.100, port 5201
[  4] local 192.168.100.101 port 35773 connected to 192.168.100.100 port 5201
[ ID] Interval           Transfer     Bandwidth       Total Datagrams
[  4]   0.00-20.42  sec   160 KBytes  64.2 Kbits/sec  20
[  4]  20.42-20.42  sec  0.00 Bytes  0.00 bits/sec  0
[  4]  20.42-20.42  sec  0.00 Bytes  0.00 bits/sec  0
[  4]  20.42-20.42  sec  0.00 Bytes  0.00 bits/sec  0
[  4]  20.42-20.42  sec  0.00 Bytes  0.00 bits/sec  0
[  4]  20.42-20.42  sec  0.00 Bytes  0.00 bits/sec  0
[  4]  20.42-20.42  sec  0.00 Bytes  0.00 bits/sec  0
[  4]  20.42-20.42  sec  0.00 Bytes  0.00 bits/sec  0
[  4]  20.42-20.42  sec  0.00 Bytes  0.00 bits/sec  0
[  4]  20.42-20.42  sec  0.00 Bytes  0.00 bits/sec  0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval          Transfer    Bandwidth      Jitter   Lost/Total Datagrams
[  4]   0.00-20.42  sec  160 KBytes 64.2 Kbits/sec 0.000 ms 0/20 (0%)
[  4] Sent 20 datagrams
iperf3: error - the server has terminated

The network topology is FTGMAC connects directly to a PC.
UDP does not need to wait for ACK, unlike TCP.
Therefore, FTGMAC needs to enable TX interrupt to release TX resources instead
of waiting for the RX interrupt.

Fixes: 10cbd64076 ("ftgmac100: Rework NAPI & interrupts handling")
Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com>
Link: https://patch.msgid.link/20240906062831.2243399-1-jacky_chou@aspeedtech.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7f84d4613b9fdf9e14bbab867e879a0df782a163)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 10:02:25 +05:30
Eran Ben Elisha
20f7077f47 net/mlx5: Update the list of the PCI supported devices
[ Upstream commit 85327a9c415057259b337805d356705d0d0f4200 ]

Add the upcoming ConnectX-6 Dx.

In addition, add "ConnectX Family mlx5Gen Virtual Function" device ID.
Every new HCA VF will be identified with this device ID. Different VF
models will be distinguished by their revision id.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a689f610abc8d4c8dfd775e09fd306f19cfe6509)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 10:02:22 +05:30
Quentin Schulz
40af4cb67b arm64: dts: rockchip: override BIOS_DISABLE signal via GPIO hog on RK3399 Puma
commit 741f5ba7ccba5d7ae796dd11c320e28045524771 upstream.

The Qseven BIOS_DISABLE signal on the RK3399-Q7 keeps the on-module eMMC
and SPI flash powered-down initially (in fact it keeps the reset signal
asserted). BIOS_DISABLE_OVERRIDE pin allows to override that signal so
that eMMC and SPI can be used regardless of the state of the signal.

Let's make this GPIO a hog so that it's reserved and locked in the
proper state.

At the same time, make sure the pin is reserved for the hog and cannot
be requested by another node.

Cc: stable@vger.kernel.org
Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>
Link: https://lore.kernel.org/r/20240731-puma-emmc-6-v1-2-4e28eadf32d0@cherry.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4a0400793ac3961a07fcd472f7eb789d12d0db6a)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 10:02:14 +05:30
Anders Roxell
55e3d30b0b scripts: kconfig: merge_config: config files: add a trailing newline
[ Upstream commit 33330bcf031818e60a816db0cfd3add9eecc3b28 ]

When merging files without trailing newlines at the end of the file, two
config fragments end up at the same row if file1.config doens't have a
trailing newline at the end of the file.

file1.config "CONFIG_1=y"
file2.config "CONFIG_2=y"
./scripts/kconfig/merge_config.sh -m .config file1.config file2.config

This will generate a .config looking like this.
cat .config
...
CONFIG_1=yCONFIG_2=y"

Making sure so we add a newline at the end of every config file that is
passed into the script.

Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 6a130ec2f0646a8544308b6cf983269d5a2a7fa0)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 10:01:36 +05:30
Moon Yeounsu
8deed10d06 net: ethernet: use ip_hdrlen() instead of bit shift
[ Upstream commit 9a039eeb71a42c8b13408a1976e300f3898e1be0 ]

`ip_hdr(skb)->ihl << 2` is the same as `ip_hdrlen(skb)`
Therefore, we should use a well-defined function not a bit shift
to find the header length.

It also compresses two lines to a single line.

Signed-off-by: Moon Yeounsu <yyyynoom@gmail.com>
Reviewed-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a81761c1ba59444fc3f644e7d8713ac35e7911c4)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 10:01:35 +05:30
Foster Snowhill
e5ce6f167b usbnet: ipheth: fix carrier detection in modes 1 and 4
[ Upstream commit 67927a1b255d883881be9467508e0af9a5e0be9d ]

Apart from the standard "configurations", "interfaces" and "alternate
interface settings" in USB, iOS devices also have a notion of
"modes". In different modes, the device exposes a different set of
available configurations.

Depending on the iOS version, and depending on the current mode, the
length and contents of the carrier state control message differs:

* 1 byte (seen on iOS 4.2.1, 8.4):
    * 03: carrier off (mode 0)
    * 04: carrier on (mode 0)
* 3 bytes (seen on iOS 10.3.4, 15.7.6):
    * 03 03 03: carrier off (mode 0)
    * 04 04 03: carrier on (mode 0)
* 4 bytes (seen on iOS 16.5, 17.6):
    * 03 03 03 00: carrier off (mode 0)
    * 04 03 03 00: carrier off (mode 1)
    * 06 03 03 00: carrier off (mode 4)
    * 04 04 03 04: carrier on (mode 0 and 1)
    * 06 04 03 04: carrier on (mode 4)

Before this change, the driver always used the first byte of the
response to determine carrier state.

From this larger sample, the first byte seems to indicate the number of
available USB configurations in the current mode (with the exception of
the default mode 0), and in some cases (namely mode 1 and 4) does not
correlate with the carrier state.

Previous logic erroneously counted `04 03 03 00` as "carrier on" and
`06 04 03 04` as "carrier off" on iOS versions that support mode 1 and
mode 4 respectively.

Only modes 0, 1 and 4 expose the USB Ethernet interfaces necessary for
the ipheth driver.

Check the second byte of the control message where possible, and fall
back to checking the first byte on older iOS versions.

Signed-off-by: Foster Snowhill <forst@pen.gy>
Tested-by: Georgi Valkov <gvalkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 32dafeb84c84a2d420de27e5e30e4ea6339e4d07)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 10:01:35 +05:30
Aleksandr Mishin
e4170e9c81 staging: iio: frequency: ad9834: Validate frequency parameter value
[ Upstream commit b48aa991758999d4e8f9296c5bbe388f293ef465 ]

In ad9834_write_frequency() clk_get_rate() can return 0. In such case
ad9834_calc_freqreg() call will lead to division by zero. Checking
'if (fout > (clk_freq / 2))' doesn't protect in case of 'fout' is 0.
ad9834_write_frequency() is called from ad9834_write(), where fout is
taken from text buffer, which can contain any value.

Modify parameters checking.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 12b9d5bf76 ("Staging: IIO: DDS: AD9833 / AD9834 driver")
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/20240703154506.25584-1-amishin@t-argos.ru
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5edc3a45ef428501000a7b23d0e1777a548907f6)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 10:01:34 +05:30
Beniamin Bia
bacee9027d staging: iio: frequency: ad9833: Load clock using clock framework
[ Upstream commit 8e8040c52e63546d1171c188a24aacf145a9a7e0 ]

The clock frequency is loaded from device-tree using clock framework
instead of statically value. The change allow configuration of
the device via device-trees and better initialization sequence.
This is part of broader effort to add device-tree support to this driver
and take it out from staging.

Signed-off-by: Beniamin Bia <beniamin.bia@analog.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Stable-dep-of: b48aa9917589 ("staging: iio: frequency: ad9834: Validate frequency parameter value")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a6316b6f127a877285c83d2ed45b20e6712e6d1b)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 10:01:33 +05:30
Beniamin Bia
89861c905b staging: iio: frequency: ad9833: Get frequency value statically
[ Upstream commit 80109c32348d7b2e85def9efc3f9524fb166569d ]

The values from platform data were replaced by statically values.
This was just a intermediate step of taking this driver out of staging and
load data from device tree.

Signed-off-by: Beniamin Bia <beniamin.bia@analog.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Stable-dep-of: b48aa9917589 ("staging: iio: frequency: ad9834: Validate frequency parameter value")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a3138f0925714ea47f817257447fa0b87c8bcf28)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
2025-09-05 10:01:29 +05:30
theshaenix
5cf4e6e388 configs: Disable LOCALVERSION_AUTO to remove git hash from kernel version
This disables CONFIG_LOCALVERSION_AUTO to prevent the automatic
appending of the Git commit hash (e.g., -gxxxxxxxx) to the kernel
version string shown in 'About phone' and via 5.15.167.4-microsoft-standard-WSL2.
2025-09-05 09:46:22 +05:30
theshaenix
5ba802bd6b configs: enable Dynamic FSYNC 2025-09-05 09:43:22 +05:30
psndna88
957abd442a fs: dynamic fsync - drop reboot/panic handling when REBOOT_AUTO_FSYNC in use 2025-09-05 09:42:46 +05:30
Erik Müller
963b94d15a fs: Introduce dynamic fsync 2.3
Signed-off-by: Oktapra Amtono <oktapra.amtono@gmail.com>
2025-09-05 09:42:32 +05:30
theshaenix
efcd560ed2 configs: enable WPA3 Personal 2025-09-05 09:41:13 +05:30
aminfauzi
6bd2b81a9c [SQUASH] drivers: Add KernelSU-Next V1.0.9 and SUSFS V1.5.9
Signed-off-by: aminfauzi <aremean0107@gmail.com>
2025-09-05 09:31:24 +05:30
Chung-Hsien Hsu
b1d566f2c8 nl80211: add WPA3 definition for SAE authentication
Add definition of WPA version 3 for SAE authentication.

Change-Id: I19ca34b8965168f011cc1352eba420f2d54b0258
Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com>
Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-09-05 09:29:28 +05:30
Kamal Agrawal
acdcff3c74 msm: kgsl: Remove sysfs entries after releasing memory
Consider a scenario when device is in OOM situation. Assume a process is
doing sysfs operation, grabs kernfs_mutex and wants to allocate memory.
LMKD tries to kill processes but kgsl processes will be blocked waiting
for kernfs_mutex in sysfs_remove_file.

kgsl_process_private_close > kgsl_process_uninit_sysfs > sysfs_remove_file

KGSL won't free up memory as it is done after sysfs removal leading to a
livelock. Fix it by releasing memory before removing sysfs entries.

Change-Id: I99640d7a653faffa671d5b035abb78e9473da12e
Signed-off-by: Kamal Agrawal <kamaagra@codeaurora.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2025-09-05 09:29:17 +05:30
Kamal Agrawal
e09c3e2048 msm: kgsl: Remove debugfs directory inside lock
In kgsl_process_private_close, debugfs directory is removed after mutex
is unlocked. Considering this, a race can be created between
kgsl_process_private_close and kgsl_process_init_debugfs. Fix it by
moving debugfs directory removal inside lock.

Change-Id: Ida65ab8a3825d8c695c56556860495cce853117c
Signed-off-by: Kamal Agrawal <kamaagra@codeaurora.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2025-09-05 09:29:05 +05:30
Zhongqiu Han
4dcffe4db3 sched: idle: Optimize the generic idle loop by removing needless memory barrier
The memory barrier rmb() in generic idle loop do_idle() function is not
needed, it doesn't order any load instruction, just remove it as needless
rmb() can cause performance impact.

The rmb() was introduced by the tglx/history.git commit f2f1b44c75c4
("[PATCH] Remove RCU abuse in cpu_idle()") to order the loads between
cpu_idle_map and pm_idle. It pairs with wmb() in function cpu_idle_wait().

And then with the removal of cpu_idle_state in function cpu_idle() and
wmb() in function cpu_idle_wait() in commit 783e391b7b ("x86: Simplify
cpu_idle_wait"), rmb() no longer has a reason to exist.

After that, commit d166991234 ("idle: Implement generic idle function")
implemented a generic idle function cpu_idle_loop() which resembles the
functionality found in arch/. And it retained the rmb() in generic idle
loop in file kernel/cpu/idle.c.

And at last, commit cf37b6b484 ("sched/idle: Move cpu/idle.c to
sched/idle.c") moved cpu/idle.c to sched/idle.c. And commit c1de45ca83
("sched/idle: Add support for tasks that inject idle") renamed function
cpu_idle_loop() to do_idle().

History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241009093745.9504-1-quic_zhonhan@quicinc.com
Change-Id: I7d04d05f25b66ab266b66424dfddd58857e5242b
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2025-09-05 09:28:51 +05:30
Kir Kolyshkin
3e0ea3a661 sched/headers: Move 'struct sched_param' out of uapi, to work around glibc/musl breakage 2025-09-05 09:28:43 +05:30
Samuel Pascua
46d2437d49 bpf: verifier_log: fix null pointer dereference
`log->kbuf` can be null resulting in null pointer derefernce
in `vscnprintf()`

Signed-off-by: Samuel Pascua <pascua.samuel.14@gmail.com>
2025-09-05 09:28:32 +05:30
Jonathan Lemon
d25ed9674d bpf: lpm_trie: check left child of last leftmost node for NULL
If the leftmost parent node of the tree has does not have a child
on the left side, then trie_get_next_key (and bpftool map dump) will
not look at the child on the right.  This leads to the traversal
missing elements.

Lookup is not affected.

Update selftest to handle this case.

Reproducer:

 bpftool map create /sys/fs/bpf/lpm type lpm_trie key 6 \
     value 1 entries 256 name test_lpm flags 1
 bpftool map update pinned /sys/fs/bpf/lpm key  8 0 0 0  0   0 value 1
 bpftool map update pinned /sys/fs/bpf/lpm key 16 0 0 0  0 128 value 2
 bpftool map dump   pinned /sys/fs/bpf/lpm

Returns only 1 element. (2 expected)

Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE")
Change-Id: I942431b7feaa82aab38d4c37b3b5920ae70d8e24
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-09-05 09:28:23 +05:30
Byeonguk Jeong
df0a47c3dc BACKPORT: bpf: Fix out-of-bounds write in trie_get_next_key()
trie_get_next_key() allocates a node stack with size trie->max_prefixlen,
while it writes (trie->max_prefixlen + 1) nodes to the stack when it has
full paths from the root to leaves. For example, consider a trie with
max_prefixlen is 8, and the nodes with key 0x00/0, 0x00/1, 0x00/2, ...
0x00/8 inserted. Subsequent calls to trie_get_next_key with _key with
.prefixlen = 8 make 9 nodes be written on the node stack with size 8.

Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
Change-Id: I0626bd93acddf978dc56f8b1ee13305c50c90210
Signed-off-by: Byeonguk Jeong <jungbu2855@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org>
Tested-by: Hou Tao <houtao1@huawei.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/Zxx384ZfdlFYnz6J@localhost.localdomain
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-05 09:28:14 +05:30
Hou Tao
3267b1ca88 bpf: Fix exact match conditions in trie_get_next_key()
trie_get_next_key() uses node->prefixlen == key->prefixlen to identify
an exact match, However, it is incorrect because when the target key
doesn't fully match the found node (e.g., node->prefixlen != matchlen),
these two nodes may also have the same prefixlen. It will return
expected result when the passed key exist in the trie. However when a
recently-deleted key or nonexistent key is passed to
trie_get_next_key(), it may skip keys and return incorrect result.

Fix it by using node->prefixlen == matchlen to identify exact matches.
When the condition is true after the search, it also implies
node->prefixlen equals key->prefixlen, otherwise, the search would
return NULL instead.

Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map")
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-05 09:28:05 +05:30
Yonghong Song
01e160688a bpf: fix kernel page fault in lpm map trie_get_next_key
Commit b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command
for LPM_TRIE map") introduces a bug likes below:

    if (!rcu_dereference(trie->root))
        return -ENOENT;
    if (!key || key->prefixlen > trie->max_prefixlen) {
        root = &trie->root;
        goto find_leftmost;
    }
    ......
  find_leftmost:
    for (node = rcu_dereference(*root); node;) {

In the code after label find_leftmost, it is assumed
that *root should not be NULL, but it is not true as
it is possbile trie->root is changed to NULL by an
asynchronous delete operation.

The issue is reported by syzbot and Eric Dumazet with the
below error log:
  ......
  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] SMP KASAN
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Modules linked in:
  CPU: 1 PID: 8033 Comm: syz-executor3 Not tainted 4.15.0-rc8+ #4
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:trie_get_next_key+0x3c2/0xf10 kernel/bpf/lpm_trie.c:682
  ......

This patch fixed the issue by use local rcu_dereferenced
pointer instead of *(&trie->root) later on.

Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command or LPM_TRIE map")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-05 09:27:54 +05:30
Yonghong Song
9dbe0fc3fe bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map
Current LPM_TRIE map type does not implement MAP_GET_NEXT_KEY
command. This command is handy when users want to enumerate
keys. Otherwise, a different map which supports key
enumeration may be required to store the keys. If the
map data is sparse and all map data are to be deleted without
closing file descriptor, using MAP_GET_NEXT_KEY to find
all keys is much faster than enumerating all key space.

This patch implements MAP_GET_NEXT_KEY command for LPM_TRIE map.
If user provided key pointer is NULL or the key does not have
an exact match in the trie, the first key will be returned.
Otherwise, the next key will be returned.

In this implemenation, key enumeration follows a postorder
traversal of internal trie. More specific keys
will be returned first than less specific ones, given
a sequence of MAP_GET_NEXT_KEY syscalls.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-09-05 09:27:44 +05:30
theshaenix
7f9689211d configs: Enable FastRPC and Power Management support for ADSP
Enabled necessary kernel config options to support Qualcomm FastRPC and
domain-based SELinux enforcement. These are required for proper ADSP
and subsystem communication via fastrpc, as well as for PM and wakelock
handling:

- CONFIG_QCOM_FASTRPC
- CONFIG_PM_RUNTIME
- CONFIG_WAKELOCK
- CONFIG_QCOM_ADSP_PIL
- CONFIG_QCOM_Q6V5_ADSP
- CONFIG_QCOM_Q6V5_MSS
- CONFIG_QCOM_MSM_SMD

This resolves boot-time warnings like:
  wakelock_control_kernel_pm: kernel does not support PM management (Invalid request code)
2025-09-05 09:27:21 +05:30
theshaenix
785d2c9712 fs: Add exfat filesystem support
Add exfat filesystem support in fs module.
2025-09-05 09:24:45 +05:30
theshaenix
c619cd9dc0 Revert "BACKPORT: ANDROID: userfaultfd: add MMAP_TRYLOCK mode for COPY/ZEROPAGE"
This reverts commit fa49c6bf49.
2025-09-05 09:08:20 +05:30
Lokesh Gidra
b90d43dd2b ANDROID: Fix compilation error with huge_pmd_share()
There was an asterisk missing for one of the function parameters in the
upstreamed patch.

Fixes: e8ba376301a36 ("BACKPORT: FROMGIT: hugetlb: pass vma into
huge_pte_alloc() and huge_pmd_share()")

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I110563bc38e60a829fe7808f69dc0aa0f203a50e
2025-09-05 07:15:39 +05:30
Peter Xu
48897ca745 UPSTREAM: mm/userfaultfd: selftests: fix memory corruption with thp enabled
In RHEL's gating selftests we've encountered memory corruption in the
uffd event test even with upstream kernel:

        # ./userfaultfd anon 128 4
        nr_pages: 32768, nr_pages_per_cpu: 32768
        bounces: 3, mode: rnd racing read, userfaults: 6240 missing (6240) 14729 wp (14729)
        bounces: 2, mode: racing read, userfaults: 1444 missing (1444) 28877 wp (28877)
        bounces: 1, mode: rnd read, userfaults: 6055 missing (6055) 14699 wp (14699)
        bounces: 0, mode: read, userfaults: 82 missing (82) 25196 wp (25196)
        testing uffd-wp with pagemap (pgsize=4096): done
        testing uffd-wp with pagemap (pgsize=2097152): done
        testing events (fork, remap, remove): ERROR: nr 32427 memory corruption 0 1 (errno=0, line=963)
        ERROR: faulting process failed (errno=0, line=1117)

It can be easily reproduced when global thp enabled, which is the
default for RHEL.

It's also known as a side effect of commit 0db282ba2c12 ("selftest: use
mmap instead of posix_memalign to allocate memory", 2021-07-23), which
is imho right itself on using mmap() to make sure the addresses will be
untagged even on arm.

The problem is, for each test we allocate buffers using two
allocate_area() calls.  We assumed these two buffers won't affect each
other, however they could, because mmap() could have found that the two
buffers are near each other and having the same VMA flags, so they got
merged into one VMA.

It won't be a big problem if thp is not enabled, but when thp is
agressively enabled it means when initializing the src buffer it could
accidentally setup part of the dest buffer too when there's a shared THP
that overlaps the two regions.  Then some of the dest buffer won't be
able to be trapped by userfaultfd missing mode, then it'll cause memory
corruption as described.

To fix it, do release_pages() after initializing the src buffer.

Since the previous two release_pages() calls are after
uffd_test_ctx_clear() which will unmap all the buffers anyway (which is
stronger than release pages; as unmap() also tear town pgtables), drop
them as they shouldn't really be anything useful.

We can mark the Fixes tag upon 0db282ba2c12 as it's reported to only
happen there, however the real "Fixes" IMHO should be 8ba6e8640844, as
before that commit we'll always do explicit release_pages() before
registration of uffd, and 8ba6e8640844 changed that logic by adding
extra unmap/map and we didn't release the pages at the right place.
Meanwhile I don't have a solid glue anyway on whether posix_memalign()
could always avoid triggering this bug, hence it's safer to attach this
fix to commit 8ba6e8640844.

Link: https://lkml.kernel.org/r/20210923232512.210092-1-peterx@redhat.com
Fixes: 8ba6e8640844 ("userfaultfd/selftests: reinitialize test context in each test")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1994931
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Li Wang <liwan@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8913970c19915bbe773d97d42989cd85b7fdc098)
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I1378a52661d17c5cf6e8a0e84e3216556160c1b8
2025-09-05 07:15:29 +05:30
Peter Xu
34d295872d UPSTREAM: mm/shmem: use page_mapping() to detect page cache for uffd continue
mfill_atomic_install_pte() checks page->mapping to detect whether one page
is used in the page cache.  However as pointed out by Matthew, the page
can logically be a tail page rather than always the head in the case of
uffd minor mode with UFFDIO_CONTINUE.  It means we could wrongly install
one pte with shmem thp tail page assuming it's an anonymous page.

It's not that clear even for anonymous page, since normally anonymous
pages also have page->mapping being setup with the anon vma.  It's safe
here only because the only such caller to mfill_atomic_install_pte() is
always passing in a newly allocated page (mcopy_atomic_pte()), whose
page->mapping is not yet setup.  However that's not extremely obvious
either.

For either of above, use page_mapping() instead.

Bug: 254441685
Link: https://lkml.kernel.org/r/Y2K+y7wnhC4vbnP2@x1n
Fixes: 153132571f02 ("userfaultfd/shmem: support UFFDIO_CONTINUE for shmem")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Matthew Wilcox <willy@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 93b0d9178743a68723babe8448981f658aebc58e)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I03246130310cc7f3486843ed945ef92cab966cdc
2025-09-05 07:15:21 +05:30
Nadav Amit
8056206f5c UPSTREAM: mm/userfaultfd: fix memory corruption due to writeprotect
Userfaultfd self-test fails occasionally, indicating a memory corruption.

Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy().  Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the time
of the copy and the time in which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses a
problem.  Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userrfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects memory region, it
unintentionally *clears* the RW-bit if it was already set.  Note that this
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

cpu0				cpu1			cpu2
----				----			----
							[ Writable PTE
							  cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()

change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
				[ page-fault ]
				...
				wp_page_copy()
				 cow_user_page()
				  [ copy page ]
							[ write to old
							  page ]
				...
				 set_pte_at_notify()

A similar scenario can happen:

cpu0		cpu1		cpu2		cpu3
----		----		----		----
						[ Writable PTE
				  		  cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
		userfaultfd_writeprotect()
		[ write-unprotect ]
		[ deferred TLB flush]
				[ page-fault ]
				wp_page_copy()
				 cow_user_page()
				 [ copy page ]
				 ...		[ write to page ]
				set_pte_at_notify()

This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit").  Yet, as Yu Zhao pointed, these races became apparent
since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made
wp_page_copy() more likely to take place, specifically if page_count(page)
> 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.

Further optimizations will follow to avoid during uffd-write-unprotect
unnecassary PTE write-protection and TLB flushes.

Bug: 254441685
Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit <namit@vmware.com>
Suggested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 6ce64428d62026a10cb5d80138ff2f90cc21d367)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ie35334aa739acfade88b74c9e5dde5c8a387925d
2025-09-05 07:15:11 +05:30
Baolin Wang
351e86206b UPSTREAM: mm: hugetlb: add missing cache flushing in hugetlb_unshare_all_pmds()
Missed calling flush_cache_range() before removing the sharing PMD
entrires, otherwise data consistence issue may be occurred on some
architectures whose caches are strict and require a virtual>physical
translation to exist for a virtual address.  Thus add it.

Now no architectures enabling PMD sharing will be affected, since they do
not have a VIVT cache.  That means this issue can not be happened in
practice so far.

Bug: 254441685
Link: https://lkml.kernel.org/r/47441086affcabb6ecbe403173e9283b0d904b38.1650956489.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/419b0e777c9e6d1454dcd906e0f5b752a736d335.1650781755.git.baolin.wang@linux.alibaba.com
Fixes: 6dfeaff93be1 ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 9c8bbfaca1bce84664403fd7dddbef6b3ff0a05a)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Ib7d886d4f8bc18087771b9999bb7d9941a879581
2025-09-05 07:15:03 +05:30
Michal Hocko
acb01962e2 BACKPORT: mm, mempolicy: fix up gup usage in lookup_node
ba841078cd05 ("mm/mempolicy: Allow lookup_node() to handle fatal signal")
has added a special casing for 0 return value because that was a possible
gup return value when interrupted by fatal signal.  This has been fixed by
ae46d2aa6a7f ("mm/gup: Let __get_user_pages_locked() return -EINTR for
fatal signal") in the mean time so ba841078cd05 can be reverted.

This patch however doesn't go all the way to revert it because the check
for 0 is wrong and confusing here.  Firstly it is inherently unsafe to
access the page when get_user_pages_locked returns 0 (aka no page
returned).

Fortunatelly this will not happen because get_user_pages_locked will not
return 0 when nr_pages > 0 unless FOLL_NOWAIT is specified which is not
the case here.  Document this potential error code in gup code while we
are at it.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>
Link: http://lkml.kernel.org/r/20200421071026.18394-1-mhocko@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 2d3a36a47964371101d9a71691c18d59ee611e87)

[Kalesh Singh: Resolve conflict in mm/gup.c]
Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Idac45e534bba1524a696993447220d12332ced05
2025-09-05 07:14:55 +05:30
Peter Xu
08f04a4b12 UPSTREAM: mm/mempolicy: Allow lookup_node() to handle fatal signal
lookup_node() uses gup to pin the page and get node information.  It
checks against ret>=0 assuming the page will be filled in.  However it's
also possible that gup will return zero, for example, when the thread is
quickly killed with a fatal signal.  Teach lookup_node() to gracefully
return an error -EFAULT if it happens.

Meanwhile, initialize "page" to NULL to avoid potential risk of
exploiting the pointer.

Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
Reported-by: syzbot+693dc11fcb53120b5559@syzkaller.appspotmail.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ba841078cd0557b43b59c63f5c048b12168f0db2)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I1cfd121c3b603db000a0bebe252c9dec6377f0b0
2025-09-05 07:14:45 +05:30
Hillf Danton
554cb42b06 UPSTREAM: mm/gup: Let __get_user_pages_locked() return -EINTR for fatal signal
__get_user_pages_locked() will return 0 instead of -EINTR after commit
4426e945df588 ("mm/gup: allow VM_FAULT_RETRY for multiple times") which
added extra code to allow gup detect fatal signal faster.

Restore the original -EINTR behavior.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
Reported-by: syzbot+3be1a33f04dc782e9fd5@syzkaller.appspotmail.com
Signed-off-by: Hillf Danton <hdanton@sina.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ae46d2aa6a7fbe8ca0946f24b061b6ccdc6c3f25)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Ic812596b077e7b5fbfde90f2253241afa1fa42cf
2025-09-05 07:14:36 +05:30
Peter Xu
388988653e UPSTREAM: mm/gup: fix fixup_user_fault() on multiple retries
This part was overlooked when reworking the gup code on multiple
retries.

When we get the 2nd+ retry, we'll be with TRIED flag set.  Current code
will bail out on the 2nd retry because the !TRIED check will fail so the
retry logic will be skipped.  What's worse is that, it will also return
zero which errornously hints the caller that the page is faulted in
while it's not.

The !TRIED flag check seems to not be needed even before the mutliple
retries change because if we get a VM_FAULT_RETRY, it must be the 1st
retry, and we should not have TRIED set for that.

Fix it by removing the !TRIED check, at the meantime check against fatal
signals properly before the page fault so we can still properly respond
to the user killing the process during retries.

Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Link: http://lkml.kernel.org/r/20200502003523.8204-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 475f4dfc021c5fde69f3b7d3287bde0a50477b05)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I51dbc07fc845b346691f9bcf4846a1d9355cbe71
2025-09-05 07:12:30 +05:30
Peter Xu
7a28b40fd5 UPSTREAM: mm/gup: Mark lock taken only after a successful retake
It's definitely incorrect to mark the lock as taken even if
down_read_killable() failed.

This wass overlooked when we switched from down_read() to
down_read_killable() because down_read() won't fail while
down_read_killable() could.

Fixes: 71335f37c5e8 ("mm/gup: allow to react to fatal signals")
Reported-by: syzbot+a8c70b7f3579fc0587dc@syzkaller.appspotmail.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c7b6a566b98524baea6a244186e665d22b633545)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Id0e1fe147e87ff6bba02d5e590f6346089c5e650
2025-09-05 07:12:08 +05:30
Peter Xu
f91043481b UPSTREAM: mm/userfaultfd: honor FAULT_FLAG_KILLABLE in fault path
Userfaultfd fault path was by default killable even if the caller does not
have FAULT_FLAG_KILLABLE.  That makes sense before in that when with gup
we don't have FAULT_FLAG_KILLABLE properly set before.  Now after previous
patch we've got FAULT_FLAG_KILLABLE applied even for gup code so it should
also make sense to let userfaultfd to honor the FAULT_FLAG_KILLABLE.

Because we're unconditionally setting FAULT_FLAG_KILLABLE in gup code
right now, this patch should have no functional change.  It also cleaned
the code a little bit by introducing some helpers.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160300.9941-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 3e69ad081c18d138fc7fd0f1ceef3b055ab10549)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I514975b9934e3ba2860661b0d463fd46410b7a90
2025-09-05 07:11:51 +05:30
Peter Xu
4d3b795c11 UPSTREAM: mm/gup: allow to react to fatal signals
The existing gup code does not react to the fatal signals in many code
paths.  For example, in one retry path of gup we're still using
down_read() rather than down_read_killable().  Also, when doing page
faults we don't pass in FAULT_FLAG_KILLABLE as well, which means that
within the faulting process we'll wait in non-killable way as well.  These
were spotted by Linus during the code review of some other patches.

Let's allow the gup code to react to fatal signals to improve the
responsiveness of threads when during gup and being killed.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160256.9887-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 71335f37c5e8ec9225285206f7f875057b9737ad)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Iafde40bcfc0bed8311407a9b74c701e3d1b57062
2025-09-05 07:11:45 +05:30
Kirill Tkhai
99c2564d6d locking/rwsem: Add down_read_killable()
Similar to down_read() and down_write_killable(),
add killable version of down_read(), based on
__down_read_killable() function, added in previous
patches.

Change-Id: I1437294240803082fdb24bdfd3231c8f09d3ff11
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: avagin@virtuozzo.com
Cc: davem@davemloft.net
Cc: fenghua.yu@intel.com
Cc: gorcunov@virtuozzo.com
Cc: heiko.carstens@de.ibm.com
Cc: hpa@zytor.com
Cc: ink@jurassic.park.msu.ru
Cc: mattst88@gmail.com
Cc: rientjes@google.com
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: viro@zeniv.linux.org.uk
Link: http://lkml.kernel.org/r/150670119884.23930.2585570605960763239.stgit@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-09-05 07:11:37 +05:30
Peter Xu
fda00f65d9 UPSTREAM: mm/gup: allow VM_FAULT_RETRY for multiple times
This is the gup counterpart of the change that allows the VM_FAULT_RETRY
to happen for more than once.  One thing to mention is that we must check
the fatal signal here before retry because the GUP can be interrupted by
that, otherwise we can loop forever.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220195357.16371-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 4426e945df588f2878affddf88a51259200f7e29)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: If774cfcc1ff5710bfc0ee48321a2d1b8fe5e97d5
2025-09-05 07:11:30 +05:30
Peter Xu
d22caad8ea BACKPORT: UPSTREAM: mm: allow VM_FAULT_RETRY for multiple times
The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to retry once.  We achieved
this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time.  This was majorly used to avoid
unexpected starvation of the system by looping over forever to handle the
page fault on a single page.  However that should hardly happen, and after
all for each code path to return a VM_FAULT_RETRY we'll first wait for a
condition (during which time we should possibly yield the cpu) to happen
before VM_FAULT_RETRY is really returned.

This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
flag when we receive VM_FAULT_RETRY.  It means that the page fault handler
now can retry the page fault for multiple times if necessary without the
need to generate another page fault event.  Meanwhile we still keep the
FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
page fault is the first attempt or not.

Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):

  - ALLOW_RETRY and !TRIED:  this means the page fault allows to
                             retry, and this is the first try

  - ALLOW_RETRY and TRIED:   this means the page fault allows to
                             retry, and this is not the first try

  - !ALLOW_RETRY and !TRIED: this means the page fault does not allow
                             to retry at all

  - !ALLOW_RETRY and TRIED:  this is forbidden and should never be used

In existing code we have multiple places that has taken special care of
the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY).  This patch introduces a simple helper to detect
the first retry of a page fault by checking against both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now
even the 2nd try will have the ALLOW_RETRY set, then use that helper in
all existing special paths.  One example is in __lock_page_or_retry(), now
we'll drop the mmap_sem only in the first attempt of page fault and we'll
keep it in follow up retries, so old locking behavior will be retained.

This will be a nice enhancement for current code [2] at the same time a
supporting material for the future userfaultfd-writeprotect work, since in
that work there will always be an explicit userfault writeprotect retry
for protected pages, and if that cannot resolve the page fault (e.g., when
userfaultfd-writeprotect is used in conjunction with swapped pages) then
we'll possibly need a 3rd retry of the page fault.  It might also benefit
other potential users who will have similar requirement like userfault
write-protection.

GUP code is not touched yet and will be covered in follow up patch.

Please read the thread below for more information.

[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 4064b982706375025628094e51d11cf1a958a5d3)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: If0378d8ccbfc54a574b91103a6dc76e446f5f12e
2025-09-05 07:11:23 +05:30
Peter Xu
ba4c4d9579 UPSTREAM: mm: introduce FAULT_FLAG_INTERRUPTIBLE
handle_userfaultfd() is currently the only one place in the kernel page
fault procedures that can respond to non-fatal userspace signals.  It was
trying to detect such an allowance by checking against USER & KILLABLE
flags, which was "un-official".

In this patch, we introduced a new flag (FAULT_FLAG_INTERRUPTIBLE) to show
that the fault handler allows the fault procedure to respond even to
non-fatal signals.  Meanwhile, add this new flag to the default fault
flags so that all the page fault handlers can benefit from the new flag.
With that, replacing the userfault check to this one.

Since the line is getting even longer, clean up the fault flags a bit too
to ease TTY users.

Although we've got a new flag and applied it, we shouldn't have any
functional change with this patch so far.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220195348.16302-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c270a7eedcf278304e05ebd2c96807487c97db61)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I4abde9ce54aae56b0791884adf45210973f8a210
2025-09-05 07:11:15 +05:30
Peter Xu
68c6ec191c UPSTREAM: mm: introduce FAULT_FLAG_DEFAULT
Although there're tons of arch-specific page fault handlers, most of them
are still sharing the same initial value of the page fault flags.  Say,
merely all of the page fault handlers would allow the fault to be retried,
and they also allow the fault to respond to SIGKILL.

Let's define a default value for the fault flags to replace those initial
page fault flags that were copied over.  With this, it'll be far easier to
introduce new fault flag that can be used by all the architectures instead
of touching all the archs.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit dde1607248328cdb7570e3a252e8fb76b3411d66)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I53fb69d7974571ed827cc7cc2a54af555c80643f
2025-09-05 07:11:09 +05:30
Peter Xu
b27bf48d31 UPSTREAM: userfaultfd: don't retake mmap_sem to emulate NOPAGE
This patch removes the risk path in handle_userfault() then we will be
sure that the callers of handle_mm_fault() will know that the VMAs might
have changed.  Meanwhile with previous patch we don't lose responsiveness
as well since the core mm code now can handle the nonfatal userspace
signals even if we return VM_FAULT_RETRY.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160234.9646-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ef429ee7409aa7cbe4c3c9e2df5dc6abedfab493)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I8173824922cd6310341a367e622652a3d87437f1
2025-09-05 07:11:00 +05:30
Peter Xu
370207257a UPSTREAM: mm: return faster for non-fatal signals in user mode faults
The idea comes from the upstream discussion between Linus and Andrea:

  https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/

A summary to the issue: there was a special path in handle_userfault() in
the past that we'll return a VM_FAULT_NOPAGE when we detected non-fatal
signals when waiting for userfault handling.  We did that by reacquiring
the mmap_sem before returning.  However that brings a risk in that the
vmas might have changed when we retake the mmap_sem and even we could be
holding an invalid vma structure.

This patch is a preparation of removing that special path by allowing the
page fault to return even faster if we were interrupted by a non-fatal
signal during a user-mode page fault handling routine.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160230.9598-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8b9a65fd282c1d2e5b8ba8d8afaf652cde27b5e7)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I5f4fea4454b0c3c0e27c09e64d825fe94ce8a775
2025-09-05 07:10:39 +05:30
Peter Xu
c5c63f148b UPSTREAM: sh/mm: use helper fault_signal_pending()
Let SH to use the new fault_signal_pending() helper.  Here we'll need to
move the up_read() out because that's actually needed as long as !RETRY
cases.  At the meantime we can drop all the rest of up_read()s now (which
seems to be cleaner).

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160226.9550-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit fb027ada051a9e2d70a069b2aa62fb6f52100bbf)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Icd6511efc716bbc9cb78c6ff9c99e9dbe19d72d5
2025-09-05 07:10:24 +05:30
Peter Xu
c126d4cef9 UPSTREAM: powerpc/mm: use helper fault_signal_pending()
Let powerpc code to use the new helper, by moving the signal handling
earlier before the retry logic.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160222.9422-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c9a0dad162014182867f81b28bb7a4b691d65595)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Ia4818fd5c35e85993ff87b1326a133334380868b
2025-09-05 07:10:16 +05:30
Peter Xu
19a0ba6351 UPSTREAM: arm64/mm: use helper fault_signal_pending()
Let the arm64 fault handling to use the new fault_signal_pending() helper,
by moving the signal handling out of the retry logic.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220155927.9264-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit b502f038f2ffc97a60fefcc120a868aa46009060)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I0168a43369eed5d3880a45d6784bae86ed5213be
2025-09-05 07:10:09 +05:30
Peter Xu
72e61c21f7 UPSTREAM: arc/mm: use helper fault_signal_pending()
Let ARC to use the new helper fault_signal_pending() by moving the signal
check out of the retry logic as standalone.  This should also helps to
simplify the code a bit.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220155843.9172-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 24a62cf41f670fcba90dfba4db2a59a22cc830d5)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I8a014591633edc99715f73d6fba908c5d9ceeef3
2025-09-05 07:10:01 +05:30
Peter Xu
5f15fdf766 UPSTREAM: x86/mm: use helper fault_signal_pending()
Let's move the fatal signal check even earlier so that we can directly use
the new fault_signal_pending() in x86 mm code.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220155353.8676-5-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 39678191cd8988c811813baf4c97b43bf46094e4)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Icaf666ec5f9f83043396ae0c5b6a079e9c2349bc
2025-09-05 07:09:50 +05:30
Peter Xu
f1e277b794 BACKPORT: mm: introduce fault_signal_pending()
For most architectures, we've got a quick path to detect fatal signal
after a handle_mm_fault().  Introduce a helper for that quick path.

It cleans the current codes a bit so we don't need to duplicate the same
check across archs.  More importantly, this will be an unified place that
we handle the signal immediately right after an interrupted page fault, so
it'll be much easier for us if we want to change the behavior of handling
signals later on for all the archs.

Note that currently only part of the archs are using this new helper,
because some archs have their own way to handle signals.  In the follow up
patches, we'll try to apply this helper to all the rest of archs.

Another note is that the "regs" parameter in the new helper is not used
yet.  It'll be used very soon.  Now we kept it in this patch only to avoid
touching all the archs again in the follow up patches.

[peterx@redhat.com: fix sparse warnings]
  Link: http://lkml.kernel.org/r/20200311145921.GD479302@xz-x1
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220155353.8676-4-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 4ef873226ceb9c7bf11a922caddc5698a24bcfaf)

[Kalesh Singh: Resolve #include conflict in include/linux/sched/signal.h]
Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Ie9080d75c2cc22b1fd4b50755e9e61ad0ab677c2
2025-09-05 07:09:41 +05:30
Vineet Gupta
df09a2dd32 ARC: mm: do_page_fault refactor #8: release mmap_sem sooner
In case of successful page fault handling, this patch releases mmap_sem
before updating the perf stat event for major/minor faults. So even
though the contention reduction is NOT super high, it is still an
improvement.

There's an additional code size improvement as we only have 2 up_read()
calls now.

Note to myself:
--------------

1. Given the way it is done, we are forced to move @bad_area label earlier
   causing the various "goto bad_area" cases to hit perf stat code.

 - PERF_COUNT_SW_PAGE_FAULTS is NOW updated for access errors which is what
   arm/arm64 seem to be doing as well (with slightly different code)
 - PERF_COUNT_SW_PAGE_FAULTS_{MAJ,MIN} must NOT be updated for the
   error case which is guarded by now setting @fault initial value
   to VM_FAULT_ERROR which serves both cases when handle_mm_fault()
   returns error or is not called at all.

2. arm/arm64 use two homebrew fault flags VM_FAULT_BAD{MAP,MAPACCESS}
   which I was inclined to add too but seems not needed for ARC

 - given that we have everything is 1 function we can still use goto
 - we setup si_code at the right place (arm* do that in the end)
 - we init fault already to error value which guards entry into perf
   stats event update

Cc: Peter Zijlstra <peterz@infradead.org>
Change-Id: I35f2d8c2b5f16fdf14213c1167cec61e21938134
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2025-09-05 07:09:33 +05:30
Vineet Gupta
76d13cda39 ARC: mm: do_page_fault refactor #7: fold the various error handling
- single up_read() call vs. 4
 - so much easier on eyes

Technically it seems like @bad_area label moved up, but even in old
regime, it was a special case of delivering SIGSEGV unconditionally
which we now do as well, although with checks.

Also note that @fault needs to be initialized since we can land in
@bad_area (which reads it) without setting it up with return value of
handle_mm_fault() - failing to do so did bite us although as a side
effect of different patch: see [1]

[1]: http://lists.infradead.org/pipermail/linux-snps-arc/2019-May/005803.html

Change-Id: I0964d9868efc0add93878ee57a54631604fcd42c
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2025-09-05 07:09:26 +05:30
Vineet Gupta
5d19d9033e ARC: mm: do_page_fault refactor #6: error handlers to use same pattern
- up_read
 - if !user_mode
 - whatever error handling

Change-Id: I7056cdc08fe6e8106e4e5dfee9fdc98d6fa10b81
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2025-09-05 07:09:17 +05:30
Vineet Gupta
edf571ee09 ARC: mm: do_page_fault refactor #5: scoot no_context to end
This is different than the rest of signal handling stuff

No functional change

Change-Id: I86a0d68e763ca8d1131021201a21a07e5d47a6bb
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2025-09-05 07:09:10 +05:30
Vineet Gupta
7fe9cc04d9 ARC: mm: do_page_fault refactor #4: consolidate retry related logic
stats update code can now elide "retry" check and additional level of
indentation since all retry handling is done ahead of it already

Change-Id: If1816cf5b4a522774d67aabcd02188e1c98f0601
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2025-09-05 07:09:02 +05:30
Vineet Gupta
181b8be8cd ARC: mm: do_page_fault refactor #3: tidyup vma access permission code
The coding pattern to NOT intialize variables at declaration time but
rather near code which makes us eof them makes it much easier to grok
the overall logic, specially when the init is not simply 0 or 1

Change-Id: I6d2d00d0ae258bcdee115b9a6bd7dec22e551a00
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2025-09-05 07:08:55 +05:30
Vineet Gupta
612610a1c4 ARC: mm: do_page_fault refactor #2: remove short lived variable
Compiler will do this anyways, still..

No functional change.

Change-Id: Ie48e7058203b83b18fda78962bf0afdffb9fffbf
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2025-09-05 07:08:46 +05:30
Vineet Gupta
81876efac0 ARC: mm: do_page_fault refactor #1: remove label @good_area
Invert the condition for stack expansion.
No functional change

Change-Id: Ia502955f3d0a680dde1105559f38abd0d2d0db24
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2025-09-05 07:08:37 +05:30
Eugeniy Paltsev
fda594b575 ARC: mm: SIGSEGV userspace trying to access kernel virtual memory
As of today if userspace process tries to access a kernel virtual addres
(0x7000_0000 to 0x7ffff_ffff) such that a legit kernel mapping already
exists, that process hangs instead of being killed with SIGSEGV

Fix that by ensuring that do_page_fault() handles kenrel vaddr only if
in kernel mode.

And given this, we can also simplify the code a bit. Now a vmalloc fault
implies kernel mode so its failure (for some reason) can reuse the
@no_context label and we can remove @bad_area_nosemaphore.

Reproduce user test for original problem:

------------------------>8-----------------
 #include <stdlib.h>
 #include <stdint.h>

 int main(int argc, char *argv[])
 {
 	volatile uint32_t temp;

 	temp = *(uint32_t *)(0x70000000);
 }
------------------------>8-----------------

Cc: <stable@vger.kernel.org>
Change-Id: I871721a1a4edcdad0384e24f75ec67141c6f2ddc
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2025-09-05 07:08:28 +05:30
Vineet Gupta
7336c047d2 ARC: mm: do_page_fault fixes #1: relinquish mmap_sem if signal arrives while handle_mm_fault
do_page_fault() forgot to relinquish mmap_sem if a signal came while
handling handle_mm_fault() - due to say a ctl+c or oom etc.
This would later cause a deadlock by acquiring it twice.

This came to light when running libc testsuite tst-tls3-malloc test but
is likely also the cause for prior seen LTP failures. Using lockdep
clearly showed what the issue was.

| # while true; do ./tst-tls3-malloc ; done
| Didn't expect signal from child: got `Segmentation fault'
| ^C
| ============================================
| WARNING: possible recursive locking detected
| 4.17.0+ #25 Not tainted
| --------------------------------------------
| tst-tls3-malloc/510 is trying to acquire lock:
| 606c7728 (&mm->mmap_sem){++++}, at: __might_fault+0x28/0x5c
|
|but task is already holding lock:
|606c7728 (&mm->mmap_sem){++++}, at: do_page_fault+0x9c/0x2a0
|
| other info that might help us debug this:
|  Possible unsafe locking scenario:
|
|       CPU0
|       ----
|  lock(&mm->mmap_sem);
|  lock(&mm->mmap_sem);
|
| *** DEADLOCK ***
|

------------------------------------------------------------
What the change does is not obvious (note to myself)

prior code was

| do_page_fault
|
|   down_read()		<-- lock taken
|   handle_mm_fault	<-- signal pending as this runs
|   if fatal_signal_pending
|       if VM_FAULT_ERROR
|           up_read
|       if user_mode
|          return	<-- lock still held, this was the BUG

New code

| do_page_fault
|
|   down_read()		<-- lock taken
|   handle_mm_fault	<-- signal pending as this runs
|   if fatal_signal_pending
|       if VM_FAULT_RETRY
|          return       <-- not same case as above, but still OK since
|                           core mm already relinq lock for FAULT_RETRY
|    ...
|
|   < Now falls through for bug case above >
|
|   up_read()		<-- lock relinquished

Cc: stable@vger.kernel.org
Change-Id: I4d8c5ad338f86f349b3194c4c077ff03b45ae350
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2025-09-05 07:08:18 +05:30
Davidlohr Bueso
57d9feb474 arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-4-dave@stgolabs.net
Change-Id: Id70c1f130d2e66a8edb1ace7b6896a758eab7318
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-05 07:08:02 +05:30
Peter Xu
42b3d4e267 UPSTREAM: mm/gup: fix __get_user_pages() on fault retry of hugetlb
When follow_hugetlb_page() returns with *locked==0, it means we've got a
VM_FAULT_RETRY within the fauling process and we've released the mmap_sem.
When that happens, we should stop and bail out.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220155353.8676-3-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ad415db817964e96df824e8bb1a861527f8012b6)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Ie7482106a07d4ee84b608f99192151d221605efb
2025-09-05 07:07:51 +05:30
Peter Xu
ec841d0894 UPSTREAM: mm/gup: rename "nonblocking" to "locked" where proper
Patch series "mm: Page fault enhancements", v6.

This series contains cleanups and enhancements to current page fault
logic.  The whole idea comes from the discussion between Andrea and Linus
on the bug reported by syzbot here:

  https://lkml.org/lkml/2017/11/2/833

Basically it does two things:

  (a) Allows the page fault logic to be more interactive on not only
      SIGKILL, but also the rest of userspace signals, and,

  (b) Allows the page fault retry (VM_FAULT_RETRY) to happen for more
      than once.

For (a): with the changes we should be able to react faster when page
faults are working in parallel with userspace signals like SIGSTOP and
SIGCONT (and more), and with that we can remove the buggy part in
userfaultfd and benefit the whole page fault mechanism on faster signal
processing to reach the userspace.

For (b), we should be able to allow the page fault handler to loop for
even more than twice.  Some context: for now since we have
FAULT_FLAG_ALLOW_RETRY we can allow to retry the page fault once with the
same interrupt context, however never more than twice.  This can be not
only a potential cleanup to remove this assumption since AFAIU the code
itself doesn't really have this twice-only limitation (though that should
be a protective approach in the past), at the same time it'll greatly
simplify future works like userfaultfd write-protect where it's possible
to retry for more than twice (please have a look at [1] below for a
possible user that might require the page fault to be handled for a third
time; if we can remove the retry limitation we can simply drop that patch
and those complexity).

This patch (of 16):

There's plenty of places around __get_user_pages() that has a parameter
"nonblocking" which does not really mean that "it won't block" (because it
can really block) but instead it shows whether the mmap_sem is released by
up_read() during the page fault handling mostly when VM_FAULT_RETRY is
returned.

We have the correct naming in e.g.  get_user_pages_locked() or
get_user_pages_remote() as "locked", however there're still many places
that are using the "nonblocking" as name.

Renaming the places to "locked" where proper to better suite the
functionality of the variable.  While at it, fixing up some of the
comments accordingly.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220155353.8676-2-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 4f6da93411806db2f3e58193b31b95e8c6737616)

Bug: 176847924
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I59a8b6e0977d689f6bd375e75785c80c75b2806b
2025-09-05 07:00:45 +05:30
Andrea Arcangeli
05b8a9090a mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
hugetlb needs the same fix as faultin_nopage (which was applied in
commit 96312e61282a ("mm/gup.c: teach get_user_pages_unlocked to handle
FOLL_NOWAIT")) or KVM hangs because it thinks the mmap_sem was already
released by hugetlb_fault() if it returned VM_FAULT_RETRY, but it wasn't
in the FOLL_NOWAIT case.

Link: http://lkml.kernel.org/r/20190109020203.26669-2-aarcange@redhat.com
Fixes: ce53053ce378 ("kvm: switch get_user_page_nowait() to get_user_pages_unlocked()")
Change-Id: I538201d28c334b9d4c92c921039de590fa243e98
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-05 06:58:41 +05:30
Andrea Arcangeli
caac1a90db mm/gup.c: teach get_user_pages_unlocked to handle FOLL_NOWAIT
KVM is hanging during postcopy live migration with userfaultfd because
get_user_pages_unlocked is not capable to handle FOLL_NOWAIT.

Earlier FOLL_NOWAIT was only ever passed to get_user_pages.

Specifically faultin_page (the callee of get_user_pages_unlocked caller)
doesn't know that if FAULT_FLAG_RETRY_NOWAIT was set in the page fault
flags, when VM_FAULT_RETRY is returned, the mmap_sem wasn't actually
released (even if nonblocking is not NULL).  So it sets *nonblocking to
zero and the caller won't release the mmap_sem thinking it was already
released, but it wasn't because of FOLL_NOWAIT.

Link: http://lkml.kernel.org/r/20180302174343.5421-2-aarcange@redhat.com
Fixes: ce53053ce378c ("kvm: switch get_user_page_nowait() to get_user_pages_unlocked()")
Change-Id: I01e104ddabc3b80bb88ac9fbd1254a95fc26e434
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-05 06:58:31 +05:30
John Hubbard
055a5c331d mm/gup: finish consolidating error handling
Commit df06b37ffe5a ("mm/gup: cache dev_pagemap while pinning pages")
attempted to operate on each page that get_user_pages had retrieved.  In
order to do that, it created a common exit point from the routine.
However, one case was missed, which this patch fixes up.

Also, there was still an unnecessary shadow declaration (with a
different type) of the "ret" variable, which this patch removes.

Keith's description of the situation is:

  This also fixes a potentially leaked dev_pagemap reference count if a
  failure occurs when an iteration crosses a vma boundary.  I don't think
  it's normal to have different vma's on a users mapped zone device
  memory, but good to fix anyway.

I actually thought that this code:

    /* first iteration or cross vma bound */
    if (!vma || start >= vma->vm_end) {
	        vma = find_extend_vma(mm, start);
	        if (!vma && in_gate_area(mm, start)) {
		            ret = get_gate_page(mm, start & PAGE_MASK,
		                    gup_flags, &vma,
		                    pages ? &pages[i] : NULL);
		            if (ret)
		                goto out;

dealt with the "you're trying to pin the gate page, as part of this
call", rather than the generic case of crossing a vma boundary.  (I
think there's a fine point that I must be overlooking.) But it's still a
valid case, either way.

Link: http://lkml.kernel.org/r/20181121081402.29641-2-jhubbard@nvidia.com
Fixes: df06b37ffe5a4 ("mm/gup: cache dev_pagemap while pinning pages")
Change-Id: Ibbe3c528828c3385ab7af4e3d2b0b6c2690ac022
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-05 06:58:22 +05:30
Keith Busch
72707a0749 mm/gup: cache dev_pagemap while pinning pages
Getting pages from ZONE_DEVICE memory needs to check the backing device's
live-ness, which is tracked in the device's dev_pagemap metadata.  This
metadata is stored in a radix tree and looking it up adds measurable
software overhead.

This patch avoids repeating this relatively costly operation when
dev_pagemap is used by caching the last dev_pagemap while getting user
pages.  The gup_benchmark kernel self test reports this reduces time to
get user pages to as low as 1/3 of the previous time.

Link: http://lkml.kernel.org/r/20181012173040.15669-1-keith.busch@intel.com
Change-Id: I10d2fb5078b5791c2c4aae374624c27845b8d47b
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-05 06:58:13 +05:30
Gleb Fotengauer-Malinovskiy
fb8feab497 BACKPORT: FROMGIT: userfaultfd: fix UFFDIO_CONTINUE ioctl request definition
This ioctl request reads from uffdio_continue structure which justifies
_IOC_READ flag.  See NOTEs in include/uapi/asm-generic/ioctl.h for more
information.

Link: https://lkml.kernel.org/r/20210601143432.1002481-1-glebfm@altlinux.org
Link: https://lkml.kernel.org/r/20210531140146.481553-1-glebfm@altlinux.org
Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 3fe04bff515162ae8192d7eae77b9a92ed1aa945
https: //git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
akpm)
Link: https://lore.kernel.org/patchwork/patch/1439728/

Conflicts: include/uapi/linux/userfaultfd.h
(Manual rebase: removed definition of UFFDIO_WRITEPROTECT as it's not
implementation on this kernel)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: I85d83958a99e5c68e1118b6c189c78108e47be6c
2025-09-05 06:58:03 +05:30
Axel Rasmussen
f754d1b1c5 FROMGIT: userfaultfd/selftests: exercise minor fault handling shmem support
Enable test_uffdio_minor for test_type == TEST_SHMEM, and modify the test
slightly to pass in / check for the right feature flags.

Link: https://lkml.kernel.org/r/20210503180737.2487560-11-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit b0287e43fb420ab3c5631b146d99b2a1fc9a14d7
https: //git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
akpm)
Link: https://lore.kernel.org/patchwork/patch/1420977/
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: Ic6a6aa2ea92bc6353f60d8f982c2aef78e4ce323
2025-09-05 06:57:48 +05:30
Axel Rasmussen
8a49029f76 BACKPORT: FROMGIT: userfaultfd/selftests: reinitialize test context in each test
Currently, the context (fds, mmap-ed areas, etc.) are global.  Each test
mutates this state in some way, in some cases really "clobbering it"
(e.g., the events test mremap-ing area_dst over the top of area_src, or
the minor faults tests overwriting the count_verify values in the test
areas).  We run the tests in a particular order, each test is careful to
make the right assumptions about its starting state, etc.

But, this is fragile.  It's better for a test's success or failure to not
depend on what some other prior test case did to the global state.

To that end, clear and reinitialize the test context at the start of each
test case, so whatever prior test cases did doesn't affect future tests.

This is particularly relevant to this series because the events test's
mremap of area_dst screws up assumptions the minor fault test was relying
on.  This wasn't a problem for hugetlb, as we don't mremap in that case.

Link: https://lkml.kernel.org/r/20210503180737.2487560-10-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 72cfac82ddce1d6ac0ec3e1e43c6bafcf98eb0c6
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
akpm)
Link: https://lore.kernel.org/patchwork/patch/1420974/

Conflicts: tools/testing/selftests/vm/userfaultfd.c
(Manual rebase)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: I22969f388ef291d48c56a817ce0e9d2f4f1191f7
2025-09-05 06:57:40 +05:30
Axel Rasmussen
7bfa72fbb6 FROMGIT: userfaultfd/selftests: create alias mappings in the shmem test
Previously, we just allocated two shm areas: area_src and area_dst.  With
this commit, change this so we also allocate area_src_alias, and
area_dst_alias.

area_*_alias and area_* (respectively) point to the same underlying
physical pages, but are different VMAs.  In a future commit in this
series, we'll leverage this setup to exercise minor fault handling support
for shmem, just like we do in the hugetlb_shared test.

Link: https://lkml.kernel.org/r/20210503180737.2487560-9-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 9c9db903230e25c3b5547a719a883f9ba2970502
https: //git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
akpm)
Link: https://lore.kernel.org/patchwork/patch/1420975/
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: Ic603af85181acc60ecf4c8767c86bf8e2bb5c72c
2025-09-05 06:57:31 +05:30
Axel Rasmussen
aa49004352 FROMGIT: userfaultfd/selftests: use memfd_create for shmem test type
This is a preparatory commit.  In the future, we want to be able to setup
alias mappings for area_src and area_dst in the shmem test, like we do in
the hugetlb_shared test.  With a VMA obtained via mmap(MAP_ANONYMOUS |
MAP_SHARED), it isn't clear how to do this.

So, mmap() with an fd, so we can create alias mappings.  Use memfd_create
instead of actually passing in a tmpfs path like hugetlb does, since it's
more convenient / simpler to run, and works just as well.

Future commits will:

1. Setup the alias mappings.
2. Extend our tests to actually take advantage of this, to test new
   userfaultfd behavior being introduced in this series.

Also, a small fix in the area we're changing: when the hugetlb setup fails
in main(), pass in the right argv[] so we actually print out the hugetlb
file path.

Link: https://lkml.kernel.org/r/20210503180737.2487560-8-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 1861e1c4ef590b7f3f8318da5f681efeb2686449
https: //git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
akpm)
Link: https://lore.kernel.org/patchwork/patch/1420976/
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: Id8fc463cef73e8e99190f0d748e4d92ada8541a4
2025-09-05 06:57:21 +05:30
Peter Xu
a99d2274eb BACKPORT: FROMGIT: userfaultfd/selftests: unify error handling
Introduce err()/_err() and replace all the different ways to fail the
program, mostly "fprintf" and "perror" with tons of exit() calls.  Always
stop the test program at any failure.

Link: https://lkml.kernel.org/r/20210412232753.1012412-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit fcd7d008016c9bc9bf134debfd077d59bba119d8
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1412454/

Conflicts: tools/testing/selftests/vm/userfaultfd.c
(Manual rebase)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: Ic39dd286c3b1938752c3dc8c060a15b3d5db6564
2025-09-05 06:56:58 +05:30
Peter Xu
b0f07d9b98 userfaultfd: selftest: cleanup help messages
Firstly, the help in the comment region is obsolete, now we support
three parameters.  Since at it, change it and move it into the help
message of the program.

Also, the help messages dumped here and there is obsolete too.  Use a
single usage() helper.

Link: http://lkml.kernel.org/r/20180930074259.18229-2-peterx@redhat.com
Change-Id: I8494d65ea60ed34ba76515c6e9b6557b25eef4d6
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Shaohua Li <shli@fb.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-05 06:55:53 +05:30
Axel Rasmussen
8191ee8c58 BACKPORT: FROMGIT: userfaultfd/shmem: modify shmem_mfill_atomic_pte to use install_pte()
In a previous commit, we added the mfill_atomic_install_pte() helper.
This helper does the job of setting up PTEs for an existing page, to map
it into a given VMA.  It deals with both the anon and shmem cases, as well
as the shared and private cases.

In other words, shmem_mfill_atomic_pte() duplicates a case it already
handles.  So, expose it, and let shmem_mfill_atomic_pte() use it directly,
to reduce code duplication.

This requires that we refactor shmem_mfill_atomic_pte() a bit:

Instead of doing accounting (shmem_recalc_inode() et al) part-way through
the PTE setup, do it afterward.  This frees up mfill_atomic_install_pte()
from having to care about this accounting, and means we don't need to e.g.
shmem_uncharge() in the error path.

A side effect is this switches shmem_mfill_atomic_pte() to use
lru_cache_add_inactive_or_unevictable() instead of just lru_cache_add().
This wrapper does some extra accounting in an exceptional case, if
appropriate, so it's actually the more correct thing to use.

Link: https://lkml.kernel.org/r/20210503180737.2487560-7-axelrasmussen@google.com
Change-Id: Ib339a4e5d2aee6395f3fb4855b0bc5032c16f8e0
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-05 06:42:20 +05:30
Axel Rasmussen
fd66739a28 BACKPORT: FROMGIT: userfaultfd/shmem: advertise shmem minor fault support
Now that the feature is fully implemented (the faulting path hooks exist
so userspace is notified, and the ioctl to resolve such faults is
available), advertise this as a supported feature.

Link: https://lkml.kernel.org/r/20210503180737.2487560-6-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 37aa962fe33a562fd0ac21b68938023e12041fc3
https: //git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
akpm)
Link: https://lore.kernel.org/patchwork/patch/1420971/

Conflicts: Documentation/admin-guide/mm/userfaultfd.rst
(Manual rebase)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: I5fbab3783ff8671c0a5aa4826aead2d63f5cbbf3
2025-09-05 06:41:23 +05:30
Axel Rasmussen
2f927bbf66 BACKPORT: FROMGIT: userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
With this change, userspace can resolve a minor fault within a
shmem-backed area with a UFFDIO_CONTINUE ioctl.  The semantics for this
match those for hugetlbfs - we look up the existing page in the page
cache, and install a PTE for it.

This commit introduces a new helper: mfill_atomic_install_pte.

Why handle UFFDIO_CONTINUE for shmem in mm/userfaultfd.c, instead of in
shmem.c?  The existing userfault implementation only relies on shmem.c for
VM_SHARED VMAs.  However, minor fault handling / CONTINUE work just fine
for !VM_SHARED VMAs as well.  We'd prefer to handle CONTINUE for shmem in
one place, regardless of shared/private (to reduce code duplication).

Why add a new mfill_atomic_install_pte helper?  A problem we have with
continue is that shmem_mfill_atomic_pte() and mcopy_atomic_pte() are
*close* to what we want, but not exactly.  We do want to setup the PTEs in
a CONTINUE operation, but we don't want to e.g.  allocate a new page,
charge it (e.g.  to the shmem inode), manipulate various flags, etc.  Also
we have the problem stated above: shmem_mfill_atomic_pte() and
mcopy_atomic_pte() both handle one-half of the problem (shared / private)
continue cares about.  So, introduce mcontinue_atomic_pte(), to handle all
of the shmem continue cases.  Introduce the helper so it doesn't duplicate
code with mcopy_atomic_pte().

In a future commit, shmem_mfill_atomic_pte() will also be modified to use
this new helper.  However, since this is a bigger refactor, it seems most
clear to do it as a separate change.

Link: https://lkml.kernel.org/r/20210503180737.2487560-5-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit c9a4579a9f5320ff062f973476473242b551bacd
https: //git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
akpm)
Link: https://lore.kernel.org/patchwork/patch/1420972/

Conflicts: mm/userfaultfd.c
(1. Removed all 'wp_copy' usage as write-protect uffd feature doesn't exist
in this kernel.
2. Due to lack of mem_cgroup_charge() in this kernel, worked with
mem_cgroup_try_charge() instead in mcopy_atomic_pte()
3. Replaced lru_cache_add_inactive_or_unevictable() with
lru_cache_add_active_or_unevictable())

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: I46eb835849e7798a9d23cf53959bd93655b926d4
2025-09-05 06:41:12 +05:30
Axel Rasmussen
39fc7368a9 BACKPORT: FROMGIT: userfaultfd/shmem: support minor fault registration for shmem
This patch allows shmem-backed VMAs to be registered for minor faults.
Minor faults are appropriately relayed to userspace in the fault path, for
VMAs with the relevant flag.

This commit doesn't hook up the UFFDIO_CONTINUE ioctl for shmem-backed
minor faults, though, so userspace doesn't yet have a way to resolve such
faults.

Because of this, we also don't yet advertise this as a supported feature.
That will be done in a separate commit when the feature is fully
implemented.

Link: https://lkml.kernel.org/r/20210503180737.2487560-4-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 6867a29320b7d178feb9786856e5ea2cf40f6d33
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1420970/

Conflicts: mm/shmem.c
(Manual rebase)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: Ib8e3fe202feab7404742814f4250a0981065368c
2025-09-05 06:41:03 +05:30
Axel Rasmussen
8453b59de1 BACKPORT: FROMGIT: userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
Patch series "userfaultfd: add minor fault handling for shmem", v6.

Overview
========

See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general.  This series adds the same
support for shmem-backed areas.

This series is structured as follows:

- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commit 5 advertises that the feature is now available since at this point it's
  fully implemented.
- Commit 6 is a final cleanup, modifying an existing code path to re-use a new
  helper we've introduced.
- Commits 7, 8, 9, 10 update the userfaultfd selftest to exercise the feature.

Use Case
========

In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs.  So, this feature will be used to support the same VM live
migration use case described in my original series.

Additionally, Android folks (Lokesh Gidra <lokeshgidra@google.com>) hope
to optimize the Android Runtime garbage collector using this feature:

"The plan is to use userfaultfd for concurrently compacting the heap.
With this feature, the heap can be shared-mapped at another location where
the GC-thread(s) could continue the compaction operation without the need
to invoke userfault ioctl(UFFDIO_COPY) each time.  OTOH, if and when Java
threads get faults on the heap, UFFDIO_CONTINUE can be used to resume
execution.  Furthermore, this feature enables updating references in the
'non-moving' portion of the heap efficiently.  Without this feature,
uneccessary page copying (ioctl(UFFDIO_COPY)) would be required."

[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@google.com/T/#t

This patch (of 9):

Previously, we did a dance where we had one calling path in userfaultfd.c
(mfill_atomic_pte), but then we split it into two in shmem_fs.h
(shmem_{mcopy_atomic,mfill_zeropage}_pte), and then rejoined into a single
shared function in shmem.c (shmem_mfill_atomic_pte).

This is all a bit overly complex.  Just call the single combined shmem
function directly, allowing us to clean up various branches, boilerplate,
etc.

While we're touching this function, two other small cleanup changes:
- offset is equivalent to pgoff, so we can get rid of offset entirely.
- Split two VM_BUG_ON cases into two statements. This means the line
  number reported when the BUG is hit specifies exactly which condition
  was true.

Link: https://lkml.kernel.org/r/20210503180737.2487560-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210503180737.2487560-3-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit f7e89f242f0dfcdd62e7aeecebdc2620e4792954
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1420969/

Conflicts:
1. include/linux/shmem_fs.h
2. mm/shmem.c
3. mm/userfaultfd.c
(All resolved by manual rebase)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: I8b443809643339b407d0bc06bf96146ecfc5a9fa
2025-09-05 06:40:42 +05:30
Peter Xu
d742e0e0f3 BACKPORT: FROMGIT: userfaultfd/selftests: only dump counts if mode enabled
WP and MINOR modes are conditionally enabled on specific memory types.
This patch avoids dumping tons of zeros for those cases when the modes are
not supported at all.

Link: https://lkml.kernel.org/r/20210412232753.1012412-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 5edabbbe74eee6c1c59dafbbfb5528e391be447c
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
akpm)
Link: https://lore.kernel.org/patchwork/patch/1412453/

Conflicts: tools/testing/selftests/vm/userfaultfd.c
(1. Manual rebase
2. Removed 'wp_total' related change in uffd_stats_report() as
write-protect test doesn't exist.)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: I445f883f3402104ea39d50ea60726b16394f6e50
2025-09-05 06:40:32 +05:30
Axel Rasmussen
049e3240d6 BACKPORT: userfaultfd/selftests: add test exercising minor fault handling
Fix a dormant bug in userfaultfd_events_test(), where we did `return
faulting_process(0)` instead of `exit(faulting_process(0))`.  This
caused the forked process to keep running, trying to execute any further
test cases after the events test in parallel with the "real" process.

Add a simple test case which exercises minor faults.  In short, it does
the following:

1. "Sets up" an area (area_dst) and a second shared mapping to the same
   underlying pages (area_dst_alias).

2. Register one of these areas with userfaultfd, in minor fault mode.

3. Start a second thread to handle any minor faults.

4. Populate the underlying pages with the non-UFFD-registered side of
   the mapping. Basically, memset() each page with some arbitrary
   contents.

5. Then, using the UFFD-registered mapping, read all of the page
   contents, asserting that the contents match expectations (we expect
   the minor fault handling thread can modify the page contents before
   resolving the fault).

The minor fault handling thread, upon receiving an event, flips all the
bits (~) in that page, just to prove that it can modify it in some
arbitrary way.  Then it issues a UFFDIO_CONTINUE ioctl, to setup the
mapping and resolve the fault.  The reading thread should wake up and
see this modification.

Currently the minor fault test is only enabled in hugetlb_shared mode,
as this is the only configuration the kernel feature supports.

Link: https://lkml.kernel.org/r/20210301222728.176417-7-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f0fa94330919be8ec5620382b50f1c72844c9224)

Conflicts: tools/testing/selftests/vm/userfaultfd.c
(Manual rebase)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: I85a490f0c4d0a1f462ae01231c3853d6ffc23d2b
2025-09-05 06:40:22 +05:30
Lokesh Gidra
e231dc299a BACKPORT: userfaultfd: selftests: add write-protect test
Add uffd tests for write protection.

Instead of introducing new tests for it, let's simply squashing uffd-wp
tests into existing uffd-missing test cases.  Changes are:

(1) Bouncing tests

  We do the write-protection in two ways during the bouncing test:

  - By using UFFDIO_COPY_MODE_WP when resolving MISSING pages: then
    we'll make sure for each bounce process every single page will be
    at least fault twice: once for MISSING, once for WP.

  - By direct call UFFDIO_WRITEPROTECT on existing faulted memories:
    To further torture the explicit page protection procedures of
    uffd-wp, we split each bounce procedure into two halves (in the
    background thread): the first half will be MISSING+WP for each
    page as explained above.  After the first half, we write protect
    the faulted region in the background thread to make sure at least
    half of the pages will be write protected again which is the first
    half to test the new UFFDIO_WRITEPROTECT call.  Then we continue
    with the 2nd half, which will contain both MISSING and WP faulting
    tests for the 2nd half and WP-only faults from the 1st half.

(2) Event/Signal test

  Mostly previous tests but will do MISSING+WP for each page.  For
  sigbus-mode test we'll need to provide standalone path to handle the
  write protection faults.

For all tests, do statistics as well for uffd-wp pages.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-20-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 9b12488a7711b9aa2d0915f6949a8ad2069eb072)

Conflicts: tools/testing/selftests/vm/userfaultfd.c
(Removed write-protect test related changes)

Note: This patch introduces a write-protect test. In addition it also
introduces uffd_stats_report(), which prints the stats in uniform a
manner. We only require uffd_stats_report() as subsequent patches depend
on it. Write-protect test is omitted as this kernel doesn't have
userfaultfd's write-protect feature.

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: Iad7012e086016e4a67a283c4997d8f7be92eea74
2025-09-05 06:40:07 +05:30
Peter Xu
46b82f1e55 BACKPORT: userfaultfd: selftests: refactor statistics
Introduce uffd_stats structure for statistics of the self test, at the
same time refactor the code to always pass in the uffd_stats for either
read() or poll() typed fault handling threads instead of using two
different ways to return the statistic results.  No functional change.

With the new structure, it's very easy to introduce new statistics.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-19-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 5c8aed6c1b95c3c6de68bd2814611d5d54da5057)

Conflicts: tools/testing/selftests/vm/userfaultfd.c
(Manual rebase)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: I93597413653e9d55954e7912db56794e4ae446be
2025-09-05 06:39:57 +05:30
Lokesh Gidra
fb39888e17 Revert "BACKPORT: FROMGIT: userfaultfd/selftests: add test exercising minor fault handling"
This reverts commit 2b2f7d6a04d65c2ebd53f9e5d3de6b55635d57b6 so that it
can be re-applied once dependcies are brought in.

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: Ib24c94e1d8ec6c46004d35f58e48d9d35eeb8312
2025-09-05 06:39:31 +05:30
Lokesh Gidra
88bab4a951 Revert "BACKPORT: FROMGIT: userfaultfd: support minor fault handling for shmem"
This reverts commit 0309b3f479b967acb644f99d214e2b25297a20b1
as an updated version of the patch-set will be merged later.

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: I765fe86a2dc0305482a0590c14143dee27840b8a
2025-09-05 06:39:19 +05:30
Lokesh Gidra
686bc68bdf Revert "FROMGIT: userfaultfd/selftests: use memfd_create for shmem test type"
This reverts commit f8bed3c813cb21c9c576d42068994e0199773f96
as an updated version of the patch-set will be merged later.

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: I808f6c58429c83f03eae2ad2cbe65a232b2505c8
2025-09-05 06:39:08 +05:30
Lokesh Gidra
23d629ffd1 Revert "FROMGIT: userfaultfd/selftests: create alias mappings in the shmem test"
This reverts commit 08dec4889aede52e2f09ae2333c8a292aa12d359
as an updated version of the patch-set will be merged later.

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: I13e7cac79661ef46882cc69ab60327fafd212093
2025-09-05 06:38:50 +05:30
Lokesh Gidra
9ad709f759 Revert "BACKPORT: FROMGIT: userfaultfd/selftests: reinitialize test context in each test"
This reverts commit 2ed6377e515707a52d123b735d861a7b8305eceb
as an updated version of the patch-set will be merged later.

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: I239e3f80e27d7e86dc9911cc3dd5640fc2d9bf94
2025-09-05 06:38:41 +05:30
Lokesh Gidra
2609637677 Revert "BACKPORT: FROMGIT: userfaultfd/selftests: exercise minor fault handling shmem support"
This reverts commit 44f2dcd54e168d03c32ef4730596783bff3180f7.
as an updated version of the patch-set will be merged later.

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug 187930641
Change-Id: Iec5f001d559609b4b3a2239a2b92cd46778437ca
2025-09-05 06:38:30 +05:30
Lokesh Gidra
7680ead7bb Revert "FROMLIST: userfaultfd/shmem: fix minor fault page leak"
This reverts commit 33a50fd21ddb04629ba4a262f3cdcdc2f43ee578
as an updated version of the patch-set will be merged later.

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 187930641
Change-Id: Ie6fe2a611d4cbb3bd103c62a90b84e6ba4e89af1
2025-09-05 06:38:20 +05:30
Shaohua Li
33496d79c8 BACKPORT: userfaultfd: wp: add helper for writeprotect check
Patch series "userfaultfd: write protection support", v6.

Overview
========

The uffd-wp work was initialized by Shaohua Li [1], and later continued by
Andrea [2].  This series is based upon Andrea's latest userfaultfd tree,
and it is a continuous works from both Shaohua and Andrea.  Many of the
follow up ideas come from Andrea too.

Besides the old MISSING register mode of userfaultfd, the new uffd-wp
support provides another alternative register mode called
UFFDIO_REGISTER_MODE_WP that can be used to listen to not only missing
page faults but also write protection page faults, or even they can be
registered together.  At the same time, the new feature also provides a
new userfaultfd ioctl called UFFDIO_WRITEPROTECT which allows the
userspace to write protect a range or memory or fixup write permission of
faulted pages.

Please refer to the document patch "userfaultfd: wp:
UFFDIO_REGISTER_MODE_WP documentation update" for more information on the
new interface and what it can do.

The major workflow of an uffd-wp program should be:

  1. Register a memory region with WP mode using UFFDIO_REGISTER_MODE_WP

  2. Write protect part of the whole registered region using
     UFFDIO_WRITEPROTECT, passing in UFFDIO_WRITEPROTECT_MODE_WP to
     show that we want to write protect the range.

  3. Start a working thread that modifies the protected pages,
     meanwhile listening to UFFD messages.

  4. When a write is detected upon the protected range, page fault
     happens, a UFFD message will be generated and reported to the
     page fault handling thread

  5. The page fault handler thread resolves the page fault using the
     new UFFDIO_WRITEPROTECT ioctl, but this time passing in
     !UFFDIO_WRITEPROTECT_MODE_WP instead showing that we want to
     recover the write permission.  Before this operation, the fault
     handler thread can do anything it wants, e.g., dumps the page to
     a persistent storage.

  6. The worker thread will continue running with the correctly
     applied write permission from step 5.

Currently there are already two projects that are based on this new
userfaultfd feature.

QEMU Live Snapshot: The project provides a way to allow the QEMU
                    hypervisor to take snapshot of VMs without
                    stopping the VM [3].

LLNL umap library:  The project provides a mmap-like interface and
                    "allow to have an application specific buffer of
                    pages cached from a large file, i.e. out-of-core
                    execution using memory map" [4][5].

Before posting the patchset, this series was smoke tested against QEMU
live snapshot and the LLNL umap library (by doing parallel quicksort using
128 sorting threads + 80 uffd servicing threads).  My sincere thanks to
Marty Mcfadden and Denis Plotnikov for the help along the way.

TODO
====

- hugetlbfs/shmem support
- performance
- more architectures
- cooperate with mprotect()-allowed processes (???)
- ...

References
==========

[1] https://lwn.net/Articles/666187/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/log/?h=userfault
[3] https://github.com/denis-plotnikov/qemu/commits/background-snapshot-kvm
[4] https://github.com/LLNL/umap
[5] https://llnl-umap.readthedocs.io/en/develop/
[6] https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?h=userfault&id=b245ecf6cf59156966f3da6e6b674f6695a5ffa5
[7] https://lkml.org/lkml/2018/11/21/370
[8] https://lkml.org/lkml/2018/12/30/64

This patch (of 19):

Add helper for writeprotect check. Will use it later.

Bug: 254441685
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220163112.11409-2-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1df319e0b4dee11436fe2ab1a0d536d3fad7cfef)
[Lee: Dependency for Fixes: commit 6ce64428d6202 ("mm/userfaultfd: fix memory corruption due to writeprotect"))
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I0c09235bb2e3ac9abd7aa5d5ba1cdba19d2afbb7
2025-09-05 06:38:12 +05:30
Lokesh Gidra
1b4f9ba57f BACKPORT: ANDROID: userfaultfd: abort uffdio ops if mmap_lock is contended
Check if the mmap_lock is contended when looping over the pages that
are requested to be filled. When it is observed, we rely on the already
existing mechanism to return bytes copied/filled and -EAGAIN as error.

This helps by avoiding contention of mmap_lock for long running
userfaultfd operations. The userspace can perform other tasks before
retrying the operation for the remaining pages.

Bug: 320478828
Change-Id: I6d485fd03c96a826956ee3962e58058be3cf81c1
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 06:36:03 +05:30
Lokesh Gidra
fa49c6bf49 BACKPORT: ANDROID: userfaultfd: add MMAP_TRYLOCK mode for COPY/ZEROPAGE
In case mmap_lock is contended, it is possible that userspace can spend
time performing other tasks rather than waiting in uninterruptible-sleep
state for the lock to become available. Even if no other task is
available, it is better to yield or sleep rather than adding contention
to already contended lock.

We introduce MMAP_TRYLOCK mode so that when possible, userspace can
request to use mmap_read_trylock(), returning -EAGAIN if and when it
fails.

Bug: 320478828
Change-Id: I2d196fd317e054af03dbd35ac1b0c7634cb370dc
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 06:35:51 +05:30
Andrea Arcangeli
2ae6243333 BACKPORT: userfaultfd: wp: add UFFDIO_COPY_MODE_WP
This allows UFFDIO_COPY to map pages write-protected.

[peterx@redhat.com: switch to VM_WARN_ON_ONCE in mfill_atomic_pte; add brackets
 around "dst_vma->vm_flags & VM_WRITE"; fix wordings in comments and
 commit messages]
Change-Id: I2d60bc1b44670d45d5b363e2951f0f530486a7bb
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-6-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 06:35:37 +05:30
Peter Xu
924deb848d BACKPORT: userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally
Only declare _UFFDIO_WRITEPROTECT if the user specified
UFFDIO_REGISTER_MODE_WP and if all the checks passed.  Then when the user
registers regions with shmem/hugetlbfs we won't expose the new ioctl to
them.  Even with complete anonymous memory range, we'll only expose the
new WP ioctl bit if the register mode has MODE_WP.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-18-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 14819305e09fe4fda546f0dfa12134c8e5366616)

[ Kalesh Singh - resolve conflicts in fs/userfaultfd.c ]

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: kernel test robot <lkp@intel.com> [1]
[1] https://lore.kernel.org/r/202201170247.Cir3moOM-lkp@intel.com/

Bug: 160737021
Bug: 169683130
Change-Id: I4f205e642f5f0e5824a43303aab30626cce3ddcb
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 06:35:28 +05:30
Peter Collingbourne
816209b47e userfaultfd: do not untag user pointers
commit e71e2ace5721a8b921dca18b045069e7bb411277 upstream.

Patch series "userfaultfd: do not untag user pointers", v5.

If a user program uses userfaultfd on ranges of heap memory, it may end
up passing a tagged pointer to the kernel in the range.start field of
the UFFDIO_REGISTER ioctl.  This can happen when using an MTE-capable
allocator, or on Android if using the Tagged Pointers feature for MTE
readiness [1].

When a fault subsequently occurs, the tag is stripped from the fault
address returned to the application in the fault.address field of struct
uffd_msg.  However, from the application's perspective, the tagged
address *is* the memory address, so if the application is unaware of
memory tags, it may get confused by receiving an address that is, from
its point of view, outside of the bounds of the allocation.  We observed
this behavior in the kselftest for userfaultfd [2] but other
applications could have the same problem.

Address this by not untagging pointers passed to the userfaultfd ioctls.
Instead, let the system call fail.  Also change the kselftest to use
mmap so that it doesn't encounter this problem.

[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c

This patch (of 2):

Do not untag pointers passed to the userfaultfd ioctls.  Instead, let
the system call fail.  This will provide an early indication of problems
with tag-unaware userspace code instead of letting the code get confused
later, and is consistent with how we decided to handle brk/mmap/mremap
in commit dcde237319e6 ("mm: Avoid creating virtual address aliases in
brk()/mmap()/mremap()"), as well as being consistent with the existing
tagged address ABI documentation relating to how ioctl arguments are
handled.

The code change is a revert of commit 7d0325749a6c ("userfaultfd: untag
user pointers") plus some fixups to some additional calls to
validate_range that have appeared since then.

[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c

Link: https://lkml.kernel.org/r/20210714195437.118982-1-pcc@google.com
Link: https://lkml.kernel.org/r/20210714195437.118982-2-pcc@google.com
Link: https://linux-review.googlesource.com/id/I761aa9f0344454c482b83fcfcce547db0a25501b
Fixes: 63f0c6037965 ("arm64: Introduce prctl() options to control the tagged user addresses ABI")
Change-Id: I2e764cbb1867cd8018d177d449a05cf4d0d51576
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alistair Delva <adelva@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mitch Phillips <mitchp@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: William McVicker <willmcvicker@google.com>
Cc: <stable@vger.kernel.org>	[5.4]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-05 06:35:19 +05:30
Dmitry Safonov
c3eb0457e3 UPSTREAM: mm/mremap: don't account pages in vma_to_resize()
All this vm_unacct_memory(charged) dance seems to complicate the life
without a good reason.  Furthermore, it seems not always done right on
error-pathes in mremap_to().  And worse than that: this `charged'
difference is sometimes double-accounted for growing MREMAP_DONTUNMAP
mremap()s in move_vma():

	if (security_vm_enough_memory_mm(mm, new_len >> PAGE_SHIFT))

Let's not do this.  Account memory in mremap() fast-path for growing
VMAs or in move_vma() for actually moving things.  The same simpler way
as it's done by vm_stat_account(), but with a difference to call
security_vm_enough_memory_mm() before copying/adjusting VMA.

Originally noticed by Chen Wandun:
https://lkml.kernel.org/r/20210717101942.120607-1-chenwandun@huawei.com

Link: https://lkml.kernel.org/r/20210721131320.522061-1-dima@arista.com
Fixes: e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Wandun <chenwandun@huawei.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit fdbef61491359947753c13581098878e8d038286)
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Id0833f3e84c4a6119e2ec069afff5122cb0c6b35
2025-09-05 06:35:01 +05:30
Brian Geffon
3006263515 BACKPORT: FROMLIST: Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio"
This reverts commit cd544fd1dc9293c6702fab6effa63dac1cc67e99.

As discussed in [1] this commit was a no-op because the mapping type was
checked in vma_to_resize before move_vma is ever called. This meant that
vm_ops->mremap() would never be called on such mappings. Furthermore,
we've since expanded support of MREMAP_DONTUNMAP to non-anonymous
mappings, and these special mappings are still protected by the existing
check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EINVAL.

1. https://lkml.org/lkml/2020/12/28/2340

Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Link: https://lore.kernel.org/patchwork/patch/1401226/
Conflicts: include/linux/mm.h
(Resolved minor conflict with manual rebase)
Bug: 160737021
Bug: 169683130
Change-Id: I97d29e6a54cee07ba69d6bb880778ee1fea8ff7c
2025-09-05 06:34:52 +05:30
Brian Geffon
ac250ca555 FROMLIST: mm: Extend MREMAP_DONTUNMAP to non-anonymous mappings
Currently MREMAP_DONTUNMAP only accepts private anonymous mappings.
This restriction was placed initially for simplicity and not because
there exists a technical reason to do so.

This change will widen the support to include any mappings which are not
VM_DONTEXPAND or VM_PFNMAP. The primary use case is to support
MREMAP_DONTUNMAP on mappings which may have been created from a memfd.
This change will result in mremap(MREMAP_DONTUNMAP) returning -EINVAL
if VM_DONTEXPAND or VM_PFNMAP mappings are specified.

Lokesh Gidra who works on the Android JVM, provided an explanation of how
such a feature will improve Android JVM garbage collection:
"Android is developing a new garbage collector (GC), based on userfaultfd.
The garbage collector will use userfaultfd (uffd) on the java heap during
compaction. On accessing any uncompacted page, the application threads will
find it missing, at which point the thread will create the compacted page
and then use UFFDIO_COPY ioctl to get it mapped and then resume execution.
Before starting this compaction, in a stop-the-world pause the heap will be
mremap(MREMAP_DONTUNMAP) so that the java heap is ready to receive
UFFD_EVENT_PAGEFAULT events after resuming execution.

To speedup mremap operations, pagetable movement was optimized by moving
PUD entries instead of PTE entries [1]. It was necessary as mremap of even
modest sized memory ranges also took several milliseconds, and stopping the
application for that long isn't acceptable in response-time sensitive
cases.

With UFFDIO_CONTINUE feature [2], it will be even more efficient to
implement this GC, particularly the 'non-moveable' portions of the heap.
It will also help in reducing the need to copy (UFFDIO_COPY) the pages.
However, for this to work, the java heap has to be on a 'shared' vma.
Currently MREMAP_DONTUNMAP only supports private anonymous mappings, this
patch will enable using UFFDIO_CONTINUE for the new userfaultfd-based heap
compaction."

[1] https://lore.kernel.org/linux-mm/20201215030730.NC3CU98e4%25akpm@linux-foundation.org/
[2] https://lore.kernel.org/linux-mm/20210302000133.272579-1-axelrasmussen@google.com/

Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Tested-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Link: https://lore.kernel.org/patchwork/patch/1401224/
Bug: 160737021
Bug: 169683130
Change-Id: Ic4f023dff404d7b0e35adbe92c7a12536aa0f70d
2025-09-05 06:34:43 +05:30
Dmitry Safonov
1b058b5e33 UPSTREAM: mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio
As kernel expect to see only one of such mappings, any further operations
on the VMA-copy may be unexpected by the kernel.  Maybe it's being on the
safe side, but there doesn't seem to be any expected use-case for this, so
restrict it now.

Link: https://lkml.kernel.org/r/20201013013416.390574-4-dima@arista.com
Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit cd544fd1dc9293c6702fab6effa63dac1cc67e99)

Bug: 176847609
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I9fc569e41bd7fc07c2a168c96d83eda0396c2305
2025-09-05 06:34:32 +05:30
Dmitry Safonov
2b186abc00 BACKPORT: mm/mremap: for MREMAP_DONTUNMAP check security_vm_enough_memory_mm()
Currently memory is accounted post-mremap() with MREMAP_DONTUNMAP, which
may break overcommit policy.  So, check if there's enough memory before
doing actual VMA copy.

Don't unset VM_ACCOUNT on MREMAP_DONTUNMAP.  By semantics, such mremap()
is actually a memory allocation.  That also simplifies the error-path a
little.

Also, as it's memory allocation on success don't reset hiwater_vm value.

Link: https://lkml.kernel.org/r/20201013013416.390574-3-dima@arista.com
Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ad8ee77ea9db1f74fe79c285e3546375efa75608)

[Kalesh Singh: Resolve conflicts in mm/mremap.c]
Bug: 176847609
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: If8e8bf5a18d16ed6b40af22a685951698e6f444c
2025-09-05 06:34:22 +05:30
Arnaldo Carvalho de Melo
32b6ea0f21 UPSTREAM: tools headers UAPI: Sync linux/mman.h with the kernel
To get the changes in:

  e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")

Add that to 'perf trace's mremap 'flags' decoder.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h'
  diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(cherry picked from commit f60b3878f47311a61fe2d4c5ef77c52e31554c52)

Bug: 176847609
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Iab7117411cf9c4c32356da2b3de598a0df299a7c
2025-09-05 06:34:13 +05:30
Brian Geffon
962757e7da UPSTREAM: userfaultfd: fix remap event with MREMAP_DONTUNMAP
A user is not required to set a new address when using MREMAP_DONTUNMAP
as it can be used without MREMAP_FIXED.  When doing so the remap event
will use new_addr which may not have been set and we didn't propagate it
back other then in the return value of remap_to.

Because ret is always the new address it's probably more correct to use
it rather than new_addr on the remap_event_complete call, and it
resolves this bug.

Fixes: e346b3813067d4b ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Brian Geffon <bgeffon@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Link: http://lkml.kernel.org/r/20200506172158.218366-1-bgeffon@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit d1564926066115fb47ca06b2b9d23ed506ca9608)

Bug: 176847609
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I7a20139ee14a2c9124b74310d7c0324c11009087
2025-09-05 06:34:03 +05:30
Brian Geffon
e2e5898634 UPSTREAM: mm: Fix MREMAP_DONTUNMAP accounting on VMA merge
When remapping a mapping where a portion of a VMA is remapped
into another portion of the VMA it can cause the VMA to become
split. During the copy_vma operation the VMA can actually
be remerged if it's an anonymous VMA whose pages have not yet
been faulted. This isn't normally a problem because at the end
of the remap the original portion is unmapped causing it to
become split again.

However, MREMAP_DONTUNMAP leaves that original portion in place which
means that the VMA which was split and then remerged is not actually
split at the end of the mremap. This patch fixes a bug where
we don't detect that the VMAs got remerged and we end up
putting back VM_ACCOUNT on the next mapping which is completely
unreleated. When that next mapping is unmapped it results in
incorrectly unaccounting for the memory which was never accounted,
and eventually we will underflow on the memory comittment.

There is also another issue which is similar, we're currently
accouting for the number of pages in the new_vma but that's wrong.
We need to account for the length of the remap operation as that's
all that is being added. If there was a mapping already at that
location its comittment would have been adjusted as part of
the munmap at the start of the mremap.

A really simple repro can be seen in:
https://gist.github.com/bgaff/e101ce99da7d9a8c60acc641d07f312c

Fixes: e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Brian Geffon <bgeffon@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit dadbd85f2afc8ccd1dd1f0131781c740c91edd96)

Bug: 176847609
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I823b63d110825c306dab6929fca9408f8bf79ff0
2025-09-05 06:33:46 +05:30
Brian Geffon
b2dff52a38 UPSTREAM: mm/mremap: add MREMAP_DONTUNMAP to mremap()
When remapping an anonymous, private mapping, if MREMAP_DONTUNMAP is set,
the source mapping will not be removed.  The remap operation will be
performed as it would have been normally by moving over the page tables to
the new mapping.  The old vma will have any locked flags cleared, have no
pagetables, and any userfaultfds that were watching that range will
continue watching it.

For a mapping that is shared or not anonymous, MREMAP_DONTUNMAP will cause
the mremap() call to fail.  Because MREMAP_DONTUNMAP always results in
moving a VMA you MUST use the MREMAP_MAYMOVE flag, it's not possible to
resize a VMA while also moving with MREMAP_DONTUNMAP so old_len must
always be equal to the new_len otherwise it will return -EINVAL.

We hope to use this in Chrome OS where with userfaultfd we could write an
anonymous mapping to disk without having to STOP the process or worry
about VMA permission changes.

This feature also has a use case in Android, Lokesh Gidra has said that
"As part of using userfaultfd for GC, We'll have to move the physical
pages of the java heap to a separate location.  For this purpose mremap
will be used.  Without the MREMAP_DONTUNMAP flag, when I mremap the java
heap, its virtual mapping will be removed as well.  Therefore, we'll
require performing mmap immediately after.  This is not only time
consuming but also opens a time window where a native thread may call mmap
and reserve the java heap's address range for its own usage.  This flag
solves the problem."

[bgeffon@google.com: v6]
  Link: http://lkml.kernel.org/r/20200218173221.237674-1-bgeffon@google.com
[bgeffon@google.com: v7]
  Link: http://lkml.kernel.org/r/20200221174248.244748-1-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Deacon <will@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Florian Weimer <fweimer@redhat.com>
Link: http://lkml.kernel.org/r/20200207201856.46070-1-bgeffon@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e346b3813067d4b17383f975f197a9aa28a3b077)

Bug: 176847609
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I8474cd9f032de02fa1764c49c63d368ac15346da
2025-09-05 06:33:37 +05:30
Orson Zhai
e61e8a4960 ANDROID: userfaultfd: Fix untag pointer in userfaultfd_continue()
Fixes e71e2ace5721 ("userfaultfd: do not untag user pointers").

Above patch is ported from upstream into LTS branch then merged into
android12-5.4. The original patch was going to fix all untag user
pointers.

The change to userfaultfd_continue() was cut off when being merged into
LTS because the routine does not exist in LTS.

But specially the routine has been cherry-picked into android12-5.4 with
commit b69f713e60d0 ("BACKPORT: FROMGIT: userfaultfd: add
UFFDIO_CONTINUE ioctl") by Lokesh Gidra <lokeshgidra@google.com> long
ago.

So add back the missing part of fixing here.

Fixes: e71e2ace5721 ("userfaultfd: do not untag user pointers")
Change-Id: I32da80c2e9517356daadf566a433e056f6bef08c
Signed-off-by: Orson Zhai <orson.zhai@unisoc.com>
2025-09-05 06:33:24 +05:30
Axel Rasmussen
e0353f1e3f FROMLIST: userfaultfd/shmem: fix minor fault page leak
This fix is analogous to Peter Xu's fix for hugetlb [0]. If we don't
put_page() after getting the page out of the page cache, we leak the
reference.

The fix can be verified by checking /proc/meminfo and running the
userfaultfd selftest in shmem mode. Without the fix, we see MemFree /
MemAvailable steadily decreasing with each run of the test. With the
fix, memory is correctly freed after the test program exits.

Fixes: 00da60b9d0a0 ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

Link: https://lore.kernel.org/patchwork/patch/1400686/
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I599f1434e24fce6e31d0d73c7f9c4714e9875b63
2025-09-05 06:33:12 +05:30
Peter Xu
55188b93a1 BACKPORT: FROMLIST: userfaultfd/hugetlbfs: Fix minor fault page leak
When uffd-minor enabled, we need to put the page cache before handling the
userfault in hugetlb_no_page(), otherwise the page refcount got leaked.

This can be reproduced by running userfaultfd selftest with hugetlb_shared
mode, then cat /proc/meminfo.

Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: f2bf15fb0969 ("userfaultfd: add minor fault registration mode")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>

Link: https://lore.kernel.org/patchwork/patch/1400632/
Conflicts: mm/hugetlb.c
(manual rebase)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Iac0ebd6738af8b6212c5a6303e4ee2f482bb5841
2025-09-05 06:33:02 +05:30
Peter Xu
2f9b62530b BACKPORT: FROMGIT: userfaultfd/selftests: drop VERIFY check in locking_thread
It tries to check against all zeros and looped for quite a few times.
However after that we'll verify the same page with count_verify, while
count_verify can never be zero.  So it means if it's a zero page we'll
detect it anyways with below code.

There's yet another place we conditionally check the fault flag - just do
it unconditionally.

Link: https://lkml.kernel.org/r/20210310004511.51996-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
(cherry picked from commit 9c42d387952986b4159ce304a14771a0534469b3
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1392478/

Conflicts: tools/testing/selftests/vm/userfaultfd.c
(Manual rebase)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Iee4e4fa10db0b2a01cb4804087275d2714e477b4
2025-09-05 06:32:53 +05:30
Peter Xu
b8399bac97 FROMGIT: userfaultfd/selftests: remove the time() check on delayed uffd
There seems to have no guarantee that time() will return the same for the
two calls even if there's no delay, e.g.  when a fault is accidentally
crossing the changing of a second.  Meanwhile, this message is also not
helping that much since delay could happen with a lot of reasons, e.g.,
schedule latency of resolving thread.  It may not mean an issue with uffd.

Neither do I see this error triggered either in the past runs.  Even if it
triggers, it'll be drown in all the rest of test logs.  Remove it.

Link: https://lkml.kernel.org/r/20210310004511.51996-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 9c08bd6a7410e916a8d38e932d913bb240219745
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1392477/

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I09b7b75425115653ee6082e7f1846984fba197fe
2025-09-05 06:32:43 +05:30
Peter Xu
6998091128 FROMGIT: userfaultfd/selftests: use user mode only
Patch series "userfaultfd/selftests: A few cleanups".

I wanted to cleanup userfaultfd.c fault handling for a long time.  If it's
not cleaned, when the new code grows the file it'll also grow the size
that needs to be cleaned...  This is my attempt to cleanup the userfaultfd
selftest on fault handling, to use an err() macro instead of either
fprintf() or perror() then another exit() call.

The huge cleanup is done in the last patch.  The first 4 patches are some
other standalone cleanups for the same file, so I put them together.

This patch (of 5):

Userfaultfd selftest does not need to handle kernel initiated fault.  Set
user mode so it can be run even if unprivileged_userfaultfd=0 (which is
the default).

Link: https://lkml.kernel.org/r/20210310004511.51996-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit eaf221e6a1916d2471309800e4534bdc0698af33
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1392475/

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Ie3f0708b3c1a25ea66e4e0795e4ea54ca938c7a1
2025-09-05 06:32:33 +05:30
Axel Rasmussen
c4c91e7785 BACKPORT: FROMGIT: userfaultfd/selftests: exercise minor fault handling shmem support
Enable test_uffdio_minor for test_type == TEST_SHMEM, and modify the test
slightly to pass in / check for the right feature flags.

Link: https://lkml.kernel.org/r/20210302000133.272579-6-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 01d5af3a0bc027d51d729cefe3105c7054182df7
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388148/

Conflicts: tools/testing/selftests/vm/userfaultfd.c
(Manual rebase)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I545c6b7058f59fe5bc21d2dd37202209253f1a92
2025-09-05 06:32:24 +05:30
Axel Rasmussen
1d0791d5cc BACKPORT: FROMGIT: userfaultfd/selftests: reinitialize test context in each test
Currently, the context (fds, mmap-ed areas, etc.) are global.  Each test
mutates this state in some way, in some cases really "clobbering it"
(e.g., the events test mremap-ing area_dst over the top of area_src, or
the minor faults tests overwriting the count_verify values in the test
areas).  We run the tests in a particular order, each test is careful to
make the right assumptions about its starting state, etc.

But, this is fragile.  It's better for a test's success or failure to not
depend on what some other prior test case did to the global state.

To that end, clear and reinitialize the test context at the start of each
test case, so whatever prior test cases did doesn't affect future tests.

This is particularly relevant to this series because the events test's
mremap of area_dst screws up assumptions the minor fault test was relying
on.  This wasn't a problem for hugetlb, as we don't mremap in that case.

Link: https://lkml.kernel.org/r/20210302000133.272579-5-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 0108aac75e6d6852e8bba20d5b94e29bf8dc9335
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388145/

Conflicts: tools/testing/selftests/vm/userfaultfd.c
(Manual rebase)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Icf57513cad6f6580114bc5452bfadc5e528434ed
2025-09-05 06:30:46 +05:30
Axel Rasmussen
773b92305a FROMGIT: userfaultfd/selftests: create alias mappings in the shmem test
Previously, we just allocated two shm areas: area_src and area_dst.  With
this commit, change this so we also allocate area_src_alias, and
area_dst_alias.

area_*_alias and area_* (respectively) point to the same underlying
physical pages, but are different VMAs.  In a future commit in this
series, we'll leverage this setup to exercise minor fault handling support
for shmem, just like we do in the hugetlb_shared test.

Link: https://lkml.kernel.org/r/20210302000133.272579-4-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 8bc5e62208bcb9427ea6eed94ff1b152598da6f8
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388149/

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I1605dba2d5f0e35bfd57bd9110bb54e950faab19
2025-09-05 06:30:34 +05:30
Axel Rasmussen
cd4a8c5f46 FROMGIT: userfaultfd/selftests: use memfd_create for shmem test type
This is a preparatory commit.  In the future, we want to be able to setup
alias mappings for area_src and area_dst in the shmem test, like we do in
the hugetlb_shared test.  With a VMA obtained via mmap(MAP_ANONYMOUS |
MAP_SHARED), it isn't clear how to do this.

So, mmap() with an fd, so we can create alias mappings.  Use memfd_create
instead of actually passing in a tmpfs path like hugetlb does, since it's
more convenient / simpler to run, and works just as well.

Future commits will:

1. Setup the alias mappings.
2. Extend our tests to actually take advantage of this, to test new
   userfaultfd behavior being introduced in this series.

Also, a small fix in the area we're changing: when the hugetlb setup fails
in main(), pass in the right argv[] so we actually print out the hugetlb
file path.

Link: https://lkml.kernel.org/r/20210302000133.272579-3-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit de45daecde2d4793e9021b102e168a4cb656dd03
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388147/

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I63ac39245d4090e275238efa75bcdbd40fcc7879
2025-09-05 06:30:23 +05:30
Axel Rasmussen
c8a5edb057 BACKPORT: FROMGIT: userfaultfd: support minor fault handling for shmem
Patch series "userfaultfd: support minor fault handling for shmem", v2.

Overview
========

See my original series [1] for a detailed overview of minor fault handling
in general.  The feature in this series works exactly like the hugetblfs
version (from userspace's perspective).

I'm sending this as a separate series because:

- The original minor fault handling series has a full set of R-Bs, and seems
  close to being merged. So, it seems reasonable to start looking at this next
  step, which extends the basic functionality.

- shmem is different enough that this series may require some additional work
  before it's ready, and I don't want to delay the original series
  unnecessarily by bundling them together.

Use Case
========

In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs.  So, this feature will be used to support the same VM live
migration use case described in my original series.

Additionally, Android folks (Lokesh Gidra <lokeshgidra@google.com>) hope
to optimize the Android Runtime garbage collector using this feature:

"The plan is to use userfaultfd for concurrently compacting the heap.
With this feature, the heap can be shared-mapped at another location where
the GC-thread(s) could continue the compaction operation without the need
to invoke userfault ioctl(UFFDIO_COPY) each time.  OTOH, if and when Java
threads get faults on the heap, UFFDIO_CONTINUE can be used to resume
execution.  Furthermore, this feature enables updating references in the
'non-moving' portion of the heap efficiently.  Without this feature,
uneccessary page copying (ioctl(UFFDIO_COPY)) would be required."

[1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@google.com/T/#t

This patch (of 5):

Modify the userfaultfd register API to allow registering shmem VMAs in
minor mode.  Modify the shmem mcopy implementation to support
UFFDIO_CONTINUE in order to resolve such faults.

Combine the shmem mcopy handler functions into a single
shmem_mcopy_atomic_pte, which takes a mode parameter.  This matches how
the hugetlbfs implementation is structured, and lets us remove a good
chunk of boilerplate.

Link: https://lkml.kernel.org/r/20210302000133.272579-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210302000133.272579-2-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Wang Qing <wangqing@vivo.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 4cc6e15679966aa49afc5b114c3c83ba0ac39b05
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388146/

Conflicts: include/linux/shmem_fs.h
	mm/shmem.c
	mm/userfaultfd.c
(1. write-protect related conflicts, rebased manually
2. Enclose shmem_mcopy_atomic_pte() with CONFIG_USERFAULTFD to avoid
compile errors when USERFAULTFD is not enabled.)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Idcd822b2a124a089121b9ad8c65061f6979126ec
2025-09-05 06:30:12 +05:30
Axel Rasmussen
413c749e25 BACKPORT: FROMGIT: userfaultfd/selftests: add test exercising minor fault handling
Fix a dormant bug in userfaultfd_events_test(), where we did `return
faulting_process(0)` instead of `exit(faulting_process(0))`.  This caused
the forked process to keep running, trying to execute any further test
cases after the events test in parallel with the "real" process.

Add a simple test case which exercises minor faults. In short, it does
the following:

1. "Sets up" an area (area_dst) and a second shared mapping to the same
   underlying pages (area_dst_alias).

2. Register one of these areas with userfaultfd, in minor fault mode.

3. Start a second thread to handle any minor faults.

4. Populate the underlying pages with the non-UFFD-registered side of
   the mapping. Basically, memset() each page with some arbitrary
   contents.

5. Then, using the UFFD-registered mapping, read all of the page
   contents, asserting that the contents match expectations (we expect
   the minor fault handling thread can modify the page contents before
   resolving the fault).

The minor fault handling thread, upon receiving an event, flips all the
bits (~) in that page, just to prove that it can modify it in some
arbitrary way.  Then it issues a UFFDIO_CONTINUE ioctl, to setup the
mapping and resolve the fault.  The reading thread should wake up and see
this modification.

Currently the minor fault test is only enabled in hugetlb_shared mode, as
this is the only configuration the kernel feature supports.

Link: https://lkml.kernel.org/r/20210301222728.176417-7-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 823e78ae969c4ae9500cac5a84ee5b923634be4d
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388135/

Conflicts: tools/testing/selftests/vm/userfaultfd.c
(Removed write-protect related test and removed uffd_stats usage)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I93a845d45436d835a4fd5de0dfd8a2c54fa15550
2025-09-05 06:30:02 +05:30
Peter Xu
64eac78fd5 userfaultfd: selftest: generalize read and poll
We do very similar things in read and poll modes, but we're copying the
codes around.  Share the codes properly on reading the message and
handling the page fault to make the code cleaner.  Meanwhile this solves
previous mismatch of behaviors between the two modes on that the old code:

- did not check EAGAIN case in read() mode
- ignored BOUNCE_VERIFY check in read() mode

Link: http://lkml.kernel.org/r/20180930074259.18229-3-peterx@redhat.com
Change-Id: I6397636b262f0e9ccb061eae4e5d26d0dd8ac849
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Shaohua Li <shli@fb.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-05 06:29:40 +05:30
Axel Rasmussen
7875cf4367 BACKPORT: FROMGIT: userfaultfd: update documentation to describe minor fault handling
Reword / reorganize things a little bit into "lists", so new features /
modes / ioctls can sort of just be appended.

Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used to
intercept and resolve minor faults.  Make it clear that COPY and ZEROPAGE
are used for MISSING faults, whereas CONTINUE is used for MINOR faults.

Link: https://lkml.kernel.org/r/20210301222728.176417-6-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit d08ba026886f0161e2bdd3dbd75c4da0fc62a284
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388137/

Conflicts: Documentation/admin-guide/mm/userfaultfd.rst
(Manual rebase by removing text related to write-protect feature)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Ib59504247d38034e4c86f692dd63f2b3706fe554
2025-09-05 06:29:23 +05:30
Axel Rasmussen
726458b970 BACKPORT: FROMGIT: userfaultfd: add UFFDIO_CONTINUE ioctl
This ioctl is how userspace ought to resolve "minor" userfaults. The
idea is, userspace is notified that a minor fault has occurred. It might
change the contents of the page using its second non-UFFD mapping, or
not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have ensured
the page contents are correct, carry on setting up the mapping".

Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for
MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in
the minor fault case, we already have some pre-existing underlying page.
Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping.
We'd just use memcpy() or similar instead.

It turns out hugetlb_mcopy_atomic_pte() already does very close to what
we want, if an existing page is provided via `struct page **pagep`. We
already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so
just extend that design: add an enum for the three modes of operation,
and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE
case. (Basically, look up the existing page, and avoid adding the
existing page to the page cache or calling set_page_huge_active() on
it.)

Link: https://lkml.kernel.org/r/20210301222728.176417-5-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 14ea86439abaf3423cd9b6712ed5ce8451d2d181
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388136/

Conflicts: fs/userfaultfd.c
	include/linux/hugetlb.h
	include/linux/userfaultfd_k.h
	include/uapi/linux/userfaultfd.h
	mm/hugetlb.c
	mm/userfaultfd.c
(1. 8f251a3d5ce3bdea73bd045ed35db64f32e0d0d9 is not cherry-picked yet so
    switched SetHPageMigratable() to set_active_huge_page() in
    mm/hugetlb.c,
2. Other files conflicts due to lack of write-protect userfaultfd
support. Manually rebased accordingly
3. Included linux/mm.h in linux/userfaultfd_k.h for definitions of
VM_UFFD_*)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I45b62959dcb1d343154cb831113a26e47e77c8af
2025-09-05 06:28:49 +05:30
Lokesh Gidra
75a1231a76 userfaultfd: fix mmap_changing checking in mfill_atomic_hugetlb
commit 67695f18d55924b2013534ef3bdc363bc9e14605 upstream.

In mfill_atomic_hugetlb(), mmap_changing isn't being checked
again if we drop mmap_lock and reacquire it. When the lock is not held,
mmap_changing could have been incremented. This is also inconsistent
with the behavior in mfill_atomic().

Link: https://lkml.kernel.org/r/20240117223729.1444522-1-lokeshgidra@google.com
Fixes: df2cc96e77011 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races")
Change-Id: I7ee2f7c1a28d2511659c4a6c6b305e148846975e
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-05 06:27:52 +05:30
Mike Rapoport
b3464125ce BACKPORT: userfaultfd: prevent non-cooperative events vs mcopy_atomic races
If a process monitored with userfaultfd changes it's memory mappings or
forks() at the same time as uffd monitor fills the process memory with
UFFDIO_COPY, the actual creation of page table entries and copying of
the data in mcopy_atomic may happen either before of after the memory
mapping modifications and there is no way for the uffd monitor to
maintain consistent view of the process memory layout.

For instance, let's consider fork() running in parallel with
userfaultfd_copy():

process        		         |	uffd monitor
---------------------------------+------------------------------
fork()        		         | userfaultfd_copy()
...        		         | ...
    dup_mmap()        	         |     down_read(mmap_sem)
    down_write(mmap_sem)         |     /* create PTEs, copy data */
        dup_uffd()               |     up_read(mmap_sem)
        copy_page_range()        |
        up_write(mmap_sem)       |
        dup_uffd_complete()      |
            /* notify monitor */ |

If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will
be present by the time copy_page_range() is called and they will appear
in the child's memory mappings.  However, if the fork() is the first to
take the mmap_sem, the new pages won't be mapped in the child's address
space.

If the pages are not present and child tries to access them, the monitor
will get page fault notification and everything is fine.  However, if
the pages *are present*, the child can access them without uffd
noticing.  And if we copy them into child it'll see the wrong data.
Since we are talking about background copy, we'd need to decide whether
the pages should be copied or not regardless #PF notifications.

Since userfaultfd monitor has no way to determine what was the order,
let's disallow userfaultfd_copy in parallel with the non-cooperative
events.  In such case we return -EAGAIN and the uffd monitor can
understand that userfaultfd_copy() clashed with a non-cooperative event
and take an appropriate action.

Link: http://lkml.kernel.org/r/1527061324-19949-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Change-Id: I635affbde924a9f082b04c5b1361bf5f199d7323
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-05 06:27:33 +05:30
Axel Rasmussen
34e3dc2db9 BACKPORT: FROMGIT: userfaultfd: hugetlbfs: only compile UFFD helpers if config enabled
For background, mm/userfaultfd.c provides a general mcopy_atomic
implementation.  But some types of memory (i.e., hugetlb and shmem) need a
slightly different implementation, so they provide their own helpers for
this.  In other words, userfaultfd is the only caller of these functions.

This patch achieves two things:

1. Don't spend time compiling code which will end up never being
   referenced anyway (a small build time optimization).

2. In patches later in this series, we extend the signature of these
   helpers with UFFD-specific state (a mode enumeration).  Once this
   happens, we *have to* either not compile the helpers, or
   unconditionally define the UFFD-only state (which seems messier to me).
   This includes the declarations in the headers, as otherwise they'd
   yield warnings about implicitly defining the type of those arguments.

Link: https://lkml.kernel.org/r/20210301222728.176417-4-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 0e6e243e1d9a252c047c4cb1b032cfb31caf87ea
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388133/

Conflicts: include/linux/hugetlb.h
(Manual rebase, required as 1f9dccb25b8fb48778149a002bb25d4ac2899633
isn't CP'ed)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I765cff74cde5fb4ce8141fb95e41848890ced961
2025-09-05 06:27:15 +05:30
Axel Rasmussen
2537fe5e19 FROMGIT: userfaultfd: disable huge PMD sharing for MINOR registered VMAs
As the comment says: for the MINOR fault use case, although the page might
be present and populated in the other (non-UFFD-registered) half of the
mapping, it may be out of date, and we explicitly want userspace to get a
minor fault so it can check and potentially update the page's contents.

Huge PMD sharing would prevent these faults from occurring for suitably
aligned areas, so disable it upon UFFD registration.

Link: https://lkml.kernel.org/r/20210301222728.176417-3-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 19fbec4445b6a690253c1785dfd376ede2cdb9d9
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388134/

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I257024f980f43cf8b06b5421ee3278115d1b1540
2025-09-05 06:27:03 +05:30
Axel Rasmussen
875e1dd7be BACKPORT: FROMGIT: userfaultfd: add minor fault registration mode
Patch series "userfaultfd: add minor fault handling", v9.

Overview
========

This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS.
When enabled (via the UFFDIO_API ioctl), this feature means that any
hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also*
get events for "minor" faults.  By "minor" fault, I mean the following
situation:

Let there exist two mappings (i.e., VMAs) to the same page(s) (shared
memory).  One of the mappings is registered with userfaultfd (in minor
mode), and the other is not.  Via the non-UFFD mapping, the underlying
pages have already been allocated & filled with some contents.  The UFFD
mapping has not yet been faulted in; when it is touched for the first
time, this results in what I'm calling a "minor" fault.  As a concrete
example, when working with hugetlbfs, we have huge_pte_none(), but
find_lock_page() finds an existing page.

We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE.  The idea
is, userspace resolves the fault by either a) doing nothing if the
contents are already correct, or b) updating the underlying contents using
the second, non-UFFD mapping (via memcpy/memset or similar, or something
fancier like RDMA, or etc...).  In either case, userspace issues
UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are
correct, carry on setting up the mapping".

Use Case
========

Consider the use case of VM live migration (e.g. under QEMU/KVM):

1. While a VM is still running, we copy the contents of its memory to a
   target machine. The pages are populated on the target by writing to the
   non-UFFD mapping, using the setup described above. The VM is still running
   (and therefore its memory is likely changing), so this may be repeated
   several times, until we decide the target is "up to date enough".

2. We pause the VM on the source, and start executing on the target machine.
   During this gap, the VM's user(s) will *see* a pause, so it is desirable to
   minimize this window.

3. Between the last time any page was copied from the source to the target, and
   when the VM was paused, the contents of that page may have changed - and
   therefore the copy we have on the target machine is out of date. Although we
   can keep track of which pages are out of date, for VMs with large amounts of
   memory, it is "slow" to transfer this information to the target machine. We
   want to resume execution before such a transfer would complete.

4. So, the guest begins executing on the target machine. The first time it
   touches its memory (via the UFFD-registered mapping), userspace wants to
   intercept this fault. Userspace checks whether or not the page is up to date,
   and if not, copies the updated page from the source machine, via the non-UFFD
   mapping. Finally, whether a copy was performed or not, userspace issues a
   UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents
   are correct, carry on setting up the mapping".

We don't have to do all of the final updates on-demand. The userfaultfd manager
can, in the background, also copy over updated pages once it receives the map of
which pages are up-to-date or not.

Interaction with Existing APIs
==============================

Because this is a feature, a registered VMA could potentially receive both
missing and minor faults.  I spent some time thinking through how the
existing API interacts with the new feature:

UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not
allocate a new page.  If UFFDIO_CONTINUE is used on a non-minor fault:

- For non-shared memory or shmem, -EINVAL is returned.
- For hugetlb, -EFAULT is returned.

UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults.
Without modifications, the existing codepath assumes a new page needs to
be allocated.  This is okay, since userspace must have a second
non-UFFD-registered mapping anyway, thus there isn't much reason to want
to use these in any case (just memcpy or memset or similar).

- If UFFDIO_COPY is used on a minor fault, -EEXIST is returned.
- If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL
  in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case).
- UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns
  -ENOENT in that case (regardless of the kind of fault).

Future Work
===========

This series only supports hugetlbfs.  I have a second series in flight to
support shmem as well, extending the functionality.  This series is more
mature than the shmem support at this point, and the functionality works
fully on hugetlbfs, so this series can be merged first and then shmem
support will follow.

This patch (of 6):

This feature allows userspace to intercept "minor" faults.  By "minor"
faults, I mean the following situation:

Let there exist two mappings (i.e., VMAs) to the same page(s).  One of the
mappings is registered with userfaultfd (in minor mode), and the other is
not.  Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents.  The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what
I'm calling a "minor" fault.  As a concrete example, when working with
hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
page.

This commit adds the new registration mode, and sets the relevant flag on
the VMAs being registered.  In the hugetlb fault path, if we find that we
have huge_pte_none(), but find_lock_page() does indeed find an existing
page, then we have a "minor" fault, and if the VMA has the userfaultfd
registration flag, we call into userfaultfd to handle it.

This is implemented as a new registration mode, instead of an API feature.
This is because the alternative implementation has significant drawbacks
[1].

However, doing it this was requires we allocate a VM_* flag for the new
registration mode.  On 32-bit systems, there are no unused bits, so this
feature is only supported on architectures with
CONFIG_ARCH_USES_HIGH_VMA_FLAGS.  When attempting to register a VMA in
MINOR mode on 32-bit architectures, we return -EINVAL.

[1] https://lore.kernel.org/patchwork/patch/1380226/

Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 82a150ec394f6b944e26786b907fc0deab5b2064
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1388132/

Conflicts: arch/x86/Kconfig
	fs/userfaultfd.c
	include/linux/userfaultfd_k.h
	include/uapi/linux/userfaultfd.h
	init/Kconfig
	mm/hugetlb.c
(Lack of userfaultfd write-protect support in 5.4 lead to all conflicts.
Resolved by carefully rebasing such that write-protect related code
doesn't get added)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I43b37272d531341439ceaa03213d0e2415e04688
2025-09-05 06:26:23 +05:30
Peter Xu
f4bdd1a83f BACKPORT: FROMGIT: hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp
Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because
userfaultfd-wp is always based on pgtable entries, so they cannot be
shared.

Walk the hugetlb range and unshare all such mappings if there is, right
before UFFDIO_REGISTER will succeed and return to userspace.

This will pair with want_pmd_share() in hugetlb code so that huge pmd
sharing is completely disabled for userfaultfd-wp registered range.

Link: https://lkml.kernel.org/r/20210218231206.15524-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 267bda5c9993856b86f91a998df632b29cf517e2
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1382208/
Conflicts:
	mm/hugetlb.c

(CONFIG_CMA not CP'ed in this kernel. Manual rebase)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I99d541ce45aaf924fa912f00dafa4caefe307755
2025-09-05 06:25:01 +05:30
Peter Xu
8a27697c5d FROMGIT: mm/hugetlb: move flush_hugetlb_tlb_range() into hugetlb.h
Prepare for it to be called outside of mm/hugetlb.c.

Link: https://lkml.kernel.org/r/20210218231204.15474-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 04297c667b3972097535638e3dab5a66f11ca1df
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1382206/

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I9bf21fe479548d44a8b611ed832b2f1af7667f4c
2025-09-05 06:24:49 +05:30
Peter Xu
f8d97f57ed FROMGIT: mm/hugetlb: fix build with !ARCH_WANT_HUGE_PMD_SHARE
want_pmd_share() is undefined with !ARCH_WANT_HUGE_PMD_SHARE since it's
put by accident into a "#ifdef ARCH_WANT_HUGE_PMD_SHARE" block.  Moving it
out won't work either since vma_shareable() is only defined within the
block.  Define it for !ARCH_WANT_HUGE_PMD_SHARE instead.

Link: https://lkml.kernel.org/r/20210310185359.88297-1-peterx@redhat.com
Fixes: 5b109cc1cdcc ("hugetlb/userfaultfd: forbid huge pmd sharing when uffd enabled")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit 5038f9dd8bbde13ff16435011bb3b0981acc5c1c
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1393174/

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Id716afd43bff303f7eda2c4f70f18d9ea727c698
2025-09-05 06:24:29 +05:30
Peter Xu
572a86b570 BACKPORT: FROMGIT: hugetlb/userfaultfd: forbid huge pmd sharing when uffd enabled
Huge pmd sharing could bring problem to userfaultfd.  The thing is that
userfaultfd is running its logic based on the special bits on page table
entries, however the huge pmd sharing could potentially share page table
entries for different address ranges.  That could cause issues on either:

  - When sharing huge pmd page tables for an uffd write protected range, the
    newly mapped huge pmd range will also be write protected unexpectedly, or,

  - When we try to write protect a range of huge pmd shared range, we'll first
    do huge_pmd_unshare() in hugetlb_change_protection(), however that also
    means the UFFDIO_WRITEPROTECT could be silently skipped for the shared
    region, which could lead to data loss.

Since at it, a few other things are done altogether:

  - Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because
    that's definitely something that arch code would like to use too

  - ARM64 currently directly check against CONFIG_ARCH_WANT_HUGE_PMD_SHARE when
    trying to share huge pmd.  Switch to the want_pmd_share() helper.

Since at it, move vma_shareable() from huge_pmd_share() into want_pmd_share().

Link: https://lkml.kernel.org/r/20210218231202.15426-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit ab6a0d00a63f92f1f0d220274fa989eb75c09f2b
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1382207/
Conflicts:
	include/linux/hugetlb.h
	mm/hugetlb.c

(Manual rebase)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Ie2dff7ab31600cae78914e3278be61516844394e
2025-09-05 06:23:16 +05:30
Peter Xu
614cfe1848 BACKPORT: FROMGIT: hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share()
Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4.

This series tries to disable huge pmd unshare of hugetlbfs backed memory
for uffd-wp.  Although uffd-wp of hugetlbfs is still during rfc stage, the
idea of this series may be needed for multiple tasks (Axel's uffd minor
fault series, and Mike's soft dirty series), so I picked it out from the
larger series.

This patch (of 4):

It is a preparation work to be able to behave differently in the per
architecture huge_pte_alloc() according to different VMA attributes.

Pass it deeper into huge_pmd_share() so that we can avoid the find_vma() call.

Link: https://lkml.kernel.org/r/20210218230633.15028-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210218230633.15028-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Adam Ruprecht <ruprecht@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Cannon Matthews <cannonmatthews@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Michal Koutn" <mkoutny@suse.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shawn Anastasio <shawn@anastas.io>
Cc: Steven Price <steven.price@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit b92dc1bfd52ecf338c024815a7c1d44e37a507a1
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm)
Link: https://lore.kernel.org/patchwork/patch/1382205/

Conflicts:
	arch/sparc/mm/hugetlbpage.c
	mm/hugetlb.c
	mm/userfaultfd.c

(1. manual rebase,
 2. c0d0381ade79885c04a04c303284b040616b116e wasn't CP'ed. Rebased by
 appropriately updating huge_pte_alloc(),
 3. manual rebase)

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I50db4e27f2951a5ee01b0dfa22c1ece34e79f881
2025-09-05 06:22:12 +05:30
John Hubbard
90132d55c3 UPSTREAM: selftests/vm/.gitignore: add mremap_dontunmap
Add mremap_dontunmap to .gitignore.

Fixes: 0c28759ee3c9 ("selftests: add MREMAP_DONTUNMAP selftest")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Link: http://lkml.kernel.org/r/20200517002509.362401-2-jhubbard@nvidia.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 98097701cc0bec06e4bc183cceaf6dfa06a69e10)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Iadb72ad4ffec5d0203214a632b612ff3266695c0
2025-09-05 06:22:00 +05:30
Brian Geffon
6bbffc8321 FROMLIST: selftests: Add a MREMAP_DONTUNMAP selftest for shmem
This test extends the current mremap tests to validate that
the MREMAP_DONTUNMAP operation can be performed on shmem mappings.

Signed-off-by: Brian Geffon <bgeffon@google.com>

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Link: https://lore.kernel.org/patchwork/patch/1401225/
Bug: 160737021
Bug: 169683130
Change-Id: Ib357e58526af739cf8df49fc9604372996a9a6b3
2025-09-05 06:21:44 +05:30
Will Deacon
09bf8d4802 UPSTREAM: arm64: tlb: Rewrite stale comment in asm/tlbflush.h
Peter Z asked me to justify the barrier usage in asm/tlbflush.h, but
actually that whole block comment needs to be rewritten.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: If49b019942043655d3ce72021e4daa66a82c60fb
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 00:10:11 +05:30
Will Deacon
6b7c7a6e79 BACKPORT: arm64: tlb: Avoid synchronous TLBIs when freeing page tables
By selecting HAVE_RCU_TABLE_INVALIDATE, we can rely on tlb_flush() being
called if we fail to batch table pages for freeing. This in turn allows
us to postpone walk-cache invalidation until tlb_finish_mmu(), which
avoids lots of unnecessary DSBs and means we can shoot down the ASID if
the range is large enough.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: Ie25f4be366f5a170adbb0e64c7d57ecc2b379a58
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[cyberknight777: Backport to msm-4.14]
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 00:09:58 +05:30
Will Deacon
4a4bfee462 BACKPORT: arm64: tlb: Adjust stride and type of TLBI according to mmu_gather
Now that the core mmu_gather code keeps track of both the levels of page
table cleared and also whether or not these entries correspond to
intermediate entries, we can use this in our tlb_flush() callback to
reduce the number of invalidations we issue as well as their scope.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: Ibe3adb99f9f7b64517c614fd08cf3fa5c034c7ee
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[cyberknight777: Backport to msm-4.14]
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 00:09:44 +05:30
Will Deacon
445a5b4f6f UPSTREAM: arm64: tlb: Remove redundant !CONFIG_HAVE_RCU_TABLE_FREE code
If there's one thing the RCU-based table freeing doesn't need, it's more
ifdeffery.

Remove the redundant !CONFIG_HAVE_RCU_TABLE_FREE code, since this option
is unconditionally selected in our Kconfig.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: Ifbe6dc2d8ce9e7e0d17c1c594325b04c3d39ca95
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 00:09:33 +05:30
Will Deacon
04cef5e146 UPSTREAM: arm64: tlbflush: Allow stride to be specified for __flush_tlb_range()
When we are unmapping intermediate page-table entries or huge pages, we
don't need to issue a TLBI instruction for every PAGE_SIZE chunk in the
VA range being unmapped.

Allow the invalidation stride to be passed to __flush_tlb_range(), and
adjust our "just nuke the ASID" heuristic to take this into account.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: I75dd94e14ea9920b3500e8003cad2ee0a74bb05f
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 00:09:24 +05:30
Will Deacon
c492fcb06a UPSTREAM: arm64: tlb: Justify non-leaf invalidation in flush_tlb_range()
Add a comment to explain why we can't get away with last-level
invalidation in flush_tlb_range()

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: I6e5251011b20a0270206b0cf50c34f991752792a
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 00:09:14 +05:30
Will Deacon
4b905ed5c1 BACKPORT: arm64: pgtable: Implement p[mu]d_valid() and check in set_p[mu]d()
Now that our walk-cache invalidation routines imply a DSB before the
invalidation, we no longer need one when we are clearing an entry during
unmap.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: Ib0ad415b232f766fb93455f39de5449f4bf45dfb
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[cyberknight777: Backport to msm-4.14]
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 00:09:03 +05:30
Will Deacon
eac8e87d64 UPSTREAM: arm64: tlb: Add DSB ISHST prior to TLBI in __flush_tlb_[kernel_]pgtable()
__flush_tlb_[kernel_]pgtable() rely on set_pXd() having a DSB after
writing the new table entry and therefore avoid the barrier prior to the
TLBI instruction.

In preparation for delaying our walk-cache invalidation on the unmap()
path, move the DSB into the TLB invalidation routines.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: I7a8a259d78b6d4410c4a6e59b2f229dbd58244af
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 00:08:54 +05:30
Will Deacon
d1125d40e3 UPSTREAM: arm64: tlb: Use last-level invalidation in flush_tlb_kernel_range()
flush_tlb_kernel_range() is only ever used to invalidate last-level
entries, so we can restrict the scope of the TLB invalidation
instruction.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: I1c7944e35ba4c39e0736419f8fc5fce37c1eebd8
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 00:08:44 +05:30
Will Deacon
5f85e8aa00 UPSTREAM: MAINTAINERS: Add entry for MMU GATHER AND TLB INVALIDATION
We recently had to debug a TLB invalidation problem on the munmap()
path, which was made more difficult than necessary because:

  (a) The MMU gather code had changed without people realising
  (b) Many people subtly misunderstood the operation of the MMU gather
      code and its interactions with RCU and arch-specific TLB invalidation
  (c) Untangling the intended behaviour involved educated guesswork and
      plenty of discussion

Hopefully, we can avoid getting into this mess again by designating a
cross-arch group of people to look after this code. It is not intended
that they will have a separate tree, but they at least provide a point
of contact for anybody working in this area and can co-ordinate any
proposed future changes to the internal API.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Change-Id: Ie434451c6fea97908ce566d3ce5cf8976207d2fb
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-05 00:08:35 +05:30
Will Deacon
4ea6c750e0 UPSTREAM: asm-generic/tlb: Track which levels of the page tables have been cleared
It is common for architectures with hugepage support to require only a
single TLB invalidation operation per hugepage during unmap(), rather than
iterating through the mapping at a PAGE_SIZE increment. Currently,
however, the level in the page table where the unmap() operation occurs
is not stored in the mmu_gather structure, therefore forcing
architectures to issue additional TLB invalidation operations or to give
up and over-invalidate by e.g. invalidating the entire TLB.

Ideally, we could add an interval rbtree to the mmu_gather structure,
which would allow us to associate the correct mapping granule with the
various sub-mappings within the range being invalidated. However, this
is costly in terms of book-keeping and memory management, so instead we
approximate by keeping track of the page table levels that are cleared
and provide a means to query the smallest granule required for invalidation.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Change-Id: Ifb486381b6e71f4e05c9d38a246bf82de2d224ac
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-04 23:52:30 +05:30
Peter Zijlstra
5db71d2197 UPSTREAM: asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather
Some architectures require different TLB invalidation instructions
depending on whether it is only the last-level of page table being
changed, or whether there are also changes to the intermediate
(directory) entries higher up the tree.

Add a new bit to the flags bitfield in struct mmu_gather so that the
architecture code can operate accordingly if it's the intermediate
levels being invalidated.

Acked-by: Nicholas Piggin <npiggin@gmail.com>
Change-Id: I9a19a09e1ddff1e2386a29fe1392b0cb0de9cfe7
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-04 23:52:16 +05:30
Will Deacon
4e420b29a0 UPSTREAM: asm-generic/tlb: Guard with #ifdef CONFIG_MMU
The inner workings of the mmu_gather-based TLB invalidation mechanism
are not relevant to nommu configurations, so guard them with an #ifdef.
This allows us to implement future functions using static inlines
without breaking the build.

Acked-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: I8d6673a8daa1ff4de448477b8f0bfc5cd0ec5719
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-04 23:52:02 +05:30
Will Deacon
5ff20ef405 UPSTREAM: arm64: tlb: Provide forward declaration of tlb_flush() before including tlb.h
As of commit fd1102f0aade ("mm: mmu_notifier fix for tlb_end_vma"),
asm-generic/tlb.h now calls tlb_flush() from a static inline function,
so we need to make sure that it's declared before #including the
asm-generic header in the arch header.

Change-Id: Ib914ff3a30a5f081a05eeccff3d59dd7e084838a
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-04 23:51:52 +05:30
Nicholas Piggin
a40d6e9305 UPSTREAM: mm: mmu_notifier fix for tlb_end_vma
The generic tlb_end_vma does not call invalidate_range mmu notifier, and
it resets resets the mmu_gather range, which means the notifier won't be
called on part of the range in case of an unmap that spans multiple
vmas.

ARM64 seems to be the only arch I could see that has notifiers and uses
the generic tlb_end_vma.  I have not actually tested it.

[ Catalin and Will point out that ARM64 currently only uses the
  notifiers for KVM, which doesn't use the ->invalidate_range()
  callback right now, so it's a bug, but one that happens to
  not affect them.  So not necessary for stable.  - Linus ]

Change-Id: Id7b31c8a84be494b2f6341beb3be23485b5dd6bb
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
2025-09-04 23:51:37 +05:30
Brian Geffon
85a7625403 UPSTREAM: selftests: add MREMAP_DONTUNMAP selftest
Add a few simple self tests for the new flag MREMAP_DONTUNMAP, they are
simple smoke tests which also demonstrate the behavior.

[akpm@linux-foundation.org: convert eight-spaces to hard tabs]
[bgeffon@google.com: v7]
  Link: http://lkml.kernel.org/r/20200221174248.244748-2-bgeffon@google.com
[akpm@linux-foundation.org: coding style fixes]
Signed-off-by: Brian Geffon <bgeffon@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Deacon <will@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Link: http://lkml.kernel.org/r/20200218173221.237674-2-bgeffon@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 0c28759ee3c91fa8ae14d7672b781b979be274e1)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: I3380f66b882cc24b7c4b5b29923f929722e7c22a
2025-09-04 23:39:22 +05:30
Aneesh Kumar K.V
14a76ff07c BACKPORT: mm/mremap: hold the rmap lock in write mode when moving page table entries.
To avoid a race between rmap walk and mremap, mremap does
take_rmap_locks().  The lock was taken to ensure that rmap walk don't miss
a page table entry due to PTE moves via move_pagetables().  The kernel
does further optimization of this lock such that if we are going to find
the newly added vma after the old vma, the rmap lock is not taken.  This
is because rmap walk would find the vmas in the same order and if we don't
find the page table attached to older vma we would find it with the new
vma which we would iterate later.

As explained in commit eb66ae030829 ("mremap: properly flush TLB before
releasing the page") mremap is special in that it doesn't take ownership
of the page.  The optimized version for PUD/PMD aligned mremap also
doesn't hold the ptl lock.  This can result in stale TLB entries as show
below.

This patch updates the rmap locking requirement in mremap to handle the race condition
explained below with optimized mremap::

Optmized PMD move

    CPU 1                           CPU 2                                   CPU 3

    mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one

    mmap_write_lock_killable()

                                    addr = old_addr
                                    lock(pte_ptl)
    lock(pmd_ptl)
    pmd = *old_pmd
    pmd_clear(old_pmd)
    flush_tlb_range(old_addr)

    *new_pmd = pmd
                                                                            *new_addr = 10; and fills
                                                                            TLB with new addr
                                                                            and old pfn

    unlock(pmd_ptl)
                                    ptep_clear_flush()
                                    old pfn is free.
                                                                            Stale TLB entry

Optimized PUD move also suffers from a similar race.  Both the above race
condition can be fixed if we force mremap path to take rmap lock.

Link: https://lkml.kernel.org/r/20210616045239.370802-7-aneesh.kumar@linux.ibm.com
Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions")
Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 97113eb39fa7972722ff490b947d8af023e1f6a2)

[Kalesh Singh: Resolve some trivial conflicts in mm/mremap.c]

Bug: 151772539
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I5b7235e982ea2efdc155018271fbaf2711fac4c1
2025-09-04 23:39:21 +05:30
Kalesh Singh
0ff84ba061 UPSTREAM: mm/mremap.c: fix extent calculation
When `next < old_addr`, `next - old_addr` arithmetic underflows causing
`extent` to be incorrect.

Make `extent` the smaller of `next - old_addr` or `old_end - old_addr`.

Link: https://lkml.kernel.org/r/20201219170433.2418867-1-kaleshsingh@google.com
Fixes: c49dd34018026 ("mm: speedup mremap on 1GB or larger regions")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit e05986ee7a5814bec0e0075d813daca3d46e4a9e)

Bug: 151772539
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Ibcbd39dfa16f8ebd1aeca469540b0ef43fb849c9
2025-09-04 23:39:21 +05:30
Kalesh Singh
0fb9c9273f UPSTREAM: x86: mremap speedup - Enable HAVE_MOVE_PUD
HAVE_MOVE_PUD enables remapping pages at the PUD level if both the
source and destination addresses are PUD-aligned.

With HAVE_MOVE_PUD enabled it can be inferred that there is
approximately a 13x improvement in performance on x86.  (See data
below).

------- Test Results ---------

The following results were obtained using a 5.4 kernel, by remapping
a PUD-aligned, 1GB sized region to a PUD-aligned destination.
The results from 10 iterations of the test are given below:

Total mremap times for 1GB data on x86. All times are in nanoseconds.

  Control        HAVE_MOVE_PUD

  180394         15089
  235728         14056
  238931         25741
  187330         13838
  241742         14187
  177925         14778
  182758         14728
  160872         14418
  205813         15107
  245722         13998

  205721.5       15594    <-- Mean time in nanoseconds

A 1GB mremap completion time drops from ~205 microseconds
to ~15 microseconds on x86. (~13x speed up).

Link: https://lkml.kernel.org/r/20201014005320.2233162-6-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hassan Naveed <hnaveed@wavecomp.com>
Cc: Jia He <justin.he@arm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit be37c98d1134a8e068b52618c086dab6b34b9a2c)

Bug: 151772539
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I7967951289885157ef3a487a6935abe3b860847f
2025-09-04 23:39:20 +05:30
Joel Fernandes (Google)
4e3745d81d mm: select HAVE_MOVE_PMD on x86 for faster mremap
Moving page-tables at the PMD-level on x86 is known to be safe.  Enable
this option so that we can do fast mremap when possible.

Link: http://lkml.kernel.org/r/20181108181201.88826-4-joelaf@google.com
Change-Id: I05ffe342ac69c7f979debddc2e7aa66d90c11fc8
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-04 23:39:19 +05:30
Kalesh Singh
f7e0240ae1 UPSTREAM: arm64: mremap speedup - enable HAVE_MOVE_PUD
HAVE_MOVE_PUD enables remapping pages at the PUD level if both the source
and destination addresses are PUD-aligned.

With HAVE_MOVE_PUD enabled it can be inferred that there is approximately
a 19x improvement in performance on arm64.  (See data below).

------- Test Results ---------

The following results were obtained using a 5.4 kernel, by remapping a
PUD-aligned, 1GB sized region to a PUD-aligned destination.  The results
from 10 iterations of the test are given below:

Total mremap times for 1GB data on arm64. All times are in nanoseconds.

  Control          HAVE_MOVE_PUD

  1247761          74271
  1219896          46771
  1094792          59687
  1227760          48385
  1043698          76666
  1101771          50365
  1159896          52500
  1143594          75261
  1025833          61354
  1078125          48697

  1134312.6        59395.7    <-- Mean time in nanoseconds

A 1GB mremap completion time drops from ~1.1 milliseconds to ~59
microseconds on arm64.  (~19x speed up).

Link: https://lkml.kernel.org/r/20201014005320.2233162-5-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hassan Naveed <hnaveed@wavecomp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit f5308c896d5de211245a9dc73b4e530f75185dd5)

Bug: 151772539
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I30590b7375dde84fe345f4920a06f6a2c0b5aa31
2025-09-04 23:39:19 +05:30
Kalesh Singh
1625fd67e4 BACKPORT: mm: speedup mremap on 1GB or larger regions
Android needs to move large memory regions for garbage collection.  The GC
requires moving physical pages of multi-gigabyte heap using mremap.
During this move, the application threads have to be paused for
correctness.  It is critical to keep this pause as short as possible to
avoid jitters during user interaction.

Optimize mremap for >= 1GB-sized regions by moving at the PUD/PGD level if
the source and destination addresses are PUD-aligned.  For
CONFIG_PGTABLE_LEVELS == 3, moving at the PUD level in effect moves PGD
entries, since the PUD entry is “folded back” onto the PGD entry.  Add
HAVE_MOVE_PUD so that architectures where moving at the PUD level isn't
supported/tested can turn this off by not selecting the config.

Link: https://lkml.kernel.org/r/20201014005320.2233162-4-kaleshsingh@google.com
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Gavin Shan <gshan@redhat.com>
Cc: Hassan Naveed <hnaveed@wavecomp.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit c49dd340180260c6239e453263a9a244da9a7c85)

[Kalesh Singh: Resolve conflicts in mm/mremap.c]
Bug: 151772539
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: Ia9b065f5059044815fd05f22bad33c484b2b2b73
2025-09-04 23:39:18 +05:30
Kalesh Singh
b13648903c UPSTREAM: arm64: mremap speedup - Enable HAVE_MOVE_PMD
HAVE_MOVE_PMD enables remapping pages at the PMD level if both the
source and destination addresses are PMD-aligned.

HAVE_MOVE_PMD is already enabled on x86. The original patch [1] that
introduced this config did not enable it on arm64 at the time because
of performance issues with flushing the TLB on every PMD move. These
issues have since been addressed in more recent releases with
improvements to the arm64 TLB invalidation and core mmu_gather code as
Will Deacon mentioned in [2].

>From the data below, it can be inferred that there is approximately
8x improvement in performance when HAVE_MOVE_PMD is enabled on arm64.

--------- Test Results ----------

The following results were obtained on an arm64 device running a 5.4
kernel, by remapping a PMD-aligned, 1GB sized region to a PMD-aligned
destination. The results from 10 iterations of the test are given below.
All times are in nanoseconds.

Control    HAVE_MOVE_PMD

9220833    1247761
9002552    1219896
9254115    1094792
8725885    1227760
9308646    1043698
9001667    1101771
8793385    1159896
8774636    1143594
9553125    1025833
9374010    1078125

9100885.4  1134312.6    <-- Mean Time in nanoseconds

Total mremap time for a 1GB sized PMD-aligned region drops from
~9.1 milliseconds to ~1.1 milliseconds. (~8x speedup).

[1] https://lore.kernel.org/r/20181108181201.88826-3-joelaf@google.com
[2] https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg140837.html

Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/r/20201014005320.2233162-3-kaleshsingh@google.com
Link: https://lore.kernel.org/kvmarm/20181029102840.GC13965@arm.com/
Signed-off-by: Will Deacon <will@kernel.org>
(cherry picked from commit 45544eee96065cf183fbb937fe1f45a172b06f4e)

Bug: 151772539
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: If0d97276cf8de1a5893e97444f2d961db05abea5
2025-09-04 23:39:17 +05:30
Joel Fernandes (Google)
e53beba1cb mm: speed up mremap by 20x on large regions
Android needs to mremap large regions of memory during memory management
related operations.  The mremap system call can be really slow if THP is
not enabled.  The bottleneck is move_page_tables, which is copying each
pte at a time, and can be really slow across a large map.  Turning on
THP may not be a viable option, and is not for us.  This patch speeds up
the performance for non-THP system by copying at the PMD level when
possible.

The speedup is an order of magnitude on x86 (~20x).  On a 1GB mremap,
the mremap completion times drops from 3.4-3.6 milliseconds to 144-160
microseconds.

Before:
Total mremap time for 1GB data: 3521942 nanoseconds.
Total mremap time for 1GB data: 3449229 nanoseconds.
Total mremap time for 1GB data: 3488230 nanoseconds.

After:
Total mremap time for 1GB data: 150279 nanoseconds.
Total mremap time for 1GB data: 144665 nanoseconds.
Total mremap time for 1GB data: 158708 nanoseconds.

If THP is enabled the optimization is mostly skipped except in certain
situations.

[joel@joelfernandes.org: fix 'move_normal_pmd' unused function warning]
  Link: http://lkml.kernel.org/r/20181108224457.GB209347@google.com
Link: http://lkml.kernel.org/r/20181108181201.88826-3-joelaf@google.com
Change-Id: I9d8e2e356a78f05119f48e37344a2df62cb8f97e
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-04 23:39:17 +05:30
Mel Gorman
8f4e9f7366 mremap: remove LATENCY_LIMIT from mremap to reduce the number of TLB shootdowns
Commit 5d1904204c ("mremap: fix race between mremap() and page
cleanning") fixed races between mremap and other operations for both
file-backed and anonymous mappings.  The file-backed was the most
critical as it allowed the possibility that data could be changed on a
physical page after page_mkclean returned which could trigger data loss
or data integrity issues.

A customer reported that the cost of the TLBs for anonymous regressions
was excessive and resulting in a 30-50% drop in performance overall
since this commit on a microbenchmark.  Unfortunately I neither have
access to the test-case nor can I describe what it does other than
saying that mremap operations dominate heavily.

This patch removes the LATENCY_LIMIT to handle TLB flushes on a PMD
boundary instead of every 64 pages to reduce the number of TLB
shootdowns by a factor of 8 in the ideal case.  LATENCY_LIMIT was almost
certainly used originally to limit the PTL hold times but the latency
savings are likely offset by the cost of IPIs in many cases.  This patch
is not reported to completely restore performance but gets it within an
acceptable percentage.  The given metric here is simply described as
"higher is better".

Baseline that was known good
002:  Metric:       91.05
004:  Metric:      109.45
008:  Metric:       73.08
016:  Metric:       58.14
032:  Metric:       61.09
064:  Metric:       57.76
128:  Metric:       55.43

Current
001:  Metric:       54.98
002:  Metric:       56.56
004:  Metric:       41.22
008:  Metric:       35.96
016:  Metric:       36.45
032:  Metric:       35.71
064:  Metric:       35.73
128:  Metric:       34.96

With patch
001:  Metric:       61.43
002:  Metric:       81.64
004:  Metric:       67.92
008:  Metric:       51.67
016:  Metric:       50.47
032:  Metric:       52.29
064:  Metric:       50.01
128:  Metric:       49.04

So for low threads, it's not restored but for larger number of threads,
it's closer to the "known good" baseline.

Using a different mremap-intensive workload that is not representative
of the real workload there is little difference observed outside of
noise in the headline metrics However, the TLB shootdowns are reduced by
11% on average and at the peak, TLB shootdowns were reduced by 21%.
Interrupts were sampled every second while the workload ran to get those
figures.  It's known that the figures will vary as the
non-representative load is non-deterministic.

An alternative patch was posted that should have significantly reduced
the TLB flushes but unfortunately it does not perform as well as this
version on the customer test case.  If revisited, the two patches can
stack on top of each other.

Link: http://lkml.kernel.org/r/20180606183803.k7qaw2xnbvzshv34@techsingularity.net
Change-Id: I14865a9a03c2c66a9c9920779f039ac59d072ca8
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-04 23:39:16 +05:30
Lokesh Gidra
6cfeb1e47f ANDROID: defconfig: Enable CONFIG_USERFAULTFD
Patches for SELinux support and kernel page-fault restriction in
userfaultfd have been backported. See references below.
So from security perspective it should be safe to enable it in Android.

1) https://android-review.googlesource.com/c/kernel/common/+/1576486
2) https://android-review.googlesource.com/c/kernel/common/+/1576704
3) https://android-review.googlesource.com/c/kernel/common/+/1612597
4) https://android-review.googlesource.com/c/kernel/common/+/1574667

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Iac5143da76783de57dba229f5761aff9297c17ae
2025-09-04 23:39:15 +05:30
Lokesh Gidra
578eb9178c UPSTREAM: userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob
With this change, when the knob is set to 0, it allows unprivileged users
to call userfaultfd, like when it is set to 1, but with the restriction
that page faults from only user-mode can be handled.  In this mode, an
unprivileged user (without SYS_CAP_PTRACE capability) must pass
UFFD_USER_MODE_ONLY to userfaultd or the API will fail with EPERM.

This enables administrators to reduce the likelihood that an attacker with
access to userfaultfd can delay faulting kernel code to widen timing
windows for other exploits.

The default value of this knob is changed to 0.  This is required for
correct functioning of pipe mutex.  However, this will fail postcopy live
migration, which will be unnoticeable to the VM guests.  To avoid this,
set 'vm.userfault = 1' in /sys/sysctl.conf.

The main reason this change is desirable as in the short term is that the
Android userland will behave as with the sysctl set to zero.  So without
this commit, any Linux binary using userfaultfd to manage its memory would
behave differently if run within the Android userland.  For more details,
refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/

Link: https://lkml.kernel.org/r/20201120030411.2690816-3-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: <calin@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit d0d4730ac2e404a5b0da9a87ef38c73e51cb1664)
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Bug: 160737021
Bug: 169683130
Change-Id: Ic46c0be47d6394d25bd3443ff524936fa568ab85
2025-09-04 23:39:15 +05:30
Lokesh Gidra
c1a4154d17 BACKPORT: userfaultfd: add UFFD_USER_MODE_ONLY
Patch series "Control over userfaultfd kernel-fault handling", v6.

This patch series is split from [1].  The other series enables SELinux
support for userfaultfd file descriptors so that its creation and movement
can be controlled.

It has been demonstrated on various occasions that suspending kernel code
execution for an arbitrary amount of time at any access to userspace
memory (copy_from_user()/copy_to_user()/...) can be exploited to change
the intended behavior of the kernel.  For instance, handling page faults
in kernel-mode using userfaultfd has been exploited in [2, 3].  Likewise,
FUSE, which is similar to userfaultfd in this respect, has been exploited
in [4, 5] for similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object.  It then adds a 'user-mode only' option to the
unprivileged_userfaultfd sysctl knob to require unprivileged callers to
use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to enhance
security vulnerabilities by lengthening the race window in kernel code.

[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

This patch (of 2):

userfaultfd handles page faults from both user and kernel code.  Add a new
UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting
userfaultfd object refuse to handle faults from kernel mode, treating
these faults as if SIGBUS were always raised, causing the kernel code to
fail with EFAULT.

A future patch adds a knob allowing administrators to give some processes
the ability to create userfaultfd file objects only if they pass
UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will
exploit userfaultfd's ability to delay kernel page faults to open timing
windows for future exploits.

Link: https://lkml.kernel.org/r/20201120030411.2690816-1-lokeshgidra@google.com
Link: https://lkml.kernel.org/r/20201120030411.2690816-2-lokeshgidra@google.com
Change-Id: I5c50a96f56c862cbbdb001acbe958c9f4c48023a
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <calin@google.com>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-04 23:39:14 +05:30
Peter Xu
110111b482 BACKPORT: userfaultfd/sysctl: add vm.unprivileged_userfaultfd
Userfaultfd can be misued to make it easier to exploit existing
use-after-free (and similar) bugs that might otherwise only make a
short window or race condition available.  By using userfaultfd to
stall a kernel thread, a malicious program can keep some state that it
wrote, stable for an extended period, which it can then access using an
existing exploit.  While it doesn't cause the exploit itself, and while
it's not the only thing that can stall a kernel thread when accessing a
memory location, it's one of the few that never needs privilege.

We can add a flag, allowing userfaultfd to be restricted, so that in
general it won't be useable by arbitrary user programs, but in
environments that require userfaultfd it can be turned back on.

Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
whether userfaultfd is allowed by unprivileged users.  When this is
set to zero, only privileged users (root user, or users with the
CAP_SYS_PTRACE capability) will be able to use the userfaultfd
syscalls.

Andrea said:

: The only difference between the bpf sysctl and the userfaultfd sysctl
: this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
: requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
: because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
: already if it's doing other kind of tracking on processes runtime, in
: addition of userfaultfd.  In other words both syscalls works only for
: root, when the two sysctl are opt-in set to 1.

[dgilbert@redhat.com: changelog additions]
[akpm@linux-foundation.org: documentation tweak, per Mike]
Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
Change-Id: Ied2500a773b06ac1fdc378e61fd5403a270114a6
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-04 23:39:13 +05:30
Yiwei Zhang
137b1d61b2 ANDROID: init: GKI: enable hidden configs for GPU
Add hidden configs to GKI_HACKS_TO_FIX so they are enabled for loadable
GPU modules built out-of-tree.

Bug: 154525079
Test: rebuild kernel binary and pass checkvintf
Change-Id: I51871132b6a0bd1a55f5db7a9f90177cbc20ef86
Signed-off-by: Yiwei Zhang <zzyiwei@google.com>
2025-09-04 23:39:12 +05:30
Todd Kjos
fea7bf43b3 arm64: configs: Add GKI_HACKS_to_FIX config
Enable config which selects a number of non-module
options which are usually selected by drivers built
as modules

Bug: 141266428
Change-Id: I8d95c96b74b2cfd861d68573135455a3392ff522
Signed-off-by: Todd Kjos <tkjos@google.com>
2025-09-04 23:39:11 +05:30
Todd Kjos
be3a16ae4b ANDROID: init: GKI: add GKI_HACKS_TO_FIX
Add CONFIG_GKI_HACKS_TO_FIX as a mechanism to force
hidden configs to be selected for modules that will be built
separately. Also used to select drivers that need to be
modularized.

As these issues are resolved upstream, the configs should
be removed from GKI_HACKS_TO_FIX

Bug: 141266428
Change-Id: Ic8b2a17cd3a389ac5ef999c8c79b5b5dfee73c8a
Signed-off-by: Todd Kjos <tkjos@google.com>
2025-09-04 23:39:11 +05:30
Uwe Kleine-König
6192276f34 of: restore old handling of cells_name=NULL in of_*_phandle_with_args()
Before commit e42ee61017f5 ("of: Let of_for_each_phandle fallback to
non-negative cell_count") the iterator functions calling
of_for_each_phandle assumed a cell count of 0 if cells_name was NULL.
This corner case was missed when implementing the fallback logic in
e42ee61017f5 and resulted in an endless loop.

Restore the old behaviour of of_count_phandle_with_args() and
of_parse_phandle_with_args() and add a check to
of_phandle_iterator_init() to prevent a similar failure as a safety
precaution. of_parse_phandle_with_args_map() doesn't need a similar fix
as cells_name isn't NULL there.

Affected drivers are:
 - drivers/base/power/domain.c
 - drivers/base/power/domain.c
 - drivers/clk/ti/clk-dra7-atl.c
 - drivers/hwmon/ibmpowernv.c
 - drivers/i2c/muxes/i2c-demux-pinctrl.c
 - drivers/iommu/mtk_iommu.c
 - drivers/net/ethernet/freescale/fman/mac.c
 - drivers/opp/of.c
 - drivers/perf/arm_dsu_pmu.c
 - drivers/regulator/of_regulator.c
 - drivers/remoteproc/imx_rproc.c
 - drivers/soc/rockchip/pm_domains.c
 - sound/soc/fsl/imx-audmix.c
 - sound/soc/fsl/imx-audmix.c
 - sound/soc/meson/axg-card.c
 - sound/soc/samsung/tm2_wm5110.c
 - sound/soc/samsung/tm2_wm5110.c

Thanks to Geert Uytterhoeven for reporting the issue, Peter Rosin for
helping pinpoint the actual problem and the testers for confirming this
fix.

Fixes: e42ee61017f5 ("of: Let of_for_each_phandle fallback to non-negative cell_count")
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Change-Id: I684efc01df23ea32c578c1da4f8ea6fcf6f03ced
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Rob Herring <robh@kernel.org>
2025-09-04 21:45:31 +05:30
Uwe Kleine-König
b041b5339b of: Let of_for_each_phandle fallback to non-negative cell_count
Referencing device tree nodes from a property allows to pass arguments.
This is for example used for referencing gpios. This looks as follows:

	gpio_ctrl: gpio-controller {
		#gpio-cells = <2>
		...
	}

	someothernode {
		gpios = <&gpio_ctrl 5 0 &gpio_ctrl 3 0>;
		...
	}

To know the number of arguments this must be either fixed, or the
referenced node is checked for a $cells_name (here: "#gpio-cells")
property and with this information the start of the second reference can
be determined.

Currently regulators are referenced with no additional arguments. To
allow some optional arguments without having to change all referenced
nodes this change introduces a way to specify a default cell_count. So
when a phandle is parsed we check for the $cells_name property and use
it as before if present. If it is not present we fall back to
cells_count if non-negative and only fail if cells_count is smaller than
zero.

Change-Id: Ic7a6a5e667d46847becb2a9593a00ba6db49fc98
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Rob Herring <robh@kernel.org>
2025-09-04 21:45:13 +05:30
Trishool Narayanasetty
fdc6d28a01 arm64: configs: trinket: Disable SLUB_DEBUG
Disable CONFIG_SLUB_DEBUG in perf_defconfig
to save few MBs of slab memory.

Change-Id: I7512277a86c71989b0a80619afb30e06803820ed
Signed-off-by: Trishool Narayanasetty <tnarayan@codeaurora.org>
2025-09-04 21:39:44 +05:30
theshaenix
b87a245a99 arm64: configs: Disable MMC test module 2025-09-04 21:39:09 +05:30
theshaenix
203baf7e4e arm64: configs: Disable mpq dvb demux modules 2025-09-04 21:38:24 +05:30
Michael Bestas
4f63f6218c power: reset: Force a warm reboot when in panic
Change-Id: I68697d292aaa770a7eb1424a1e423f339628acf7
2025-09-04 21:37:26 +05:30
Saravana Kannan
c46609d5a5 arm64: configs: Disable HW tracing features
HW tracing features shouldn't be enabled in any final product. So
disable it.

Bug: 154966878
Signed-off-by: Saravana Kannan <saravanak@google.com>
Change-Id: I6603e71b0912dd89d653bb0bd36a0a4cb8b504e1
2025-09-04 21:36:35 +05:30
Michael Bestas
23ad416291 power: reset: Move in_panic handling out of dload mode
* Some devices might want to use that logic
  without enabling download mode

Change-Id: Idd4a2cc8a47041740f8d4e9f43bffd84fae5830d
2025-09-04 21:35:39 +05:30
Will McVicker
4d91e7f524 GKI: ARM: dts: msm: disable coresight for atoll/sdmmagpie/sm6150/sm8150/trinket
Coresight is used for debugging purposes. When the debugging configs are
disabled, having these included causes power regressions due to clks
being left on. So lets disable all the coresight DT entries by default.

Signed-off-by: Will McVicker <willmcvicker@google.com>
Bug: 156429236
Test: compile, verify list of probed devices
Change-Id: I84f9c874f2f5e8720ced23c7b4268d1b536b96a7
2025-09-04 21:35:04 +05:30
Yabin Cui
e7d40e3672 msm: kgsl: introduce CONFIG_CORESIGHT_ADRENO.
We want to build coresight drivers as builtin drivers. But
adreno-coresight.c in msm_adreno.ko calls coresight functions.
To avoid exporting new symbols in vmlinux and breaking the ABI, this
patch separates adreno-coresight.c into CONFIG_CORESIGHT_ADRENO.
CONFIG_CORESIGHT_ADRENO is only enabled when both coresight and adreno
are builtin drivers.

Bug: 167414982
Bug: 170753932
Signed-off-by: Yabin Cui <yabinc@google.com>
Change-Id: I7488293445ade738ba03cc457320e0d74f910886
2025-09-04 21:34:30 +05:30
Aaron Ding
893f36add3 arm64: configs: vendor: Disable CONFIG_SECURITY_SMACK
Security hardening:
1. unset CONFIG_SECURITY_SMACK which implicitly enable CONFIG_NETLABEL
2. set CONFIG_SECURITY_NETWORK which CONFIG_SECURITY_SELINUX depends on

Bug: 198690429
Test: Manual testing, vts/vts-kernel, pts/base

Change-Id: Iff2a4ce1dc7897bbeec57a0b3966fdb481f2d4e4
Signed-off-by: Aaron Ding <aaronding@google.com>
Signed-off-by: Roger Liao <rogerliao@google.com>
2025-09-04 21:31:29 +05:30
Alexander Potapenko
e84dcbe131 arm64: configs: vendor: initialize locals with zeroes
This patch switches compiler-based stack initialization from 0xAA
to zero pattern, resulting in much more efficient code and saner
defaults for uninitialized local variables.

Bug: 154198143
Test: run cuttlefish and observe the following lines in dmesg:
      test_stackinit: all tests passed!
      test_meminit: all 130 tests passed!

Signed-off-by: Alexander Potapenko <glider@google.com>
Change-Id: I49821914df887760e90295d91fa54a2ebda8240b
2025-09-04 21:29:57 +05:30
Jonglin Lee
b7e1c65fa4 arm64: configs: vendor: Disable stability debug configs
Disable the following stability debug configs for shipping:

CONFIG_EDAC_KRYO_ARM64_PANIC_ON_UE

Bug: 131191106
Bug: 129433164
Bug: 129709115
Change-Id: I376fd0f070e176f41c7dd213822341a7d7de15a1
Signed-off-by: Jonglin Lee <jonglin@google.com>
2025-09-04 21:28:40 +05:30
Steve Muckle
c8c98e9d5d arm64: configs: vendor: turn off CONFIG_MEMORY_STATE_TIME
This feature is not being used.

Bug: 117847156
Change-Id: Ia398da799856ee93bd6097204ab42dff8a617f2a
Signed-off-by: Steve Muckle <smuckle@google.com>
2025-09-04 21:27:43 +05:30
Patrick Tjin
55141f95e7 arm64: configs: vendor: add security configs
CONFIG_BUG_ON_DATA_CORRUPTION (OS.KRN.2.General.11)
CONFIG_SCHED_STACK_END_CHECK  (OS.KRN.2.General.12)
CONFIG_DEFAULT_MMAP_MIN_ADDR to 32768

Bug: 132751128
Bug: 132741349
Bug: 132742402
Test: Boot to home
Change-Id: I337fb0fcb4bf7379a5957bf66e6debbed9f3f98d
Signed-off-by: Patrick Tjin <pattjin@google.com>
2025-09-04 21:26:39 +05:30
Alexander Potapenko
11b38843b5 arm64: configs: vendor: enable heap and stack initialization.
This patch enables CONFIG_INIT_STACK_ALL=y and
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y, effectively turning on stack and heap
initialization in GKI kernels.

Doing so will help us mitigate information leaks and make code that
depends on uninitialized memory execute deterministically. We'll also
get coverage for the initialization features on the existing kernel
tests.

Bug: 144999193
Change-Id: I40ad526b2e595c84b122b0308d967a3874564252
Signed-off-by: Alexander Potapenko <glider@google.com>
2025-09-04 21:25:22 +05:30
Elena Petrova
9b18708214 arm64: configs: vendor: Enable CRYPTO_DRBG_HASH, CRYPTO_DRBG_CTR
Enable Deterministic Random Bit Generators (DRBG) as required by NIAP
mobile device certification.

This change also implicitly enables CRYPTO_DRBG_HMAC.

Bug: 133263874
Signed-off-by: Elena Petrova <lenaptr@google.com>
Change-Id: Icdb9b5f486d19285dd85b36e9297e21537a46c00
2025-09-04 21:23:37 +05:30
Yan Yan
274f059943 arm64: configs: vendor: CONFIG_XFRM_MIGRATE=y
To be able to update addresses of an IPsec SA, as required by MOBIKE

Generated via:
  make ARCH=arm64 <device>_defconfig
  update config
  make ARCH=arm64 savedefconfig
  mv defconfig arch/arm64/configs/<device>_defconfig

Test: built and boot
Bug: 169170652
Signed-off-by: Yan Yan <evitayan@google.com>
Change-Id: I8922502fa8c0704cb1da1fc2eab26489fb18bf4a
2025-09-04 21:20:02 +05:30
Yan Yan
2311125fd1 arm64: configs: vendor: CONFIG_CRYPTO_CHACHA20POLY1305=y
Enable ChaCha20Poly1305 for the usage of IPsec

Generated via:
  make ARCH=arm64 <device>_defconfig
  update config
  make ARCH=arm64 savedefconfig
  mv defconfig arch/arm64/configs/<device>_defconfig

Test: built and boot
Bug: 161717358
Signed-off-by: Yan Yan <evitayan@google.com>
Change-Id: If4fc953e8e808536106d331478f50255a55994d8
2025-09-04 21:19:11 +05:30
Srinivasarao P
fdb48d66eb arm64: configs: vendor: Enable CONFIG_DM_BOW
Enable DM-BOW for Android userdata check point feature.

Change-Id: Id5d38ca6f68c81b2e3cedca4de80abd53e3094e0
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2025-09-04 21:16:55 +05:30
Swetha Chikkaboraiah
3d6f137794 arm64: configs: vendor: Disable DEBUG_FS
Debugfs is not needed on user builds, so disabling
same at compile time for ARM 64.

Change-Id: I01a4dd199a357dd85838c071020bf966079a2092
2025-09-04 21:16:54 +05:30
Patrick Rohr
864441ca34 arm64: configs: vendor: Enable bandwidth limiting options
Enable the following options to support go/bandwidth-limiting:
- CONFIG_NET_SCH_TBF
- CONFIG_NET_CLS_MATCHALL
- CONFIG_NET_ACT_POLICE
- CONFIG_NET_ACT_BPF

NET_ACT_BPF is enabled in ACK for android13-5.10+ & up:
  https://android-review.googlesource.com/c/kernel/common/+/1860654
  https://android-review.googlesource.com/c/kernel/common/+/1856374
but there is no longer an ACK developer branch for older kernels.

The remaining options were previously enabled in GKI.

Bug: 157552970
Test: TreeHugger
Signed-off-by: Patrick Rohr <prohr@google.com>
Change-Id: I53035d0e53d58ac9565dcafb5158b8cc99a8d00f
2025-09-04 21:16:53 +05:30
Chris Ye
d53a2733c8 ANDROID: arm64: configs: vendor: enable hid-playstation driver/rumble
To enable DualSense driver, i.e. hid-playstation, we need to set
CONFIG_HID_PLAYSTATION to "y". To enable Dualsense rumble, we need to
set CONFIG_PLAYSTATION_FF to "y".

Bug: 167947264

Signed-off-by: Chris Ye <lzye@google.com>
Change-Id: I26fb915e33b511feb3cf3557ec36bda0dc1b4e04
2025-09-04 21:16:52 +05:30
theshaenix
ed966ee721 config: set CONFIG_ANDROID_LOW_MEMORY_KILLER to n 2025-09-04 21:08:59 +05:30
theshaenix
539198905c added build script 2025-09-04 21:08:27 +05:30
theshaenix
1a0aaa0ccd configs: welcome ShadowBladeX 2025-08-29 13:46:30 +05:30
debdeep199x
7879014b13 arch:configs: set config_debug_fs
All HALs in device manifest are declared in FCM <= level 5
ERROR: files are incompatible: Runtime info and framework compatibility matrix are incompatible: No compatible kernel requirement found (kernel FCM version = 5).
For kernel requirements at matrix level 5, Kernel config errors:
    For config CONFIG_DEBUG_FS, value = y but required n
: Success
INCOMPATIBLE
2024-10-24 12:30:19 +00:00
debdeep199x
97ff196b47 ffs Android.bp --> Androidbp 2024-10-24 06:20:49 +00:00
debdeep199x
bc45868a21 drivers: add drivers for KernelSU 0.9.5 2024-10-21 15:30:16 +00:00
debdeep199x
9fdae32a7c Merge commit '38752ab9844857f2503bead4dc30b1a2ed89d2c2' into fourteen 2024-10-21 06:45:55 +00:00
debdeep199x
930f0d895d Merge commit '9ecf898947f7f0089c149337f15fba24a22bcb8a' into fourteen 2024-10-21 06:45:39 +00:00
debdeep199x
6313661ac3 Revert "drivers: introduce kernelsu"
This reverts commit 9198da5b10.
2024-10-18 06:23:26 +00:00
Egor Pilipenko
38752ab984 fixup! power: oplus_battery_msm7125_R: Run oplus_set_otg_switch_status on init early as possible
Signed-off-by: Egor Pilipenko <ctapchuk@gmail.com>
2024-06-02 18:31:56 +07:00
Egor Pilipenko
9ecf898947 Revert "power: oppo: Make the otg_switch read-only"
This reverts commit 932dca8be9.
2024-06-02 17:54:45 +07:00
debdeep199x
9198da5b10 drivers: introduce kernelsu 2024-05-23 15:30:07 +00:00
624 changed files with 100687 additions and 4347 deletions

6
.elts/config.yaml Normal file
View File

@@ -0,0 +1,6 @@
upstream_repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
upstream_base: 4.19.304
base: 4.14.336
upstream_version: 4.19.322
version: 4.14.355
rc: 1

992
.elts/meta/4.14.356.yaml Normal file
View File

@@ -0,0 +1,992 @@
5ea681973e3c518892825457c55559b0daa1c3d3:
title: 'staging: iio: frequency: ad9833: Get frequency value statically'
mainline: 80109c32348d7b2e85def9efc3f9524fb166569d
upstream: a3138f0925714ea47f817257447fa0b87c8bcf28
2253daf50c035c2cd8a8ca74b7bba17bb936fb18:
title: 'staging: iio: frequency: ad9833: Load clock using clock framework'
mainline: 8e8040c52e63546d1171c188a24aacf145a9a7e0
upstream: a6316b6f127a877285c83d2ed45b20e6712e6d1b
ab37e7fbaeb484d79986ed060a4f865c05c3c248:
title: 'staging: iio: frequency: ad9834: Validate frequency parameter value'
mainline: b48aa991758999d4e8f9296c5bbe388f293ef465
upstream: 5edc3a45ef428501000a7b23d0e1777a548907f6
12cd0e98282326cc494b69e74947a585afd21f53:
title: 'usbnet: ipheth: fix carrier detection in modes 1 and 4'
mainline: 67927a1b255d883881be9467508e0af9a5e0be9d
upstream: 32dafeb84c84a2d420de27e5e30e4ea6339e4d07
c0360f13de3287dfab2137634c65b55e3949f325:
title: 'net: ethernet: use ip_hdrlen() instead of bit shift'
mainline: 9a039eeb71a42c8b13408a1976e300f3898e1be0
upstream: a81761c1ba59444fc3f644e7d8713ac35e7911c4
71d7a71aecd5608f04ebe27edf45e296131503b1:
title: 'scripts: kconfig: merge_config: config files: add a trailing newline'
mainline: 33330bcf031818e60a816db0cfd3add9eecc3b28
upstream: 6a130ec2f0646a8544308b6cf983269d5a2a7fa0
e1ebafd5c0058b061a4583c4ba60a4508b00d55f:
title: 'arm64: dts: rockchip: override BIOS_DISABLE signal via GPIO hog on RK3399 Puma'
mainline: 741f5ba7ccba5d7ae796dd11c320e28045524771
upstream: 4a0400793ac3961a07fcd472f7eb789d12d0db6a
64bdfeaca4b2bca14039364e1569c9f0d399e8cf:
title: 'net/mlx5: Update the list of the PCI supported devices'
mainline: 85327a9c415057259b337805d356705d0d0f4200
upstream: a689f610abc8d4c8dfd775e09fd306f19cfe6509
94fc3405a60ae7370428a02b7ffa8c1e1a0db0fb:
title: 'net: ftgmac100: Enable TX interrupt to avoid TX timeout'
mainline: fef2843bb49f414d1523ca007d088071dee0e055
upstream: 7f84d4613b9fdf9e14bbab867e879a0df782a163
d3cde3469100da8f52f60b814b8cab66244d7f56:
title: 'net: dpaa: Pad packets to ETH_ZLEN'
mainline: cbd7ec083413c6a2e0c326d49e24ec7d12c7a9e0
upstream: cd5b9d657ecd44ad5f254c3fea3a6ab1cf0e2ef7
e2ed6238364c4b1a6beba54d4d16c0f2dc801dc0:
title: 'selftests/vm: remove call to ksft_set_plan()'
c29e4bebce862efea2d600187e150237e563b89b:
title: 'selftests/kcmp: remove call to ksft_set_plan()'
a7d6bf885524c3d4063dd145fb93c2c89cc98848:
title: 'ASoC: allow module autoloading for table db1200_pids'
mainline: 0e9fdab1e8df490354562187cdbb8dec643eae2c
upstream: 71d74f78ae565a64eae3022020a9d4e82dace694
ac0819d2626c52220d318ed9ea3d5b2ee4b2f1c2:
title: 'pinctrl: at91: make it work with current gpiolib'
mainline: 752f387faaae0ae2e84d3f496922524785e77d60
upstream: 33d615ee40f0651bb3d282a66e6f59eae6ea4ada
fc168b848cd91fb8dd89637cb6a063670ed6b5dd:
title: 'microblaze: don''t treat zero reserved memory regions as error'
mainline: 0075df288dd8a7abfe03b3766176c393063591dd
upstream: a5bfdf7e4d956f3035779687eade8da23560f4bb
0fcd4ef6d494a3de6307fa976919cd800f343df6:
title: 'net: ftgmac100: Ensure tx descriptor updates are visible'
mainline: 4186c8d9e6af57bab0687b299df10ebd47534a0a
upstream: 46974d97d58a2a91da16b032de0c78c4346bc1c2
f3f9ddf39b4b25d0a99b2323cfed0659b6cffb45:
title: 'spi: bcm63xx: Enable module autoloading'
mainline: 709df70a20e990d262c473ad9899314039e8ec82
upstream: 1cde0480b087bd8f4e12396fcbb133ee9d9876bd
b427f522d100d82fc9a282af7780cd140ac4e0bf:
title: 'x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency'
mainline: 8fcc514809de41153b43ccbe1a0cdf7f72b78e7e
upstream: 1da08d443212eba1f731b3f163c5b23ec1c882c1
900f2cf495f5f7e9088364d3e4e483756bff58e3:
title: 'ocfs2: add bounds checking to ocfs2_xattr_find_entry()'
mainline: 9e3041fecdc8f78a5900c3aa51d3d756e73264d6
upstream: b49a786beb11ff740cb9e0c20b999c2a0e1729c2
317e5483f3b80fb042b955d0e80c336698046cc1:
title: 'ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry()'
mainline: af77c4fc1871847b528d58b7fdafb4aa1f6a9262
upstream: e2b3d7a9d019d4d1a0da6c3ea64a1ff79c99c090
c087e2303ab05433ed6981a730807bfc14dabe78:
title: 'gpio: prevent potential speculation leaks in gpio_device_get_desc()'
mainline: d795848ecce24a75dfd46481aee066ae6fe39775
upstream: 18504710442671b02d00e6db9804a0ad26c5a479
fd204ed48bc3d5d4315957a2bf536d2df43c44e8:
title: 'USB: serial: pl2303: add device id for Macrosilicon MS3020'
mainline: 7d47d22444bb7dc1b6d768904a22070ef35e1fc0
upstream: 79efd61e1c50d79d89a48e6c01761f8f890a83dd
90c7ddee26f4a63a9d2f173c5056eae945d345a7:
title: 'wifi: ath9k: fix parameter check in ath9k_init_debug()'
mainline: 6edb4ba6fb5b946d112259f54f4657f82eb71e89
upstream: ac848aff235efdd903c0c185c1cb44496c5b9bb0
f2682fdc54e734785dd48a4850403f89e0e3cbe8:
title: 'wifi: ath9k: Remove error checks when creating debugfs entries'
mainline: f6ffe7f0184792c2f99aca6ae5b916683973d7d3
upstream: 0c3bbcbce030ca203963c520191ad2c5d89bf862
a99c4727604215b66734a480a049ad9451bfef34:
title: 'can: bcm: Clear bo->bcm_proc_read after remove_proc_entry().'
mainline: 94b0818fa63555a65f6ba107080659ea6bcca63e
upstream: f5059fae5ed518fc56494ce5bdd4f5360de4b3bc
ae07cf5eff7f99b3eb8927ace566f0786221dee4:
title: 'Bluetooth: btusb: Fix not handling ZPL/short-transfer'
mainline: 7b05933340f4490ef5b09e84d644d12484b05fdf
upstream: 2dfadca5439eca817fbb206c6003e5526d5e73df
3bb55bc8856f2de993ef77536a774c62dc252926:
title: 'block, bfq: fix possible UAF for bfqq->bic with merge chain'
mainline: 18ad4df091dd5d067d2faa8fce1180b79f7041a7
upstream: a9bdd5b36887d2bacb8bc777fd18317c99fc2587
940b968ed647a978296610464a5bfd0ee1c8b0f4:
title: 'block, bfq: don''t break merge chain in bfq_split_bfqq()'
mainline: 42c306ed723321af4003b2a41bb73728cab54f85
upstream: 9e813033594b141f61ff0ef0cfaaef292564b041
086695765117a72978f0210989a2fd377a86287a:
title: 'spi: ppc4xx: handle irq_of_parse_and_map() errors'
mainline: 0f245463b01ea254ae90e1d0389e90b0e7d8dc75
upstream: f2a73a1f728e6fe765fc07c043a3d1670d854518
2c79e19208b397228218de1ceb98f907ea84b720:
title: 'spi: ppc4xx: Avoid returning 0 when failed to parse and map IRQ'
mainline: 7781f1d120fec8624fc654eda900fc8748262082
upstream: e546902c4917656203e0e134630a873e9b6d28af
8e6ee55dc9b2117c6e85d4e00724de05acc66e40:
title: 'ARM: versatile: fix OF node leak in CPUs prepare'
mainline: f2642d97f2105ed17b2ece0c597450f2ff95d704
upstream: 722d698f3e8de32a753ee1148b009406d0b3b829
f2dbb797e5c4fbe261bac004384161a4d2df0485:
title: 'reset: berlin: fix OF node leak in probe() error path'
mainline: 5f58a88cc91075be38cec69b7cb70aaa4ba69e8b
upstream: 041b763798bf460307db3bd8144e3732aef52902
115ada83f0a71ae108fe8c58a4d9f6b0ef3b3be3:
title: 'clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()'
mainline: ca140a0dc0a18acd4653b56db211fec9b2339986
upstream: 24d689791c6dbdb11b4b5208ed746f28fe651715
1ed2f7aabb6e52fd4d1b13daeb56b5e1c6904e90:
title: 'hwmon: (max16065) Fix overflows seen when writing limits'
mainline: 744ec4477b11c42e2c8de9eb8364675ae7a0bd81
upstream: b665734d4772df97eaeb4d943dc104dbd9ec1e9a
e7ee0a8fd442b2fb7586cc29d397017bc638ed50:
title: 'mtd: slram: insert break after errors in parsing the map'
mainline: 336c218dd7f0588ed8a7345f367975a00a4f003f
upstream: 6015f85fc8eba1ccf7db8b20a9518388fcb4fbf7
b8dbab0d70214275e00278a332c3456de5c74031:
title: 'hwmon: (ntc_thermistor) fix module autoloading'
mainline: b6964d66a07a9003868e428a956949e17ab44d7e
upstream: 6f91b0464947c4119682731401e11e095d8db06d
c02345a3444b243abae115fc9cc38d3453c8964a:
title: 'power: supply: max17042_battery: Fix SOC threshold calc w/ no current sense'
mainline: 3a3acf839b2cedf092bdd1ff65b0e9895df1656b
upstream: f9e9ce0f2b420b63c29e96840865640098bbafe7
8e8bed0aecaeb206024593bc125ecb5949b10cb5:
title: 'fbdev: hpfb: Fix an error handling path in hpfb_dio_probe()'
mainline: aa578e897520f32ae12bec487f2474357d01ca9c
upstream: da77622151181c1d7d8ce99019c14cd5bd6453b5
2b1444de44d853578d982acd4d0a58082334d1ba:
title: 'drm/amd: fix typo'
mainline: 229f7b1d6344ea35fff0b113e4d91128921f8937
upstream: f4a502c468886ffc54e436279d7f573b4d02bd5b
28cbb9587a21b4052424ece391f8953ea3ce1d93:
title: 'drm/rockchip: vop: Allow 4096px width scaling'
mainline: 0ef968d91a20b5da581839f093f98f7a03a804f7
upstream: 6a512ab02cde62f147351d38ebefa250522336c4
541940c2d6db90f0a9448686b0e0838a2a7f134b:
title: 'drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets'
mainline: 3fbaf475a5b8361ebee7da18964db809e37518b7
upstream: ec7cf75b4e2b584e6f2b167ce998428b42522df6
e903f2245bb193bb8a6f11804e56b0b85ae6a9a9:
title: 'jfs: fix out-of-bounds in dbNextAG() and diAlloc()'
mainline: e63866a475562810500ea7f784099bfe341e761a
upstream: d1017d2a0f3f16dc1db5120e7ddbe7c6680425b0
2f418bb73f8edbe9b8afbbf59e5b2e217ab391bd:
title: 'ipmi: docs: don''t advertise deprecated sysfs entries'
mainline: 64dce81f8c373c681e62d5ffe0397c45a35d48a2
upstream: e4e81788a8b83f267d25b9f3b68cb4837b71bdd9
f9d12089d914dc23b18637db0091a61a2c0ea32b:
title: 'drm/msm: fix %s null argument error'
mainline: 25b85075150fe8adddb096db8a4b950353045ee1
upstream: b7a63d4bac70f660d63cba66684bc03f09be50ad
aa244feeb7d2f904f18638a7369216d4e410d44b:
title: 'xen: use correct end address of kernel for conflict checking'
mainline: fac1bceeeb04886fc2ee952672e6e6c85ce41dca
upstream: f38d39918cff054f4bfc466cac1c110d735eda94
1a07c8045664899758b6c312761686e49f6d2fc0:
title: 'xen/swiotlb: simplify range_straddles_page_boundary()'
mainline: bf70726668c6116aa4976e0cc87f470be6268a2f
upstream: 5937434b2ca4884798571079cc71ad3a58b3c8fd
2690899d56f2ed0cb6b24a60c02bcbf8c950d35c:
title: 'xen/swiotlb: add alignment check for dma buffers'
mainline: 9f40ec84a7976d95c34e7cc070939deb103652b0
upstream: 66c845af6613a62f08d1425054526cc294842914
29e08a988cd84cd6b7afb1790e343d8290f58664:
title: 'selftests/bpf: Fix error compiling test_lru_map.c'
mainline: cacf2a5a78cd1f5f616eae043ebc6f024104b721
upstream: e5fa35e20078c3f08a249a15e616645a7e7068e2
efd2f49ae3bc833b879ef4091384fe42db871bec:
title: 'kthread: add kthread_work tracepoints'
mainline: f630c7c6f10546ebff15c3a856e7949feb7a2372
upstream: 65c1957181a1e2cd5344e49d4e5b6e9f930092d1
85a8b320b6eda4e743d3633d86653d16e9a859c1:
title: 'kthread: fix task state in kthread worker if being frozen'
mainline: e16c7b07784f3fb03025939c4590b9a7c64970a7
upstream: 6430d6a00b0d8d3de663ecc0da248f8f3557b82e
449027e8478709334ca7d9445060ed04464b43bb:
title: 'jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers()'
mainline: aa3c0c61f62d682259e3e66cdc01846290f9cd6c
upstream: 58a48155ce22e8e001308a41a16d8c89ee003b80
aa5e7df17ef64ae426c4ac8fcdde231c2bba3d57:
title: 'ext4: clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT even mount with discard'
mainline: 20cee68f5b44fdc2942d20f3172a262ec247b117
upstream: 6f44db60f9c42265e1e61596994f457f3c30d432
179d760ab3fee99160a41a12ba49017e61c7ae34:
title: 'smackfs: Use rcu_assign_pointer() to ensure safe assignment in smk_set_cipso'
mainline: 2749749afa071f8a0e405605de9da615e771a7ce
upstream: 029ebd49aab06dd438c1256876730518aef7da35
09313601d16d88eed265af9c0bd4b029c4524220:
title: 'ext4: avoid negative min_clusters in find_group_orlov()'
mainline: bb0a12c3439b10d88412fd3102df5b9a6e3cd6dc
upstream: 7b98a77cdad322fa3c7babf15c37659a94aa3593
a71386889f3ee75ee1507c741298d505973cb8d8:
title: 'ext4: return error on ext4_find_inline_entry'
mainline: 4d231b91a944f3cab355fce65af5871fb5d7735b
upstream: ce8f41fca0b6bc69753031afea8fc01f97b5e1af
c3afa5821f1e517165033292a44f8aeb43a8341c:
title: 'ext4: avoid OOB when system.data xattr changes underneath the filesystem'
mainline: c6b72f5d82b1017bad80f9ebf502832fc321d796
upstream: 5b076d37e8d99918e9294bd6b35a8bbb436819b0
41f3f6c63ebe7984124f65fdcf0d1ef3bfff9e41:
title: 'nilfs2: fix potential null-ptr-deref in nilfs_btree_insert()'
mainline: 9403001ad65ae4f4c5de368bdda3a0636b51d51a
upstream: 2b78e9df10fb7f4e9d3d7a18417dd72fbbc1dfd0
1150830d554e2921e69ebb150c3c2d07baa0216d:
title: 'nilfs2: determine empty node blocks as corrupted'
mainline: 111b812d3662f3a1b831d19208f83aa711583fe6
upstream: 6d7f4fac707a187882b8c610e8889c097b289082
811f9859f37f3be1ebeb26c221fbaaa593199e99:
title: 'nilfs2: fix potential oob read in nilfs_btree_check_delete()'
mainline: f9c96351aa6718b42a9f42eaf7adce0356bdb5e8
upstream: f3a9859767c7aea758976f5523903d247e585129
218417bab6747be0d5ae6e0161a5796d433d75ea:
title: 'perf sched timehist: Fix missing free of session in perf_sched__timehist()'
mainline: 6bdf5168b6fb19541b0c1862bdaa596d116c7bfb
upstream: 1d4d7e56c4aa834f359a29aa64f5f5c01e3453eb
c30bffcf9b9de7aeb85e602a62c1b199e44c7b04:
title: 'perf sched timehist: Fixed timestamp error when unable to confirm event sched_in time'
mainline: 39c243411bdb8fb35777adf49ee32549633c4e12
upstream: d825de712b59dfd6e256c0ecad7443da652c2b22
cfec54fd64719d252a6f53f7cf8925d439b5a440:
title: 'perf time-utils: Fix 32-bit nsec parsing'
mainline: 38e2648a81204c9fc5b4c87a8ffce93a6ed91b65
upstream: c062eebe3b3d98ae2ef61fe8008f2c12bfa31249
6e0b571ed540f42734528e92a461d02f7da43a01:
title: 'clk: rockchip: Set parent rate for DCLK_VOP clock on RK3228'
mainline: 1d34b9757523c1ad547bd6d040381f62d74a3189
upstream: 7b9e7a258b9f4d68a9425c67bfee1e1e926d1960
fe35dd3f675597f83ae26c6d5086a9464c8dc941:
title: 'drivers: media: dvb-frontends/rtl2832: fix an out-of-bounds write error'
mainline: 8ae06f360cfaca2b88b98ca89144548b3186aab1
upstream: 7065c05c6d58b9b9a98127aa14e9a5ec68173918
f046671d18d577d0ed12e6cf37913d543be14952:
title: 'drivers: media: dvb-frontends/rtl2830: fix an out-of-bounds write error'
mainline: 46d7ebfe6a75a454a5fa28604f0ef1491f9d8d14
upstream: 8ffbe7d07b8e76193b151107878ddc1ccc94deb5
526fd6e5af9933b37ab818aeb51beca91da649be:
title: 'PCI: xilinx-nwl: Fix register misspelling'
mainline: a437027ae1730b8dc379c75fa0dd7d3036917400
upstream: 43b361ca2c977e593319c8248e549c0863ab1730
e2138450b0fd6eec4ec39b7c0ddc8bd2c63e1158:
title: 'RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency'
mainline: 86dfdd8288907f03c18b7fb462e0e232c4f98d89
upstream: da2708a19f45b4a7278adf523837c8db21d1e2b5
fab82568499e61ec55a0fac9781cffff4d9d6ba7:
title: 'pinctrl: single: fix missing error code in pcs_probe()'
mainline: cacd8cf79d7823b07619865e994a7916fcc8ae91
upstream: 4f227c4dc81187fcca9c858b070b9d3f586c9b30
904ce6f2f61066aab8e6e20b705b8e45a6adafd3:
title: 'clk: ti: dra7-atl: Fix leak of of_nodes'
mainline: 9d6e9f10e2e031fb7bfb3030a7d1afc561a28fea
upstream: d6b680af89ca0bf498d105265bc32061979e87f1
f6340536595507abf266bf00336263a0fe54b6d5:
title: 'pinctrl: mvebu: Fix devinit_dove_pinctrl_probe function'
mainline: c25478419f6fd3f74c324a21ec007cf14f2688d7
upstream: 856d3ea97be0dfa5d7369e071c06c9259acfff33
c3222aec5dbf651634bac47c1137c4b0c5209b13:
title: 'RDMA/cxgb4: Added NULL check for lookup_atid'
mainline: e766e6a92410ca269161de059fff0843b8ddd65f
upstream: b12e25d91c7f97958341538c7dc63ee49d01548f
a4191b6aaf636e979332330d22348c461169a8c7:
title: 'ntb: intel: Fix the NULL vs IS_ERR() bug for debugfs_create_dir()'
mainline: e229897d373a87ee09ec5cc4ecd4bb2f895fc16b
upstream: 20cbc281033ef5324f67f2d54bc539968f937255
e6eedced9e6d8c218bd815ac165a299c10b37471:
title: 'nfsd: call cache_put if xdr_reserve_space returns NULL'
mainline: d078cbf5c38de83bc31f83c47dcd2184c04a50c7
upstream: 3e8081ebff12bec1347deaceb6bce0765cce54df
6a591f347a7c201678a3932d5a2ebc08f6fbf50a:
title: 'netfilter: nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put()'
mainline: 9c778fe48d20ef362047e3376dee56d77f8500d4
upstream: 872eca64c3267dbc5836b715716fc6c03a18eda7
5489a0e446410516b104e0dbc7901cf96ca0d3e9:
title: 'net: qrtr: Update packets cloning when broadcasting'
mainline: f011b313e8ebd5b7abd8521b5119aecef403de45
upstream: 7f02a7d8a2890678f0bfd563eb99dd31bafc36eb
6ada46e520db9db21909d1333f2d1f11d0ea47d8:
title: 'netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS'
mainline: e1f1ee0e9ad8cbe660f5c104e791c5f1a7cf4c31
upstream: b14c58e37050703568ab498404018294807209a5
24ee879c5a39f2f8e92ef5dc6b82ad71890af0b9:
title: 'crypto: aead,cipher - zeroize key buffer after use'
mainline: 23e4099bdc3c8381992f9eb975c79196d6755210
upstream: 89b9b6fa4463daf820e6a5ef65c3b0c2db239513
ad481d5cbb6fc4c2fbe847eaab398a667608aa41:
title: Remove *.orig pattern from .gitignore
mainline: 76be4f5a784533c71afbbb1b8f2963ef9e2ee258
upstream: e19774a171f108433e9fba98a7bfbf65ec2a18de
2903e604526b78ba231eff10d4d32eecc84b7d13:
title: 'soc: versatile: integrator: fix OF node leak in probe() error path'
mainline: 874c5b601856adbfda10846b9770a6c66c41e229
upstream: 6ab18d4ada166d38046ca8eb9598a3f1fdabd2b7
5b2fc11840b44e9989d9e931881108d56828398b:
title: 'USB: appledisplay: close race between probe and completion handler'
mainline: 8265d06b7794493d82c5c21a12d7ba43eccc30cb
upstream: 17720dd1be72e4cf5436883cf9d114d0c3e47d19
7fe54b4967d33e67db68d83c1126f160341fcf3a:
title: 'USB: misc: cypress_cy7c63: check for short transfer'
mainline: 49cd2f4d747eeb3050b76245a7f72aa99dbd3310
upstream: 638810fe9c0c15ffaa1b4129e54f1e8affb28afd
8265d9830ede6739edfeeac27d7d97fa2ff60f24:
title: 'tty: rp2: Fix reset with non forgiving PCIe host bridges'
mainline: f16dd10ba342c429b1e36ada545fb36d4d1f0e63
upstream: 279994e23d7e6d2a30f2cc7b7437fedccac0834d
29cbc0c5c3d689694a2de42d48938385c321d073:
title: 'drbd: Fix atomicity violation in drbd_uuid_set_bm()'
mainline: 2f02b5af3a4482b216e6a466edecf6ba8450fa45
upstream: b674f1b49f9eaec9aac5c64a75e535aa3f359af7
fa3bcef6588b3c2d861f5888dfe595d671bf790e:
title: 'drbd: Add NULL check for net_conf to prevent dereference in state validation'
mainline: a5e61b50c9f44c5edb6e134ede6fee8806ffafa9
upstream: 3b3ed68f695ee000e9c9fa536761a0554bfc1340
722db7a1dfcd05605e4fe31285eb51416a7c5f3f:
title: 'ACPI: sysfs: validate return type of _STR method'
mainline: 4bb1e7d027413835b086aed35bc3f0713bc0f72b
upstream: 92fd5209fc014405f63a7db79802ca4b01dc0c05
764b74ce49fcac9d4ce79f2382f5a72f7e4ce9ee:
title: 'f2fs: prevent possible int overflow in dir_block_index()'
mainline: 47f268f33dff4a5e31541a990dc09f116f80e61c
upstream: 60bffc6e6b32fb88e5c1234448de5ccf88b590f5
6e6800bf67a4f4d90bfeac9576562c4b94f86b4f:
title: 'f2fs: avoid potential int overflow in sanity_check_area_boundary()'
mainline: 50438dbc483ca6a133d2bce9d5d6747bcee38371
upstream: 24dfe070d6d05d62a00c41d5d52af5a448ae7af7
2b8c76dea7cd29cd76056aa1622f824203672a78:
title: 'vfs: fix race between evice_inodes() and find_inode()&iput()'
mainline: 88b1afbf0f6b221f6c5bb66cc80cd3b38d696687
upstream: 6cc13a80a26e6b48f78c725c01b91987d61563ef
6aec9a2b2ea68124ec578150968e918b714b4951:
title: 'nfs: fix memory leak in error path of nfs4_do_reclaim'
mainline: 8f6a7c9467eaf39da4c14e5474e46190ab3fb529
upstream: f239240d65807113e565226b8e0a7ea13390bff3
4d86dbe788e3493096e0ac52cb1d67da3a97f253:
title: 'PCI: xilinx-nwl: Use irq_data_get_irq_chip_data()'
mainline: e56427068a8d796bb7b8e297f2b6e947380e383f
upstream: d957766954641b4bbd7e359d51206c0b940988a6
85f9e31d10684f30ee9dd7181101849d66bb46ea:
title: 'PCI: xilinx-nwl: Fix off-by-one in INTx IRQ handler'
mainline: 0199d2f2bd8cd97b310f7ed82a067247d7456029
upstream: ebf6629fcff1e04e43ef75bd2c2dbfb410a95870
a221ba7b5c10912b64ef3214f340d306a7f2f716:
title: 'soc: versatile: realview: fix memory leak during device remove'
mainline: 1c4f26a41f9d052f334f6ae629e01f598ed93508
upstream: 0accfec683c0a3e31c8ba738be0b0014e316d6a0
d8f64e84dd728d7c0b98963b34a5a8c3bf1cb3a9:
title: 'soc: versatile: realview: fix soc_dev leak during device remove'
mainline: c774f2564c0086c23f5269fd4691f233756bf075
upstream: b05605f5a42b4719918486e2624e44f3fa9e818f
763e7b56a44b2c0b2adf924cfdbe078001aa424d:
title: 'usb: yurex: Replace snprintf() with the safer scnprintf() variant'
mainline: 86b20af11e84c26ae3fde4dcc4f490948e3f8035
upstream: a2ac6cb8aaa2eb23209ffa641962dd62958522a1
4445f05310701e77940cd1105f380f29838acbe0:
title: 'USB: misc: yurex: fix race between read and write'
mainline: 93907620b308609c72ba4b95b09a6aa2658bb553
upstream: 1250cd9dee69ace62b9eb87230e8274b48bc9460
a7f890cc3d58e08cf2ec730b95376b94862c6576:
title: 'i2c: aspeed: Update the stop sw state when the bus recovery occurs'
mainline: 93701d3b84ac5f3ea07259d4ced405c53d757985
upstream: 16cfd59341f73157ef319c588e639fc1013d94cf
bdd844b72fada07b3849e5eea841181c97d16f3e:
title: 'i2c: isch: Add missed ''else'''
mainline: 1db4da55070d6a2754efeb3743f5312fc32f5961
upstream: bbe3396e96a2ee857cf2206784f06bc3f49ff240
a8e1dbee0dfa30fe4d52939c495d469541cf5c8f:
title: 'usb: yurex: Fix inconsistent locking bug in yurex_read()'
mainline: e7d3b9f28654dbfce7e09f8028210489adaf6a33
upstream: 709b0b70011b577bc78406e76c4563e10579ddad
198501d96c89d17a8ee79587f593537f2773aa07:
title: 'mailbox: rockchip: fix a typo in module autoloading'
mainline: e92d87c9c5d769e4cb1dd7c90faa38dddd7e52e3
upstream: ae2d6fdd49669f35ed3a1156a4aab66a37e6a450
07726a73bd9cdc1843231a43985b5d310ee37fb2:
title: 'mailbox: bcm2835: Fix timeout during suspend mode'
mainline: dc09f007caed3b2f6a3b6bd7e13777557ae22bfd
upstream: 4e1e03760ee7cc4779b6306867fe0fc02921b963
5f8a65de609aaf9a0ef037ca8110bc9a3361c6c4:
title: 'ceph: remove the incorrect Fw reference check when dirtying pages'
mainline: c08dfb1b49492c09cf13838c71897493ea3b424e
upstream: c26c5ec832dd9e9dcd0a0a892a485c99889b68f0
51f85acdf26900ae9d4b89f2a92b1aeb3c84cb5a:
title: 'netfilter: nf_tables: prevent nf_skb_duplicated corruption'
mainline: 92ceba94de6fb4cee2bf40b485979c342f44a492
upstream: 50067d8b3f48e4cd4c9e817d3e9a5b5ff3507ca7
d8d31cfbc82a0ae2e5ec55c7017ffbacc7f5fa4f:
title: 'r8152: Factor out OOB link list waits'
mainline: 5f71c84038d39def573744a145c573758f52a949
upstream: e8bed7c8845878f8c60e76f0a10d61ea2f709580
5f9dc86cd8db3619cde8c03030791e3785d57212:
title: 'net: ethernet: lantiq_etop: fix memory disclosure'
mainline: 45c0de18ff2dc9af01236380404bbd6a46502c69
upstream: 905f06a34f960676e7dc77bea00f2f8fe18177ad
e2c585677eacdc04469488dac62f2fed9e626fed:
title: 'ALSA: hda/generic: Unconditionally prefer preferred_dacs pairs'
mainline: 1c801e7f77445bc56e5e1fec6191fd4503534787
upstream: a66828fdf8ba3ccb30204f7e44761007a7437a3a
3633a4341c2cea95f2294738f08398c864731ba8:
title: 'ALSA: hda/conexant: Fix conflicting quirk for System76 Pangolin'
mainline: b3ebb007060f89d5a45c9b99f06a55e36a1945b5
upstream: ba4ec41f6958bd5fc314b98c0ba17f5bb9a11375
e4ca685be5fe41db336a29877df4a012f919c6ae:
title: 'f2fs: Require FMODE_WRITE for atomic write ioctls'
mainline: 4f5a100f87f32cb65d4bb1ad282a08c92f6f591e
upstream: 700f3a7c7fa5764c9f24bbf7c78e0b6e479fa653
404a43ffc1ecfac85855f309721cc4000e9e9171:
title: 'wifi: ath9k: fix possible integer overflow in ath9k_get_et_stats()'
mainline: 3f66f26703093886db81f0610b97a6794511917c
upstream: 600f668453be81b25dcc2f20096eac2243aebdaa
1bb884ba1941c7a5cf9cf7cc4037f3c3a6b106d4:
title: 'wifi: ath9k_htc: Use __skb_set_length() for resetting urb before resubmit'
mainline: 94745807f3ebd379f23865e6dab196f220664179
upstream: e6b9bf32e0695e4f374674002de0527d2a6768eb
b8516592581c30f76def9221190dc9380f8da6c7:
title: 'net: hisilicon: hip04: fix OF node leak in probe()'
mainline: 17555297dbd5bccc93a01516117547e26a61caf1
upstream: 8c354ddfec8126ef58cdcde82dccc5cbb2c34e45
3d3fbd73239ca0d6f8e2965cd98982aecbaa79e8:
title: 'net: hisilicon: hns_dsaf_mac: fix OF node leak in hns_mac_get_info()'
mainline: 5680cf8d34e1552df987e2f4bb1bff0b2a8c8b11
upstream: 7df217a21b74e730db216984218bde434dffc34b
e07b666a56c1d67776a3189f4493afd19e050305:
title: 'net: hisilicon: hns_mdio: fix OF node leak in probe()'
mainline: e62beddc45f487b9969821fad3a0913d9bc18a2f
upstream: 963174dad7d4993ff3a4e1b43cefd296df0296b4
165bb61dc09819ee1c5f1a33fc9709f57b6cd5e2:
title: 'ACPICA: Fix memory leak if acpi_ps_get_next_namepath() fails'
mainline: 5accb265f7a1b23e52b0ec42313d1e12895552f4
upstream: b017675cfbd126954d3b45afbdd6ee345a0ce368
5d842b757d1a15ffb7abcd840bed276126302558:
title: 'ACPICA: Fix memory leak if acpi_ps_get_next_field() fails'
mainline: e6169a8ffee8a012badd8c703716e761ce851b15
upstream: 40fa60e0bf406ced3dfd857015dafdcd677a4929
e6f96efbe6713164a9656bc0b4fc70d17f253486:
title: 'ACPI: EC: Do not release locks during operation region accesses'
mainline: dc171114926ec390ab90f46534545420ec03e458
upstream: 8d5dd2d2ef6cc87799b4ff915e561814d3c35d2c
74270bedeea7735c0ba9518b3fee24181e0c6da2:
title: 'ACPICA: check null return of ACPI_ALLOCATE_ZEROED() in acpi_db_convert_to_package()'
mainline: a5242874488eba2b9062985bf13743c029821330
upstream: 4669da66ebc5b09881487f30669b0fcdb462188e
f5ce9568dc7b5120dbf2e74500c11266592afd7a:
title: 'tipc: guard against string buffer overrun'
mainline: 6555a2a9212be6983d2319d65276484f7c5f431a
upstream: 8298b6e45fb4d8944f356b08e4ea3e54df5e0488
5601f1cd6c89caede02c512aceba1122c1cb3883:
title: 'ipv4: Check !in_dev earlier for ioctl(SIOCSIFADDR).'
mainline: e3af3d3c5b26c33a7950e34e137584f6056c4319
upstream: 098a9b686df8c560f5f7683a1a388646aae0f023
87987dd1f838cdbb660e1ec61ec971fd2f9ea6aa:
title: 'ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family'
mainline: 8fed54758cd248cd311a2b5c1e180abef1866237
upstream: 05905659e2591368b50eaa79d94c75aeb18c46ef
3b69e39d186eea8fc7e7be3ce493386062cfa847:
title: 'ACPICA: iasl: handle empty connection_node'
mainline: a0a2459b79414584af6c46dd8c6f866d8f1aa421
upstream: ea69502703bd3c38c3f016f8b6614ef0de2b94c2
86713ec5023b52e2c29abf8d15dbd59318bc1ea0:
title: 'wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_cmd_802_11_scan_ext()'
mainline: 498365e52bebcbc36a93279fe7e9d6aec8479cee
upstream: b55c8848fdc81514ec047b2a0ec782ffe9ab5323
62fda267887348a38a2931739e43e3c3cf22f7ab:
title: 'signal: Replace BUG_ON()s'
mainline: 7f8af7bac5380f2d95a63a6f19964e22437166e1
upstream: 0f9c27fbb8a52c50ff7d2659386f1f43e7fbddee
26883705cb402fecd342e313afc02958f3c4c9e2:
title: 'ALSA: asihpi: Fix potential OOB array access'
mainline: 7b986c7430a6bb68d523dac7bfc74cbd5b44ef96
upstream: a6bdb691cf7b66dcd929de1a253c5c42edd2e522
8835daf1e8994a559b89b4935218a7f9f0edefb2:
title: 'ALSA: hdsp: Break infinite MIDI input flush loop'
mainline: c01f3815453e2d5f699ccd8c8c1f93a5b8669e59
upstream: dc0c68e2e6e2c544b1361baa1ca230569ab6279d
5c788f3e00af8da7b9e127980d0d782713d0ac6b:
title: 'fbdev: pxafb: Fix possible use after free in pxafb_task()'
mainline: 4a6921095eb04a900e0000da83d9475eb958e61e
upstream: e657fa2df4429f3805a9b3e47fb1a4a1b02a72bd
c44e3d43c84de7db15a4743c5683c5cef64e986e:
title: 'power: reset: brcmstb: Do not go into infinite loop if reset fails'
mainline: cf8c39b00e982fa506b16f9d76657838c09150cb
upstream: 61a6d482734804e0a81c3951b8a0d3852085a2cc
c9591bc1d6b4f3722215d12cc1626f04783b63bf:
title: 'ata: sata_sil: Rename sil_blacklist to sil_quirks'
mainline: 93b0f9e11ce511353c65b7f924cf5f95bd9c3aba
upstream: a57a97bb79d5123442068f887e5f1614ed4c752c
ac92419af8e1b7f89db62054d06b3be6baa5bb41:
title: 'jfs: UBSAN: shift-out-of-bounds in dbFindBits'
mainline: b0b2fc815e514221f01384f39fbfbff65d897e1c
upstream: 830d908130d88745f0fd3ed9912cc381edf11ff1
79bf2ab235866b9421e5606ebed6984c19f2e0ae:
title: 'jfs: Fix uaf in dbFreeBits'
mainline: d6c1b3599b2feb5c7291f5ac3a36e5fa7cedb234
upstream: 4ac58f7734937f3249da734ede946dfb3b1af5e4
232dea142d9e232619aff122916b326975dd2511:
title: 'jfs: check if leafidx greater than num leaves per dmap tree'
mainline: d64ff0d2306713ff084d4b09f84ed1a8c75ecc32
upstream: d76b9a4c283c7535ae7c7c9b14984e75402951e1
643f01f400ff296cd1263fcd1896e261b64ed1c6:
title: 'jfs: Fix uninit-value access of new_ea in ea_buffer'
mainline: 2b59ffad47db1c46af25ccad157bb3b25147c35c
upstream: 7b24d41d47a6805c45378debf8bd115675d41da8
4e150b2ed11f1ce7bfe2e243637886862eda74d3:
title: 'drm/radeon/r100: Handle unknown family in r100_cp_init_microcode()'
mainline: c6dbab46324b1742b50dc2fb5c1fee2c28129439
upstream: 7d91358e819a2761a5feff67d902456aaf4e567a
c19d34cfa203f3c75b5e25a6f657cb4a8adf372e:
title: 'of/irq: Refer to actual buffer size in of_irq_parse_one()'
mainline: 39ab331ab5d377a18fbf5a0e0b228205edfcc7f4
upstream: 64bf240f2dfc242d507c7f8404cd9938d61db7cc
9d2a9cdceb4ae4c4bd1ee308052de6f10602cb15:
title: 'ext4: ext4_search_dir should return a proper error'
mainline: cd69f8f9de280e331c9e6ff689ced0a688a9ce8f
upstream: a15514ec9f080fe24ee71edf8b97b49ab9b8fc80
6982e3324dbcc51b1cec4f5488fc6a0bbf7be4ad:
title: 'ext4: fix i_data_sem unlock order in ext4_ind_migrate()'
mainline: cc749e61c011c255d81b192a822db650c68b313f
upstream: 4192adefc9c570698821c5eb9873320eac2fcbf1
19730760522e21af34cdab871e3908e7b7dc8521:
title: 'spi: s3c64xx: fix timeout counters in flush_fifo'
mainline: 68a16708d2503b6303d67abd43801e2ca40c208d
upstream: 12f47fdd4fb4c4592c9cfad6c21b3855a6bdadb8
1fad7228e67992a1b120ff76b4887190ca32e8f6:
title: 'selftests: breakpoints: use remaining time to check if suspend succeed'
mainline: c66be905cda24fb782b91053b196bd2e966f95b7
upstream: 8dea5ffbd147f6708e2f70f04406d8b711873433
e8219bced027378a40a33c1044eca3135db5e83d:
title: 'selftests: vDSO: fix vDSO symbols lookup for powerpc64'
mainline: ba83b3239e657469709d15dcea5f9b65bf9dbf34
upstream: 058d587e7f1520934823bae8f41db3c0b1097b59
e9851b22b5a7211b32db852c9e6a6910230faebf:
title: 'i2c: xiic: Wait for TX empty to avoid missed TX NAKs'
mainline: 521da1e9225450bd323db5fa5bca942b1dc485b7
upstream: 8a6158421b417bb0841c4c7cb7a649707a1089d2
e8c0b2c2e4064aa5e3f7fdb517265f788156fdc3:
title: 'spi: bcm63xx: Fix module autoloading'
mainline: 909f34f2462a99bf876f64c5c61c653213e32fce
upstream: 54feac119535e0273730720fe9a4683389f71bff
7a6139e316c9dd16f9f3dcf8a225ddfbe487c6db:
title: 'perf/core: Fix small negative period being ignored'
mainline: 62c0b1061593d7012292f781f11145b2d46f43ab
upstream: 7fddba7b1bb6f1cc35269e510bc832feb3c54b11
38e7f1b9fd9e1f67d748242d07a430c85f9024a8:
title: 'ALSA: core: add isascii() check to card ID generator'
mainline: d278a9de5e1837edbe57b2f1f95a104ff6c84846
upstream: 3b9b0efb330f9d2ab082b7f426993d7bac3f2c66
9e7a4c15b80cc0547d89230298eb7d9e71afb999:
title: 'ext4: no need to continue when the number of entries is 1'
mainline: 1a00a393d6a7fb1e745a41edd09019bd6a0ad64c
upstream: 64c8c484242b141998f7408596ddb2dc6da4b1d3
ffe3a60234391b1045ee3ed64896bf14da3613b3:
title: 'ext4: propagate errors from ext4_find_extent() in ext4_insert_range()'
mainline: 369c944ed1d7c3fb7b35f24e4735761153afe7b3
upstream: d38a882fadb0431747342637ad3a9166663e8a86
d493509e9bd943f52ecb658bce751a5665491843:
title: 'ext4: fix incorrect tid assumption in __jbd2_log_wait_for_space()'
mainline: 972090651ee15e51abfb2160e986fa050cfc7a40
upstream: 330ecdae721e62cd7ee287fb3cd7f88afa26e85a
5ddb510c87c40bf7bc87aa90c9e6689970ea7733:
title: 'ext4: aovid use-after-free in ext4_ext_insert_extent()'
mainline: a164f3a432aae62ca23d03e6d926b122ee5b860d
upstream: e17ebe4fdd7665c93ae9459ba40fcdfb76769ac1
47c536f76d494c3b5e14839b5857c8f8dbba1242:
title: 'ext4: fix double brelse() the buffer of the extents path'
mainline: dcaa6c31134c0f515600111c38ed7750003e1b9c
upstream: d4574bda63906bf69660e001470bfe1a0ac524ae
5a0581e18a4b83fc0931a64224872c539457d2cd:
title: 'ext4: fix incorrect tid assumption in ext4_wait_for_tail_page_commit()'
mainline: dd589b0f1445e1ea1085b98edca6e4d5dedb98d0
upstream: 93fd249f197eeca81bb1c744ac8aec2804afd219
c87ca927b9e3d847d7c44ecf9f07528f1ef033e4:
title: 'of/irq: Support #msi-cells=<0> in of_msi_get_domain'
mainline: db8e81132cf051843c9a59b46fa5a071c45baeb3
upstream: 030de6c36c48a40f42d7d59732ee69990340e0a1
d3355be0380a6ec95a835e359a68d4f42af056b8:
title: 'jbd2: stop waiting for space when jbd2_cleanup_journal_tail() returns error'
mainline: f5cacdc6f2bb2a9bf214469dd7112b43dd2dd68a
upstream: 801a35dfef6996f3d5eaa96a59caf00440d9165e
0835b9f76d8069704f9620b14593572fb33fc20a:
title: 'ocfs2: fix the la space leak when unmounting an ocfs2 volume'
mainline: dfe6c5692fb525e5e90cefe306ee0dffae13d35f
upstream: 5a074861ae1b6262b50fa9780957db7d17b86672
74930aa28c3a2c7c23718c81400a79bb362bc740:
title: 'ocfs2: fix uninit-value in ocfs2_get_block()'
mainline: 2af148ef8549a12f8025286b8825c2833ee6bcb8
upstream: e95da10e6fcac684895c334eca9d95e2fd10b0fe
760f46ded0728ed84afb0a9859c89b0f92dca609:
title: 'ocfs2: reserve space for inline xattr before attaching reflink tree'
mainline: 5ca60b86f57a4d9648f68418a725b3a7de2816b0
upstream: 5c9807c523b4fca81d3e8e864dabc8c806402121
a03082a35421c27be3c50fe1d15abf899546cc66:
title: 'ocfs2: cancel dqi_sync_work before freeing oinfo'
mainline: 35fccce29feb3706f649726d410122dd81b92c18
upstream: fc5cc716dfbdc5fd5f373ff3b51358174cf88bfc
1ca500197bcc7e1e485788aed1dacdfb9f973ff9:
title: 'ocfs2: remove unreasonable unlock in ocfs2_read_blocks'
mainline: c03a82b4a0c935774afa01fd6d128b444fd930a1
upstream: 5245f109b4afb6595360d4c180d483a6d2009a59
c3bd19a739dcaaae0cbab86f0c0b0b27eda93601:
title: 'ocfs2: fix null-ptr-deref when journal load failed.'
mainline: 5784d9fcfd43bd853654bb80c87ef293b9e8e80a
upstream: fd89d92c1140cee8f59de336cb37fa65e359c123
ae8eab265d15a47a46d1c6b58a75d801814cb86c:
title: 'ocfs2: fix possible null-ptr-deref in ocfs2_set_buffer_uptodate'
mainline: 33b525cef4cff49e216e4133cc48452e11c0391e
upstream: 190d98bcd61117a78fe185222d162180f061a6ca
fb101f7fce16d22e18b8bf9fa9d13373f38536e6:
title: 'clk: rockchip: fix error for unknown clocks'
mainline: 12fd64babaca4dc09d072f63eda76ba44119816a
upstream: 2f1e1a9047b1644d05284fc0da1d6ab9c4434cf6
62369afcf4db28d2c18ed331f75448c97ee53bac:
title: 'media: uapi/linux/cec.h: cec_msg_set_reply_to: zero flags'
mainline: 599f6899051cb70c4e0aa9fd591b9ee220cb6f14
upstream: 4afab2197e530b480c4cc099255d12a08c6a1f93
66dd5129c4b2756157ab65da5826aba26c3adc1d:
title: 'media: venus: fix use after free bug in venus_remove due to race condition'
mainline: c5a85ed88e043474161bbfe54002c89c1cb50ee2
upstream: 5098b9e6377577fe13d03e1d8914930f014a3314
8eafd43568c906c485c18f684d67a19ec2e4edcd:
title: 'iio: magnetometer: ak8975: Fix reading for ak099xx sensors'
mainline: 129464e86c7445a858b790ac2d28d35f58256bbe
upstream: 2e78095a0cc35d6210de051accb2fe45649087cd
f24bdf3d0d8335026c719db068c6472acbf0839d:
title: 'tomoyo: fallback to realpath if symlink''s pathname does not exist'
mainline: ada1986d07976d60bed5017aa38b7f7cf27883f7
upstream: 455246846468503ac739924d5b63af32c6261b31
bd7cd397ff7943c113c695eb7cd40b4b6afc06bc:
title: 'Input: adp5589-keys - fix adp5589_gpio_get_value()'
mainline: c684771630e64bc39bddffeb65dd8a6612a6b249
upstream: 9ff7ae486d51c0da706a29b116d7fa399db677f5
3fd6acda2f9ff74d3281d09cc1ce73e4ad65c469:
title: 'btrfs: wait for fixup workers before stopping cleaner kthread during umount'
mainline: 41fd1e94066a815a7ab0a7025359e9b40e4b3576
upstream: cd686dfff63f27d712877aef5b962fbf6b8bc264
1acfbc7cdb47b0749f0cd34c0f2b622127307b1b:
title: 'gpio: davinci: fix lazy disable'
mainline: 3360d41f4ac490282fddc3ccc0b58679aa5c065d
upstream: e9b751c0d7abde1837ee1510cbdc705570107ef1
57d9a27da5d76dde393792654826c5371b51c77b:
title: 'arm64: Add Cortex-715 CPU part definition'
mainline: 07e39e60bbf0ccd5f895568e1afca032193705c0
upstream: 3781b92af63e7a53805e105875d4dace65bcefef
0a56f80bfe3292c9e87a85762ac9693abadec8c5:
title: 'uprobes: fix kernel info leak via "[uprobes]" vma'
mainline: 34820304cc2cd1804ee1f8f3504ec77813d29c8e
upstream: f31f92107e5a8ecc8902705122c594e979a351fe
2c85a79aba7b7724ff506258d04032d4f1b4f503:
title: 'nfsd: use ktime_get_seconds() for timestamps'
mainline: b3f255ef6bffc18a28c3b6295357f2a3380c033f
upstream: f81fcf39509d30cb5f1c659099c1d8f0c2a9a57a
2002a57e83b51260eb9de16d0935c7291c203c13:
title: 'nfsd: fix delegation_blocked() to block correctly for at least 30 seconds'
mainline: 45bb63ed20e02ae146336412889fe5450316a84f
upstream: ccbd18223985635b8dbb1393bacac9e1a5fa3f2f
36949604b7d7db06dd36f3871bf9c2d6a06d8b89:
title: 'ext4: fix inode tree inconsistency caused by ENOMEM'
mainline: 3f5424790d4377839093b68c12b130077a4e4510
upstream: eea5a4e7fe4424245aeba77bb0f24a38a1bead16
825559c99e1897b27fe9034af05c2d4febcf50e2:
title: 'tracing: Remove precision vsnprintf() check from print event'
mainline: 5efd3e2aef91d2d812290dcb25b2058e6f3f532c
upstream: f3de4b5d1ab8139aee39cc8afbd86a2cf260ad91
c69c205a6a13dbe8ff4f2b65ce5170a4e182edae:
title: 'virtio_console: fix misc probe bugs'
mainline: b9efbe2b8f0177fa97bfab290d60858900aa196b
upstream: 42a7c0fd6e5b7c5db8af8ab2bab6eff2a723b168
fe91966767513b8ae7f637bfc2c2fb68636a37dc:
title: 's390/facility: Disable compile time optimization for decompressor code'
mainline: 0147addc4fb72a39448b8873d8acdf3a0f29aa65
upstream: f559306a168fb92a936beaa1f020f5c45cdedac6
cc84719d9b691915a4fde154667d84e2ad74a0c9:
title: 's390/mm: Add cond_resched() to cmm_alloc/free_pages()'
mainline: 131b8db78558120f58c5dc745ea9655f6b854162
upstream: a12b82d741350b89b4df55fa8a4e5c0579d919cb
0c92a05a334ec247c1c27ecfd35705b865a2eb5d:
title: 'ext4: nested locking for xattr inode'
mainline: d1bc560e9a9c78d0b2314692847fc8661e0aeb99
upstream: c0f57dd0f1603ae27ef694bacde66147f9d57d32
2ac0320e88b9c9005998c2e3b5734f7961070cc6:
title: 'clk: bcm: bcm53573: fix OF node leak in init'
mainline: f92d67e23b8caa81f6322a2bad1d633b00ca000e
upstream: 8ac316aed34fa1a49ebbaa93465bf8bfe73e9937
98450b5f38eb8a75e2b40b3174bc00600347d329:
title: 'i2c: i801: Use a different adapter-name for IDF adapters'
mainline: 43457ada98c824f310adb7bd96bd5f2fcd9a3279
upstream: a2eb6e5a03de2ecbba68384c1c8f2a34c89ed7b8
3df84428b103d405f250cfdf5936537dedc7c2fd:
title: 'media: videobuf2-core: clear memory related fields in __vb2_plane_dmabuf_put()'
mainline: 6a9c97ab6b7e85697e0b74e86062192a5ffffd99
upstream: 940e83f377cb3863bd5a4e483ef1b228fbc86812
fffec2079f8107bb33fd1a1928239c142510aa2f:
title: 'usb: chipidea: udc: enable suspend interrupt after usb reset'
mainline: e4fdcc10092fb244218013bfe8ff01c55d54e8e4
upstream: 93233aa73b3ac373ffd4dd9e6fb7217a8051b760
ca910899b554f8d476bcf4b14980f8845269e742:
title: 'tools/iio: Add memory allocation failure check for trigger_name'
mainline: 3c6b818b097dd6932859bcc3d6722a74ec5931c1
upstream: e0daff560940b0d370d4328b9ff9294b7f893daa
a22a1046d7d1b88568ba8da927e821b4f0babaac:
title: 'driver core: bus: Return -EIO instead of 0 when show/store invalid bus attribute'
mainline: c0fd973c108cdc22a384854bc4b3e288a9717bb2
upstream: aca863154863d0a97305a089399cee1d39e852da
ef5963eabdc48181eee93f7233f433cc2a588ea2:
title: 'fbdev: sisfb: Fix strbuf array overflow'
mainline: 9cf14f5a2746c19455ce9cb44341b5527b5e19c3
upstream: 433c84c8495008922534c5cafdae6ff970fb3241
5e4b995a3aca9fdd2272546ec5667c32747443f4:
title: 'tcp: fix tcp_enter_recovery() to zero retrans_stamp when it''s safe'
mainline: b41b4cbd9655bcebcce941bef3601db8110335be
upstream: a58878d7106b229a2d91a647629a0a7bedccaa8a
29037061623d008c997450f67e5b5d05f756bb7c:
title: 'netfilter: br_netfilter: fix panic with metadata_dst skb'
mainline: f9ff7665cd128012868098bbd07e28993e314fdb
upstream: f07131239a76cc10d5e82c19d91f53cb55727297
648c574af6e92af84ebd54f3d8044c21ae820655:
title: 'Bluetooth: RFCOMM: FIX possible deadlock in rfcomm_sk_state_change'
mainline: 08d1914293dae38350b8088980e59fbc699a72fe
upstream: b77b3fb12fd483cae7c28648903b1d8a6b275f01
55a6946bb46cdc7b528dfbd30bb2fb2376525619:
title: 'gpio: aspeed: Add the flush write to ensure the write complete.'
mainline: 1bb5a99e1f3fd27accb804aa0443a789161f843c
upstream: 8c4d52b80f2d9dcc5053226ddd18a3bb1177c8ed
5a801c62a51b1c210698f59e40aa5417f071d7fc:
title: 'igb: Do not bring the device up after non-fatal error'
mainline: 330a699ecbfc9c26ec92c6310686da1230b4e7eb
upstream: dca2ca65a8695d9593e2cf1b40848e073ad75413
1fde287fcb280b7ae6a4a0b3edc99dc455a5c30d:
title: 'net: ibm: emac: mal: fix wrong goto'
mainline: 08c8acc9d8f3f70d62dd928571368d5018206490
upstream: 4bd7823cacb21e32f3750828148ed5d18d3bf007
cebdbf6f73b01661300d39d2064f6d5c69f24f8d:
title: 'ppp: fix ppp_async_encode() illegal access'
mainline: 40dddd4b8bd08a69471efd96107a4e1c73fabefc
upstream: 4151ec65abd755133ebec687218fadd2d2631167
a5b30e4f682b2971d4455afa1b3d3531d37534e6:
title: 'CDC-NCM: avoid overflow in sanity checking'
mainline: 8d2b1a1ec9f559d30b724877da4ce592edc41fdc
upstream: a612395c7631918e0e10ea48b9ce5ab4340f26a6
35af89640d1d44ff6c7973922c43c4f5b83af8b9:
title: 'HID: plantronics: Workaround for an unexcepted opposite volume key'
mainline: 87b696209007b7c4ef7bdfe39ea0253404a43770
upstream: b1ce11ce52359eefa7bc33be13e946a7154fd35f
93cddf4d4c509f0ec53017297294d0a302ffd0da:
title: 'Revert "usb: yurex: Replace snprintf() with the safer scnprintf() variant"'
mainline: 71c717cd8a2e180126932cc6851ff21c1d04d69a
upstream: 6f8f23390160355a4a571230986d524fd3929c2a
dc89df53f4c97dedfcb4568191037e3ebeef159d:
title: 'usb: xhci: Fix problem with xhci resume from suspend'
mainline: d44238d8254a36249d576c96473269dbe500f5e4
upstream: 52e998173cfed7d6953b3185f2da174712ce4a8f
b742600e3e092e2857196e7173387925a5111631:
title: 'usb: storage: ignore bogus device raised by JieLi BR21 USB sound chip'
mainline: a6555cb1cb69db479d0760e392c175ba32426842
upstream: 7a8df891d679d6627d91e334a734578ca16518eb
44dcccd712b6d2c691634dfd49fa5903ad691fc8:
title: 'net: Fix an unsafe loop on the list'
mainline: 1dae9f1187189bc09ff6d25ca97ead711f7e26f9
upstream: 464801a0f6ccb52b21faa33bac6014fd74cc5e10
d669e5f7d2c8746e3ed062d73b9426fb09039573:
title: 'posix-clock: Fix missing timespec64 check in pc_clock_settime()'
mainline: d8794ac20a299b647ba9958f6d657051fc51a540
upstream: 29f085345cde24566efb751f39e5d367c381c584
7d6f8b1d7746e0b3269b0e61c8d374d09a6b771b:
title: 'arm64: probes: Remove broken LDR (literal) uprobe support'
mainline: acc450aa07099d071b18174c22a1119c57da8227
upstream: cc86f2e9876c8b5300238cec6bf0bd8c842078ee
ed1774c811054dd8ff235b4830782572676f7b00:
title: 'arm64: probes: Fix simulate_ldr*_literal()'
mainline: 50f813e57601c22b6f26ced3193b9b94d70a2640
upstream: 19f4d3a94c77295ee3a7bbac91e466955f458671
9b9e89aeb9b0df1de45bb186662572a1b8b921e4:
title: 'PCI: Add function 0 DMA alias quirk for Glenfly Arise chip'
mainline: 9246b487ab3c3b5993aae7552b7a4c541cc14a49
upstream: 029efe3b57d981b0c239e50f3513838cae121578
5a2b55312783d9a4f60898793dd5aadea0360504:
title: 'fat: fix uninitialized variable'
mainline: 963a7f4d3b90ee195b895ca06b95757fcba02d1a
upstream: 09b2d2a2267187336b446f4c08e6204c30688bcf
70b388b0efb874251eee3df2059246413ee623e7:
title: 'KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin()'
mainline: 49f683b41f28918df3e51ddc0d928cb2e934ccdb
upstream: 11a772d5376aa6d3e2e69b5b5c585f79b60c0e17
b291c7c1eed423874cdbc28d717d0f4944b4b0fc:
title: 's390/sclp_vt220: Convert newlines to CRLF instead of LFCR'
mainline: dee3df68ab4b00fff6bdf9fc39541729af37307c
upstream: ce6924fdafb09a7231ecfcea119b4e4c83023c97
4386af4473d15479b5c96b9941faf351b614bfbb:
title: 'KVM: s390: Change virtual to physical address access in diag 0x258 handler'
mainline: cad4b3d4ab1f062708fff33f44d246853f51e966
upstream: a9dee098c6931dfd75abe015b04c1c66fa1507f6
67d246dc91071f9cc960c2f6f969857bb2922c7f:
title: 'x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET'
mainline: ff898623af2ed564300752bba83a680a1e4fec8d
upstream: 9e460c6c7c8b72c4c23853627789c812fd2c3cf5
bc865c54ef9ef2e2ef7097787e63ed03b1d5b6bc:
title: 'drm/vmwgfx: Handle surface check failure correctly'
mainline: 26498b8d54373d31a621d7dec95c4bd842563b3b
upstream: f924af529417292c74c043c627289f56ad95a002
76b3e6598c2a4f5ecf6ae67f03f4fb0f85f90a61:
title: 'iio: dac: stm32-dac-core: add missing select REGMAP_MMIO in Kconfig'
mainline: 27b6aa68a68105086aef9f0cb541cd688e5edea8
upstream: 842911035eb20561218a0742f3e54e7978799c6a
6e6aa73932d86ce5335cdb2e50f9c9c46ad85b53:
title: 'iio: hid-sensors: Fix an error handling path in _hid_sensor_set_report_latency()'
mainline: 3a29b84cf7fbf912a6ab1b9c886746f02b74ea25
upstream: 485744b5bd1f15a3ce50f70af52a9d68761c57dd
abf9b8555e8b720496841609025a6c9aa1a9188f:
title: 'iio: light: opt3001: add missing full-scale range value'
mainline: 530688e39c644543b71bdd9cb45fdfb458a28eaa
upstream: 4401780146a19d65df6f49d5273855f33c9c0a35
edc69f40262617c7257c732edc12d613a9687e86:
title: 'Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001'
mainline: 2c1dda2acc4192d826e84008d963b528e24d12bc
upstream: e32ae4a12628bb2c1046715f47ea7d57fc2b9cbf
98205e0fb61135f36e438d637862d78061396814:
title: 'xhci: Fix incorrect stream context type macro'
mainline: 6599b6a6fa8060145046d0744456b6abdb3122a7
upstream: e76b961d32fd94c7af80bc0ea35e345f1f838c59
14f0ba83331cb218f676f0cf81cda64c290c3ed4:
title: 'USB: serial: option: add support for Quectel EG916Q-GL'
mainline: 540eff5d7faf0c9330ec762da49df453263f7676
upstream: cdb2c8b31ea3ba692c9ab213369b095e794c8f39
1128e72fca7832afc143680fe12d0c938b3270d7:
title: 'USB: serial: option: add Telit FN920C04 MBIM compositions'
mainline: 6d951576ee16430822a8dee1e5c54d160e1de87d
upstream: 20cc2b146a8748902a5e4f5aa70457f48174b5c4
f3fce0c6ccd5abc38c912f3233df450af041b90c:
title: 'parport: Proper fix for array out-of-bounds access'
mainline: 02ac3a9ef3a18b58d8f3ea2b6e46de657bf6c4f9
upstream: 8aadef73ba3b325704ed5cfc4696a25c350182cf
adeaa3e2c7e54bbd83852d8e302ca76d7a1f256d:
title: 'x86/apic: Always explicitly disarm TSC-deadline timer'
mainline: ffd95846c6ec6cf1f93da411ea10d504036cab42
upstream: e75562346cac53c7e933373a004b1829e861123a
4ff716b2bb631baecc1eb6eca17a3d23b2850ad7:
title: 'nilfs2: propagate directory read errors from nilfs_find_entry()'
mainline: 08cfa12adf888db98879dbd735bc741360a34168
upstream: bb857ae1efd3138c653239ed1e7aef14e1242c81
85ee27f8ef66432d98e386248c7d8fa90a092b9d:
title: 'RDMA/bnxt_re: Fix incorrect AVID type in WQE structure'
mainline: 9ab20f76ae9fad55ebaf36bdff04aea1c2552374
upstream: 3e98839514a883188710c5467cf3b62a36c7885a
6371ff58cca7cd85a5f875a9e08b51f3bfa55a6e:
title: 'RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP'
mainline: c659b405b82ead335bee6eb33f9691bf718e21e8
upstream: 361576c9d34bd16b089864545073db383e372ba8
093416fbc1a9209422cb76784577eae3430a207d:
title: 'RDMA/bnxt_re: Return more meaningful error'
mainline: 98647df0178df215b8239c5c365537283b2852a6
upstream: 8fb8f613a904d3ccf61fa824a95f2fa2c3b8f191
e28fdf954db36a46cba23d2fe2d01635cca2063f:
title: 'net: ethernet: aeroflex: fix potential memory leak in greth_start_xmit_gbit()'
mainline: cf57b5d7a2aad456719152ecd12007fe031628a3
upstream: 7517c13ae14dac758e4ec0d881e463a8315bbc7d
69215607dc1760d491ac751b05456a18b8adf01d:
title: 'net: systemport: fix potential memory leak in bcm_sysport_xmit()'
mainline: c401ed1c709948e57945485088413e1bb5e94bd1
upstream: 8e81ce7d0166a2249deb6d5e42f28a8b8c9ea72f
e0a01897a0cdcee042136aa737dda898b2a2cb60:
title: 'Bluetooth: bnep: fix wild-memory-access in proto_unregister'
mainline: 64a90991ba8d4e32e3173ddd83d0b24167a5668c
upstream: e232728242c4e98fb30e4c6bedb6ba8b482b6301
644ca3d02eed5d09144291c2700a14cb2183bc0d:
title: arm64:uprobe fix the uprobe SWBP_INSN in big-endian
mainline: 60f07e22a73d318cddaafa5ef41a10476807cc07
upstream: 8fd414d25465bb666c71b5490fa939411e49228b
e33413f73e839b4c49efa91f2a26d4fde33361e4:
title: 'arm64: probes: Fix uprobes for big-endian kernels'
mainline: 13f8f1e05f1dc36dbba6cba0ae03354c0dafcde7
upstream: b6a638cb600e13f94b5464724eaa6ab7f3349ca2
531aa0f03b79233bfcfe6e067b0b04a0e8494817:
title: 'jfs: Fix sanity check in dbMount'
mainline: 67373ca8404fe57eb1bb4b57f314cff77ce54932
upstream: ea462ee11dbc4eb779146313d3abf5e5187775e1
db382d47beb9d7e9c0d27f0c5d866b67148ca799:
title: 'net/sun3_82586: fix potential memory leak in sun3_82586_send_packet()'
mainline: 2cb3f56e827abb22c4168ad0c1bbbf401bb2f3b8
upstream: 137010d26dc5cd47cd62fef77cbe952d31951b7a
9f21e06d2a8bb717e49f8ef4a96672f939380c03:
title: 'be2net: fix potential memory leak in be_xmit()'
mainline: e4dd8bfe0f6a23acd305f9b892c00899089bd621
upstream: 941026023c256939943a47d1c66671526befbb26
2ca8893515d6c0360b38a5ebb726322c28f2585e:
title: 'net: usb: usbnet: fix name regression'
mainline: 8a7d12d674ac6f2147c18f36d1e15f1a48060edf
upstream: 8f83f28d93d380fa4083f6a80fd7793f650e5278
d792e0c744f67188b6e873a2dd188f1f03dc4f3b:
title: 'posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()'
mainline: 6e62807c7fbb3c758d233018caf94dfea9c65dbd
upstream: d005400262ddaf1ca1666bbcd1acf42fe81d57ce
9612b486b817fa6fc19b8fe9a81bd547c476e6c6:
title: 'nilfs2: fix kernel bug due to missing clearing of buffer delay flag'
mainline: 6ed469df0bfbef3e4b44fca954a781919db9f7ab
upstream: 033bc52f35868c2493a2d95c56ece7fc155d7cb3
8877c26f575b56ea80275c39aeb6e9ae85aafad1:
title: 'arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning'
mainline: ef08c0fadd8a17ebe429b85e23952dac3263ad34
upstream: 974955b61fe226c0d837106738fc0fb5910d67a8
7ca707ec81d8be129613f262fbffe9e15d327167:
title: 'xfrm: validate new SA''s prefixlen using SA family when sel.family is unset'
mainline: 3f0ab59e6537c6a8f9e1b355b48f9c05a76e8563
upstream: f31398570acf0f0804c644006f7bfa9067106b0a
db7bbe2185d31a31d50702582589d967d016587e:
title: 'cgroup: Fix potential overflow issue when checking max_depth'
mainline: 3cc4e13bb1617f6a13e5e6882465984148743cf4
upstream: 339df130db47ae7e89fddce5729b0f0566405d1d
38b579881e78d85e81e8625fb057a96e45b3adc6:
title: 'wifi: mac80211: skip non-uploaded keys in ieee80211_iter_keys'
mainline: 52009b419355195912a628d0a9847922e90c348c
upstream: c9cf9510970e5b33e5bc21377380f1cf61685ed0
ebfd3809b08074d25f038a1300971645bbe98b5b:
title: 'gtp: simplify error handling code in ''gtp_encap_enable()'''
mainline: b289ba5e07105548b8219695e5443d807a825eb8
upstream: 66f635f6ae87c35bd1bda16927e9393cacd05ee4
7f3a3eeed91e7c7bff96403270e2471fd29873b2:
title: 'gtp: allow -1 to be specified as file description from userspace'
mainline: 7515e37bce5c428a56a9b04ea7e96b3f53f17150
upstream: 63d8172188c759c44cae7a57eece140e0b90a2e1
69fcd1905bea29c01c7a659aa16268f2b40ebce8:
title: 'net/sched: stop qdisc_tree_reduce_backlog on TC_H_ROOT'
mainline: 2e95c4384438adeaa772caa560244b1a2efef816
upstream: e7f9a6f97eb067599a74f3bcb6761976b0ed303e
a829200ea0a4ce6e889bf23df1bfbee34daf9746:
title: 'net: support ip generic csum processing in skb_csum_hwoffload_help'
mainline: 62fafcd63139920eb25b3fbf154177ce3e6f3232
upstream: 2c88668d57735d4ff65ce35747c8aa6662cc5013
d2216921d39819c8ba0f48dc6fd2c15e6290b6cd:
title: 'net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension'
mainline: 04c20a9356f283da623903e81e7c6d5df7e4dc3c
upstream: bcefc3cd7f592a70fcbbbfd7ad1fbc69172ea78b
51fb462970ebd4757675ab968175a3047847fa1d:
title: 'netfilter: nft_payload: sanitize offset and length before calling skb_checksum()'
mainline: d5953d680f7e96208c29ce4139a0e38de87a57fe
upstream: a661ed364ae6ae88c2fafa9ddc27df1af2a73701
3551df53194d0dfd74258bea61b7f82b3b97105e:
title: 'net: amd: mvme147: Fix probe banner message'
mainline: 82c5b53140faf89c31ea2b3a0985a2f291694169
upstream: 34f2d9975aff5ddb9e15e4ddd58528c8fd570c4a
5a9eb453112676da334380bda6fb9e7b126d04d9:
title: 'misc: sgi-gru: Don''t disable preemption in GRU driver'
mainline: b983b271662bd6104d429b0fd97af3333ba760bf
upstream: 88a0888162b375d79872fb1dece834bebea76fe3
6fb928dc4510f0382b79a2960b0c8fae57c76a33:
title: 'usb: phy: Fix API devm_usb_put_phy() can not release the phy'
mainline: fdce49b5da6e0fb6d077986dec3e90ef2b094b50
upstream: 3a5693be9a47d368d39fee08325f5bf6cdd2ebaf
b166e22b1f580bef5d1b09e00de9d718d7bb2eeb:
title: 'xhci: Fix Link TRB DMA in command ring stopped completion event'
mainline: 075919f6df5dd82ad0b1894898b315fbb3c29b84
upstream: d55d92597b7143f70e2db6108dac521d231ffa29
6a8dc3623eedca5d2fe8ac115d05cdf0e7def887:
title: 'Revert "driver core: Fix uevent_show() vs driver detach race"'
mainline: 9a71892cbcdb9d1459c84f5a4c722b14354158a5
upstream: fe10c8367687c27172a10ba5cc849bd82077bd7d
c2faf8e8c6c985e70a6c3174e9f1b53d440a8b51:
title: 'wifi: mac80211: do not pass a stopped vif to the driver in .get_txpower'
mainline: 393b6bc174b0dd21bb2a36c13b36e62fc3474a23
upstream: b0b862aa3dbcd16b3c4715259a825f48ca540088
c7df04a616677a7c4473babee0b81900a2728200:
title: 'wifi: iwlegacy: Clear stale interrupts before resuming device'
mainline: 07c90acb071b9954e1fecb1e4f4f13d12c544b34
upstream: 271d282ecc15d7012e71ca82c89a6c0e13a063dd
452c0cdb1398e3788d1af22b061acaebfa8a3915:
title: 'nilfs2: fix potential deadlock with newly created symlinks'
mainline: b3a033e3ecd3471248d474ef263aadc0059e516a
upstream: cc38c596e648575ce58bfc31623a6506eda4b94a
f38c624794c3ea409b8ee122b2a9a9f7df076a25:
title: 'ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow'
mainline: bc0a2f3a73fcdac651fca64df39306d1e5ebe3b0
upstream: 27d95867bee806cdc448d122bd99f1d8b0544035
53f13ddee939d270ae9524040c1d9b45321fb656:
title: 'nilfs2: fix kernel bug due to missing clearing of checked flag'
mainline: 41e192ad2779cae0102879612dfe46726e4396aa
upstream: 994b2fa13a6c9cf3feca93090a9c337d48e3d60d

1384
.elts/upstream/4.19.323.yaml Normal file

File diff suppressed because it is too large Load Diff

1
.gitignore vendored
View File

@@ -104,7 +104,6 @@ GTAGS
# id-utils files
ID
*.orig
*~
\#*#

View File

@@ -516,7 +516,7 @@ at module load time (for a module) with::
[dbg_probe=1]
The addresses are normal I2C addresses. The adapter is the string
name of the adapter, as shown in /sys/class/i2c-adapter/i2c-<n>/name.
name of the adapter, as shown in /sys/bus/i2c/devices/i2c-<n>/name.
It is *NOT* i2c-<n> itself. Also, the comparison is done ignoring
spaces, so if the name is "This is an I2C chip" you can say
adapter_name=ThisisanI2cchip. This is because it's hard to pass in

View File

@@ -45,14 +45,24 @@ how the user addresses are used by the kernel:
1. User addresses not accessed by the kernel but used for address space
management (e.g. ``mprotect()``, ``madvise()``). The use of valid
tagged pointers in this context is allowed with the exception of
``brk()``, ``mmap()`` and the ``new_address`` argument to
``mremap()`` as these have the potential to alias with existing
user addresses.
tagged pointers in this context is allowed with these exceptions:
NOTE: This behaviour changed in v5.6 and so some earlier kernels may
incorrectly accept valid tagged pointers for the ``brk()``,
``mmap()`` and ``mremap()`` system calls.
- ``brk()``, ``mmap()`` and the ``new_address`` argument to
``mremap()`` as these have the potential to alias with existing
user addresses.
NOTE: This behaviour changed in v5.6 and so some earlier kernels may
incorrectly accept valid tagged pointers for the ``brk()``,
``mmap()`` and ``mremap()`` system calls.
- The ``range.start``, ``start`` and ``dst`` arguments to the
``UFFDIO_*`` ``ioctl()``s used on a file descriptor obtained from
``userfaultfd()``, as fault addresses subsequently obtained by reading
the file descriptor will be untagged, which may otherwise confuse
tag-unaware programs.
NOTE: This behaviour changed in v5.14 and so some earlier kernels may
incorrectly accept valid tagged pointers for this system call.
2. User addresses accessed by the kernel (e.g. ``write()``). This ABI
relaxation is disabled by default and the application thread needs to

View File

@@ -61,6 +61,7 @@ Currently, these files are in /proc/sys/vm:
- stat_interval
- stat_refresh
- swappiness
- unprivileged_userfaultfd
- user_reserve_kbytes
- vfs_cache_pressure
- watermark_scale_factor
@@ -840,6 +841,22 @@ The default value is 60.
==============================================================
unprivileged_userfaultfd
This flag controls the mode in which unprivileged users can use the
userfaultfd system calls. Set this to 0 to restrict unprivileged users
to handle page faults in user mode only. In this case, users without
SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
succeed. Prohibiting use of userfaultfd for handling faults from kernel
mode may make certain vulnerabilities more difficult to exploit.
Set this to 1 to allow unprivileged users to use the userfaultfd system
calls without any restrictions.
The default value is 0.
==============================================================
- user_reserve_kbytes
When overcommit_memory is set to 2, "never overcommit" mode, reserve

View File

@@ -56,36 +56,37 @@ the generic ioctl available.
The uffdio_api.features bitmask returned by the UFFDIO_API ioctl
defines what memory types are supported by the userfaultfd and what
events, except page fault notifications, may be generated.
events, except page fault notifications, may be generated:
If the kernel supports registering userfaultfd ranges on hugetlbfs
virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in
uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be
set if the kernel supports registering userfaultfd ranges on shared
memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero
MAP_SHARED, memfd_create, etc).
- The UFFD_FEATURE_EVENT_* flags indicate that various other events
other than page faults are supported. These events are described in more
detail below in the Non-cooperative userfaultfd section.
The userland application that wants to use userfaultfd with hugetlbfs
or shared memory need to set the corresponding flag in
uffdio_api.features to enable those features.
- UFFD_FEATURE_MISSING_HUGETLBFS and UFFD_FEATURE_MISSING_SHMEM
indicate that the kernel supports UFFDIO_REGISTER_MODE_MISSING
registrations for hugetlbfs and shared memory (covering all shmem APIs,
i.e. tmpfs, IPCSHM, /dev/zero, MAP_SHARED, memfd_create,
etc) virtual memory areas, respectively.
If the userland desires to receive notifications for events other than
page faults, it has to verify that uffdio_api.features has appropriate
UFFD_FEATURE_EVENT_* bits set. These events are described in more
detail below in "Non-cooperative userfaultfd" section.
- UFFD_FEATURE_MINOR_HUGETLBFS indicates that the kernel supports
UFFDIO_REGISTER_MODE_MINOR registration for hugetlbfs virtual memory
areas. UFFD_FEATURE_MINOR_SHMEM is the analogous feature indicating
support for shmem virtual memory areas.
Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should
be invoked (if present in the returned uffdio_api.ioctls bitmask) to
register a memory range in the userfaultfd by setting the
The userland application should set the feature flags it intends to use
when invoking the UFFDIO_API ioctl, to request that those features be
enabled if supported.
Once the userfaultfd API has been enabled the UFFDIO_REGISTER
ioctl should be invoked (if present in the returned uffdio_api.ioctls
bitmask) to register a memory range in the userfaultfd by setting the
uffdio_register structure accordingly. The uffdio_register.mode
bitmask will specify to the kernel which kind of faults to track for
the range (UFFDIO_REGISTER_MODE_MISSING would track missing
pages). The UFFDIO_REGISTER ioctl will return the
the range. The UFFDIO_REGISTER ioctl will return the
uffdio_register.ioctls bitmask of ioctls that are suitable to resolve
userfaults on the range registered. Not all ioctls will necessarily be
supported for all memory types depending on the underlying virtual
memory backend (anonymous memory vs tmpfs vs real filebacked
mappings).
supported for all memory types (e.g. anonymous memory vs. shmem vs.
hugetlbfs), or all types of intercepted faults.
Userland can use the uffdio_register.ioctls to manage the virtual
address space in the background (to add or potentially also remove
@@ -93,13 +94,60 @@ memory from the userfaultfd registered range). This means a userfault
could be triggering just before userland maps in the background the
user-faulted page.
The primary ioctl to resolve userfaults is UFFDIO_COPY. That
atomically copies a page into the userfault registered range and wakes
up the blocked userfaults (unless uffdio_copy.mode &
UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to
UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an
half copied page since it'll keep userfaulting until the copy has
finished.
Resolving Userfaults
--------------------
There are three basic ways to resolve userfaults:
- UFFDIO_COPY atomically copies some existing page contents from
userspace.
- UFFDIO_ZEROPAGE atomically zeros the new page.
- UFFDIO_CONTINUE maps an existing, previously-populated page.
These operations are atomic in the sense that they guarantee nothing can
see a half-populated page, since readers will keep userfaulting until the
operation has finished.
By default, these wake up userfaults blocked on the range in question.
They support a UFFDIO_*_MODE_DONTWAKE mode flag, which indicates
that waking will be done separately at some later time.
Which ioctl to choose depends on the kind of page fault, and what we'd
like to do to resolve it:
- For UFFDIO_REGISTER_MODE_MISSING faults, the fault needs to be
resolved by either providing a new page (UFFDIO_COPY), or mapping
the zero page (UFFDIO_ZEROPAGE). By default, the kernel would map
the zero page for a missing fault. With userfaultfd, userspace can
decide what content to provide before the faulting thread continues.
- For UFFDIO_REGISTER_MODE_MINOR faults, there is an existing page (in
the page cache). Userspace has the option of modifying the page's
contents before resolving the fault. Once the contents are correct
(modified or not), userspace asks the kernel to map the page and let the
faulting thread continue with UFFDIO_CONTINUE.
Notes:
- You can tell which kind of fault occurred by examining
pagefault.flags within the uffd_msg, checking for the
UFFD_PAGEFAULT_FLAG_* flags.
- None of the page-delivering ioctls default to the range that you
registered with. You must fill in all fields for the appropriate
ioctl struct including the range.
- You get the address of the access that triggered the missing page
event out of a struct uffd_msg that you read in the thread from the
uffd. You can supply as many pages as you want with these IOCTLs.
Keep in mind that unless you used DONTWAKE then the first of any of
those IOCTLs wakes up the faulting thread.
- Be sure to test for all errors including
(pollfd[0].revents & POLLERR). This can happen, e.g. when ranges
supplied were incorrect.
== QEMU/KVM ==

View File

@@ -9074,6 +9074,19 @@ S: Maintained
F: arch/arm/boot/dts/mmp*
F: arch/arm/mach-mmp/
MMU GATHER AND TLB INVALIDATION
M: Will Deacon <will.deacon@arm.com>
M: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
M: Andrew Morton <akpm@linux-foundation.org>
M: Nick Piggin <npiggin@gmail.com>
M: Peter Zijlstra <peterz@infradead.org>
L: linux-arch@vger.kernel.org
L: linux-mm@kvack.org
S: Maintained
F: arch/*/include/asm/tlb.h
F: include/asm-generic/tlb.h
F: mm/mmu_gather.c
MN88472 MEDIA DRIVER
M: Antti Palosaari <crope@iki.fi>
L: linux-media@vger.kernel.org

View File

@@ -1,8 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
VERSION = 4
PATCHLEVEL = 14
SUBLEVEL = 212
EXTRAVERSION =
SUBLEVEL = 356
EXTRAVERSION =
NAME = Petit Gorille
# *DOCUMENTATION*

View File

@@ -773,6 +773,18 @@ config HAVE_IRQ_TIME_ACCOUNTING
Archs need to ensure they use a high enough resolution clock to
support irq time accounting and then call enable_sched_clock_irqtime().
config HAVE_MOVE_PUD
bool
help
Architectures that select this are able to move page tables at the
PUD level. If there are only 3 page table levels, the move effectively
happens at the PGD level.
config HAVE_MOVE_PMD
bool
help
Archs that select this are able to move page tables at the PMD level.
config HAVE_ARCH_TRANSPARENT_HUGEPAGE
bool

View File

@@ -89,7 +89,7 @@ do_page_fault(unsigned long address, unsigned long mmcsr,
const struct exception_table_entry *fixup;
int fault, si_code = SEGV_MAPERR;
siginfo_t info;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
/* As of EV6, a load into $31/$f31 is a prefetch, and never faults
(or is suppressed by the PALcode). Support that for older CPUs
@@ -150,7 +150,7 @@ retry:
the fault. */
fault = handle_mm_fault(vma, address, flags);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
return;
if (unlikely(fault & VM_FAULT_ERROR)) {
@@ -169,7 +169,7 @@ retry:
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would
* have already released it in __lock_page_or_retry

View File

@@ -66,29 +66,24 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
struct task_struct *tsk = current;
struct mm_struct *mm = tsk->mm;
siginfo_t info;
int fault, ret;
int write = regs->ecr_cause & ECR_C_PROTV_STORE; /* ST/EX */
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
int sig, si_code = SEGV_MAPERR;
unsigned int write = 0, exec = 0, mask;
int fault = VM_FAULT_SIGSEGV; /* handle_mm_fault() output */
unsigned int flags; /* handle_mm_fault() input */
/*
* We fault-in kernel-space virtual memory on-demand. The
* 'reference' page table is init_mm.pgd.
*
* NOTE! We MUST NOT take any locks for this case. We may
* be in an interrupt or a critical region, and should
* only copy the information from the master page table,
* nothing more.
*/
if (address >= VMALLOC_START) {
ret = handle_kernel_vaddr_fault(address);
if (unlikely(ret))
goto bad_area_nosemaphore;
if (address >= VMALLOC_START && !user_mode(regs)) {
if (unlikely(handle_kernel_vaddr_fault(address)))
goto no_context;
else
return;
}
info.si_code = SEGV_MAPERR;
/*
* If we're in an interrupt or have no user
* context, we must not take the fault..
@@ -96,147 +91,112 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
if (faulthandler_disabled() || !mm)
goto no_context;
if (regs->ecr_cause & ECR_C_PROTV_STORE) /* ST/EX */
write = 1;
else if ((regs->ecr_vec == ECR_V_PROTV) &&
(regs->ecr_cause == ECR_C_PROTV_INST_FETCH))
exec = 1;
flags = FAULT_FLAG_DEFAULT;
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
if (write)
flags |= FAULT_FLAG_WRITE;
retry:
down_read(&mm->mmap_sem);
vma = find_vma(mm, address);
if (!vma)
goto bad_area;
if (vma->vm_start <= address)
goto good_area;
if (!(vma->vm_flags & VM_GROWSDOWN))
goto bad_area;
if (expand_stack(vma, address))
goto bad_area;
/*
* Ok, we have a good vm_area for this memory access, so
* we can handle it..
*/
good_area:
info.si_code = SEGV_ACCERR;
/* Handle protection violation, execute on heap or stack */
if ((regs->ecr_vec == ECR_V_PROTV) &&
(regs->ecr_cause == ECR_C_PROTV_INST_FETCH))
goto bad_area;
if (write) {
if (!(vma->vm_flags & VM_WRITE))
goto bad_area;
flags |= FAULT_FLAG_WRITE;
} else {
if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
if (unlikely(address < vma->vm_start)) {
if (!(vma->vm_flags & VM_GROWSDOWN) || expand_stack(vma, address))
goto bad_area;
}
/*
* If for any reason at all we couldn't handle the fault,
* make sure we exit gracefully rather than endlessly redo
* the fault.
* vm_area is good, now check permissions for this memory access
*/
mask = VM_READ;
if (write)
mask = VM_WRITE;
if (exec)
mask = VM_EXEC;
if (!(vma->vm_flags & mask)) {
info.si_code = SEGV_ACCERR;
goto bad_area;
}
fault = handle_mm_fault(vma, address, flags);
/* If Pagefault was interrupted by SIGKILL, exit page fault "early" */
if (unlikely(fatal_signal_pending(current))) {
if ((fault & VM_FAULT_ERROR) && !(fault & VM_FAULT_RETRY))
up_read(&mm->mmap_sem);
if (user_mode(regs))
return;
}
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
if (likely(!(fault & VM_FAULT_ERROR))) {
if (flags & FAULT_FLAG_ALLOW_RETRY) {
/* To avoid updating stats twice for retry case */
if (fault & VM_FAULT_MAJOR) {
tsk->maj_flt++;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
regs, address);
} else {
tsk->min_flt++;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
regs, address);
}
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
goto retry;
}
}
/* Fault Handled Gracefully */
up_read(&mm->mmap_sem);
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
if (!user_mode(regs))
goto no_context;
return;
}
if (fault & VM_FAULT_OOM)
goto out_of_memory;
else if (fault & VM_FAULT_SIGSEGV)
goto bad_area;
else if (fault & VM_FAULT_SIGBUS)
goto do_sigbus;
/* no man's land */
BUG();
/*
* Something tried to access memory that isn't in our memory map..
* Fix it, but check if it's kernel or user first..
* Fault retry nuances, mmap_sem already relinquished by core mm
*/
if (unlikely((fault & VM_FAULT_RETRY) &&
(flags & FAULT_FLAG_ALLOW_RETRY))) {
flags |= FAULT_FLAG_TRIED;
goto retry;
}
bad_area:
up_read(&mm->mmap_sem);
bad_area_nosemaphore:
/* User mode accesses just cause a SIGSEGV */
if (user_mode(regs)) {
tsk->thread.fault_address = address;
info.si_signo = SIGSEGV;
info.si_errno = 0;
/* info.si_code has been set above */
info.si_addr = (void __user *)address;
force_sig_info(SIGSEGV, &info, tsk);
return;
}
no_context:
/* Are we prepared to handle this kernel fault?
*
* (The kernel has valid exception-points in the source
* when it accesses user-memory. When it fails in one
* of those points, we find it in a table and do a jump
* to some fixup code that loads an appropriate error
* code)
/*
* Major/minor page fault accounting
* (in case of retry we only land here once)
*/
if (fixup_exception(regs))
return;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
die("Oops", regs, address);
if (likely(!(fault & VM_FAULT_ERROR))) {
if (fault & VM_FAULT_MAJOR) {
tsk->maj_flt++;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
regs, address);
} else {
tsk->min_flt++;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
regs, address);
}
out_of_memory:
up_read(&mm->mmap_sem);
if (user_mode(regs)) {
pagefault_out_of_memory();
/* Normal return path: fault Handled Gracefully */
return;
}
goto no_context;
do_sigbus:
up_read(&mm->mmap_sem);
if (!user_mode(regs))
goto no_context;
if (fault & VM_FAULT_OOM) {
pagefault_out_of_memory();
return;
}
if (fault & VM_FAULT_SIGBUS) {
sig = SIGBUS;
si_code = BUS_ADRERR;
}
else {
sig = SIGSEGV;
}
tsk->thread.fault_address = address;
info.si_signo = SIGBUS;
info.si_signo = sig;
info.si_errno = 0;
info.si_code = BUS_ADRERR;
info.si_code = si_code;
info.si_addr = (void __user *)address;
force_sig_info(SIGBUS, &info, tsk);
force_sig_info(sig, &info, tsk);
return;
no_context:
if (fixup_exception(regs))
return;
die("Oops", regs, address);
}

View File

@@ -70,6 +70,7 @@ static void __init realview_smp_prepare_cpus(unsigned int max_cpus)
return;
}
map = syscon_node_to_regmap(np);
of_node_put(np);
if (IS_ERR(map)) {
pr_err("PLATSMP: No syscon regmap\n");
return;

View File

@@ -264,7 +264,7 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
struct task_struct *tsk;
struct mm_struct *mm;
int fault, sig, code;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
if (notify_page_fault(regs, fsr))
return 0;
@@ -318,7 +318,7 @@ retry:
* signal first. We do not need to release the mmap_sem because
* it would already be released in __lock_page_or_retry in
* mm/filemap.c. */
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
if (fault_signal_pending(fault, regs)) {
if (!user_mode(regs))
goto no_context;
return 0;
@@ -342,9 +342,6 @@ retry:
regs, addr);
}
if (fault & VM_FAULT_RETRY) {
/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
* of starvation. */
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
goto retry;
}

View File

@@ -83,6 +83,8 @@ config ARM64
select GENERIC_TIME_VSYSCALL
select HANDLE_DOMAIN_IRQ
select HARDIRQS_SW_RESEND
select HAVE_MOVE_PMD
select HAVE_MOVE_PUD
select HAVE_ACPI_APEI if (ACPI && EFI)
select HAVE_ALIGNED_STRUCT_PAGE if SLUB
select HAVE_ARCH_AUDITSYSCALL
@@ -128,6 +130,7 @@ config ARM64
select HAVE_PERF_USER_STACK_DUMP
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_RCU_TABLE_FREE
select HAVE_RCU_TABLE_INVALIDATE
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_KPROBES
select HAVE_KRETPROBES
@@ -145,6 +148,7 @@ config ARM64
select SPARSE_IRQ
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
select ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
help
ARM 64-bit (AArch64) Linux support.
@@ -234,7 +238,10 @@ config GENERIC_CALIBRATE_DELAY
def_bool y
config ZONE_DMA
def_bool y
bool "Enable or Disable Zone DMA"
default y
help
This option enables/disables the DMA zone.
config HAVE_GENERIC_GUP
def_bool y

View File

@@ -790,6 +790,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU0>;
status = "disabled";
};
jtag_mm1: jtagmm@7140000 {
@@ -801,6 +802,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU1>;
status = "disabled";
};
jtag_mm2: jtagmm@7240000 {
@@ -812,6 +814,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU2>;
status = "disabled";
};
jtag_mm3: jtagmm@7340000 {
@@ -823,6 +826,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU3>;
status = "disabled";
};
jtag_mm4: jtagmm@7440000 {
@@ -834,6 +838,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU4>;
status = "disabled";
};
jtag_mm5: jtagmm@7540000 {
@@ -845,6 +850,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU5>;
status = "disabled";
};
jtag_mm6: jtagmm@7640000 {
@@ -856,6 +862,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU6>;
status = "disabled";
};
jtag_mm7: jtagmm@7740000 {
@@ -867,6 +874,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU7>;
status = "disabled";
};
qcom,msm-imem@146aa000 {
@@ -3887,13 +3895,13 @@
qcom,initial-pwrlevel = <5>;
qcom,ca-target-pwrlevel = <4>;
/* NOM_L1 */
/* TURBO */
qcom,gpu-pwrlevel@0 {
reg = <0>;
qcom,gpu-freq = <750000000>;
qcom,bus-freq = <11>;
qcom,bus-freq = <12>;
qcom,bus-min = <10>;
qcom,bus-max = <11>;
qcom,bus-max = <12>;
};
/* NOM_L1 */
@@ -3902,7 +3910,7 @@
qcom,gpu-freq = <650000000>;
qcom,bus-freq = <10>;
qcom,bus-min = <8>;
qcom,bus-max = <11>;
qcom,bus-max = <12>;
};
/* NOM */
@@ -4034,7 +4042,6 @@
#include "pm6150l.dtsi"
#include "atoll-pinctrl.dtsi"
#include "atoll-pm.dtsi"
#include "atoll-coresight.dtsi"
#include "atoll-regulator.dtsi"
#include "atoll-usb.dtsi"
#include "atoll-vidc.dtsi"

View File

@@ -528,6 +528,7 @@
"all-ramp-down-done-irq",
"all-ramp-up-done-irq";
qcom,short-circuit-det;
qcom,torch-realtime-brightness-control;
qcom,open-circuit-det;
qcom,vph-droop-det;
qcom,thermal-derate-en;

View File

@@ -680,6 +680,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU0>;
status = "disabled";
};
jtag_mm1: jtagmm@7140000 {
@@ -691,6 +692,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU1>;
status = "disabled";
};
jtag_mm2: jtagmm@7240000 {
@@ -702,6 +704,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU2>;
status = "disabled";
};
jtag_mm3: jtagmm@7340000 {
@@ -713,6 +716,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU3>;
status = "disabled";
};
jtag_mm4: jtagmm@7440000 {
@@ -724,6 +728,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU4>;
status = "disabled";
};
jtag_mm5: jtagmm@7540000 {
@@ -735,6 +740,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU5>;
status = "disabled";
};
jtag_mm6: jtagmm@7640000 {
@@ -746,6 +752,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU6>;
status = "disabled";
};
jtag_mm7: jtagmm@7740000 {
@@ -757,6 +764,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU7>;
status = "disabled";
};
intc: interrupt-controller@17a00000 {
@@ -3321,7 +3329,6 @@
#include "pm8009.dtsi"
#include "sdmmagpie-regulator.dtsi"
#include "sdmmagpie-camera.dtsi"
#include "sdmmagpie-coresight.dtsi"
#include "sdmmagpie-usb.dtsi"
#include "sdmmagpie-thermal.dtsi"

View File

@@ -3246,7 +3246,6 @@
#include "sm6150-camera.dtsi"
#include "sm6150-ion.dtsi"
#include "msm-arm-smmu-sm6150.dtsi"
#include "sm6150-coresight.dtsi"
#include "sm6150-bus.dtsi"
#include "sm6150-vidc.dtsi"
#include "sm6150-audio.dtsi"

View File

@@ -151,36 +151,6 @@
cache-slice-names = "gpu", "gpuhtw";
cache-slices = <&llcc 12>, <&llcc 11>;
qcom,gpu-coresights {
#address-cells = <1>;
#size-cells = <0>;
compatible = "qcom,gpu-coresight";
qcom,gpu-coresight@0 {
reg = <0>;
coresight-name = "coresight-gfx";
coresight-atid = <50>;
port {
gfx_out_funnel_gfx: endpoint {
remote-endpoint =
<&funnel_gfx_in_gfx>;
};
};
};
qcom,gpu-coresight@1 {
reg = <1>;
coresight-name = "coresight-gfx-cx";
coresight-atid = <51>;
port {
gfx_cx_out_funnel_gfx: endpoint {
remote-endpoint =
<&funnel_gfx_in_gfx_cx>;
};
};
};
};
qcom,l3-pwrlevels {
#address-cells = <1>;
#size-cells = <0>;

View File

@@ -48,10 +48,6 @@
};
};
&tmc_etr {
/delete-property/ qcom,smmu-s1-bypass;
};
&mdss_dsi0_pll {
compatible = "qcom,mdss_dsi_pll_7nm_v2";
};

View File

@@ -824,6 +824,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU0>;
status = "disabled";
};
jtag_mm1: jtagmm@7140000 {
@@ -835,6 +836,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU1>;
status = "disabled";
};
jtag_mm2: jtagmm@7240000 {
@@ -846,6 +848,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU2>;
status = "disabled";
};
jtag_mm3: jtagmm@7340000 {
@@ -857,6 +860,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU3>;
status = "disabled";
};
jtag_mm4: jtagmm@7440000 {
@@ -868,6 +872,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU4>;
status = "disabled";
};
jtag_mm5: jtagmm@7540000 {
@@ -879,6 +884,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU5>;
status = "disabled";
};
jtag_mm6: jtagmm@7640000 {
@@ -890,6 +896,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU6>;
status = "disabled";
};
jtag_mm7: jtagmm@7740000 {
@@ -901,6 +908,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU7>;
status = "disabled";
};
intc: interrupt-controller@17a00000 {
@@ -4175,7 +4183,6 @@
#include "sm8150-bus.dtsi"
#include "sm8150-pcie.dtsi"
#include "sm8150-smp2p.dtsi"
#include "sm8150-coresight.dtsi"
#include "msm-arm-smmu-sm8150.dtsi"
#include "sm8150-qupv3.dtsi"
#include "sm8150-npu.dtsi"

View File

@@ -601,6 +601,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU0>;
status = "disabled";
};
jtag_mm1: jtagmm@9140000 {
@@ -612,6 +613,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU1>;
status = "disabled";
};
jtag_mm2: jtagmm@9240000 {
@@ -623,6 +625,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU2>;
status = "disabled";
};
jtag_mm3: jtagmm@9340000 {
@@ -634,6 +637,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU3>;
status = "disabled";
};
jtag_mm4: jtagmm@9440000 {
@@ -645,6 +649,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU4>;
status = "disabled";
};
jtag_mm5: jtagmm@9540000 {
@@ -656,6 +661,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU5>;
status = "disabled";
};
jtag_mm6: jtagmm@9640000 {
@@ -667,6 +673,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU6>;
status = "disabled";
};
jtag_mm7: jtagmm@9740000 {
@@ -678,6 +685,7 @@
clock-names = "core_clk";
qcom,coresight-jtagmm-cpu = <&CPU7>;
status = "disabled";
};
wakegic: wake-gic {
@@ -2720,7 +2728,6 @@
#include "trinket-camera.dtsi"
#include "msm-arm-smmu-trinket.dtsi"
#include "trinket-qupv3.dtsi"
#include "trinket-coresight.dtsi"
#include "trinket-vidc.dtsi"
#include "trinket-pm.dtsi"
#include "trinket-gpu.dtsi"

View File

@@ -185,6 +185,22 @@
status = "okay";
};
&gpio3 {
/*
* The Qseven BIOS_DISABLE signal on the RK3399-Q7 keeps the on-module
* eMMC and SPI flash powered-down initially (in fact it keeps the
* reset signal asserted). BIOS_DISABLE_OVERRIDE pin allows to override
* that signal so that eMMC and SPI can be used regardless of the state
* of the signal.
*/
bios-disable-override-hog {
gpios = <RK_PD5 GPIO_ACTIVE_LOW>;
gpio-hog;
line-name = "bios_disable_override";
output-high;
};
};
&gmac {
assigned-clocks = <&cru SCLK_RMII_SRC>;
assigned-clock-parents = <&clkin_gmac>;
@@ -453,6 +469,21 @@
};
&pinctrl {
pinctrl-names = "default";
pinctrl-0 = <&q7_thermal_pin &bios_disable_override_hog_pin>;
gpios {
bios_disable_override_hog_pin: bios-disable-override-hog-pin {
rockchip,pins =
<3 RK_PD5 RK_FUNC_GPIO &pcfg_pull_down>;
};
q7_thermal_pin: q7-thermal-pin {
rockchip,pins =
<0 RK_PA3 RK_FUNC_GPIO &pcfg_pull_up>;
};
};
i2c8 {
i2c8_xfer_a: i2c8-xfer {
rockchip,pins =

View File

@@ -1,10 +1,11 @@
# CONFIG_ZONE_DMA is not set
CONFIG_HOTPLUG_SIZE_BITS=29
CONFIG_LOCALVERSION="-Rhinestone"
CONFIG_LOCALVERSION_AUTO=y
CONFIG_LOCALVERSION="-ShadowBladeX-TheNoah77-"
CONFIG_LOCALVERSION_AUTO=n
# CONFIG_FHANDLE is not set
CONFIG_AUDIT=y
# CONFIG_AUDITSYSCALL is not set
CONFIG_DEFAULT_HOSTNAME="debdeep199x"
CONFIG_DEFAULT_HOSTNAME="Shahid Shamim"
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_IRQ_TIME_ACCOUNTING=y
@@ -20,18 +21,25 @@ CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_IKHEADERS=y
CONFIG_LOG_CPU_MAX_BUF_SHIFT=17
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_SCHED=y
# CONFIG_FAIR_GROUP_SCHED is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_BPF=y
# CONFIG_DEBUG_FS is not set
# CONFIG_SCHED_CORE_CTL is not set
CONFIG_NAMESPACES=y
# CONFIG_PID_NS is not set
CONFIG_SCHED_AUTOGROUP=y
CONFIG_SCHED_TUNE=y
CONFIG_CPUSET_ASSIST=y
CONFIG_CPUSET_BG="4-5"
CONFIG_CPUSET_CAMERA="0-3"
CONFIG_CPUSET_FG="0-5,7"
CONFIG_CPUSET_RESTRICTED="2-5"
CONFIG_CPUSET_SYSTEM_BG="2-5"
CONFIG_DEFAULT_USE_ENERGY_AWARE=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_RD_XZ is not set
@@ -40,6 +48,7 @@ CONFIG_BLK_DEV_INITRD=y
CONFIG_KALLSYMS_ALL=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_USERFAULTFD=y
CONFIG_EMBEDDED=y
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB_FREELIST_RANDOM=y
@@ -56,44 +65,67 @@ CONFIG_BLK_INLINE_ENCRYPTION=y
CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_CFQ_GROUP_IOSCHED=y
CONFIG_GKI_HACKS_TO_FIX=y
CONFIG_DEFAULT_MQ_DEADLINE=y
CONFIG_MQ_IOSCHED_DEADLINE=y
CONFIG_ARCH_QCOM=y
CONFIG_ARCH_ATOLL=y
CONFIG_ARCH_SDMMAGPIE=y
CONFIG_PCI=y
CONFIG_PCI_MSM=y
# CONFIG_ARM64_ERRATUM_826319 is not set
# CONFIG_ARM64_ERRATUM_827319 is not set
# CONFIG_ARM64_ERRATUM_824069 is not set
# CONFIG_ARM64_ERRATUM_819472 is not set
# CONFIG_ARM64_ERRATUM_832075 is not set
# CONFIG_ARM64_ERRATUM_845719 is not set
# CONFIG_ARM64_ERRATUM_1024718 is not set
# CONFIG_ARM64_ERRATUM_1188873 is not set
# CONFIG_CAVIUM_ERRATUM_22375 is not set
# CONFIG_CAVIUM_ERRATUM_23154 is not set
# CONFIG_CAVIUM_ERRATUM_27456 is not set
# CONFIG_CAVIUM_ERRATUM_30115 is not set
# CONFIG_QCOM_FALKOR_ERRATUM_1003 is not set
# CONFIG_QCOM_FALKOR_ERRATUM_1009 is not set
# CONFIG_QCOM_QDF2400_ERRATUM_0065 is not set
# CONFIG_QCOM_FALKOR_ERRATUM_E1041 is not set
CONFIG_SCHED_MC=y
CONFIG_NR_CPUS=8
CONFIG_PREEMPT=y
#ifdef OPLUS_FEATURE_HEALTHINFO
#wenbin.liu@PSW.BSP.MM, 2018/06/14 Delete for improve performance
# CONFIG_HZ_100=y
# CONFIG_HZ_300 is not set
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ_250=y
CONFIG_HZ=250
# CONFIG_HZ_250 is not set
# CONFIG_HZ=250
#endif /* OPLUS_FEATURE_HEALTHINFO */
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
CONFIG_MEMORY_HOTPLUG_MOVABLE_NODE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=32768
CONFIG_CMA=y
CONFIG_CMA_DEBUG=y
CONFIG_ZSMALLOC=y
CONFIG_BALANCE_ANON_FILE_RECLAIM=y
CONFIG_HAVE_LOW_MEMORY_KILLER=n
CONFIG_SECCOMP=y
CONFIG_ARM64_SSBD=y
# CONFIG_HARDEN_BRANCH_PREDICTOR is not set
CONFIG_ARMV8_DEPRECATED=y
CONFIG_SWP_EMULATION=y
CONFIG_CP15_BARRIER_EMULATION=y
CONFIG_SETEND_EMULATION=y
CONFIG_ARM64_SW_TTBR0_PAN=y
CONFIG_ARM64_LSE_ATOMICS=y
# CONFIG_PROCESS_RECLAIM is not set
# CONFIG_ARM64_VHE is not set
CONFIG_RANDOMIZE_BASE=y
# CONFIG_EFI is not set
CONFIG_KRYO_PMU_WORKAROUND=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_COMPAT=y
CONFIG_COMPAT_VDSO=y
CONFIG_WAKELOCK=y
CONFIG_PM_RUNTIME=y
CONFIG_PM_WAKELOCKS=y
CONFIG_PM_WAKELOCKS_LIMIT=0
# CONFIG_PM_WAKELOCKS_GC is not set
@@ -110,11 +142,13 @@ CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPU_BOOST=y
CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y
CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL=y
CONFIG_SCHEDUTIL_UP_RATE_LIMIT=500
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM_USER=y
CONFIG_XFRM_INTERFACE=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
CONFIG_NET_KEY=y
CONFIG_INET=y
@@ -277,8 +311,9 @@ CONFIG_REGMAP_WCD_IRQ=y
CONFIG_REGMAP_ALLOW_WRITE_DEBUGFS=y
CONFIG_DMA_CMA=y
CONFIG_ZRAM=y
CONFIG_ZRAM_DEDUP=y
CONFIG_ZRAM_WRITEBACK=y
# CONFIG_ZRAM_DEDUP is not set
# CONFIG_ZRAM_WRITEBACK is not set
CONFIG_ZRAM_DEFAULT_COMP_ALGORITHM="lz4"
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=0
CONFIG_BLK_DEV_RAM=y
@@ -286,7 +321,6 @@ CONFIG_BLK_DEV_RAM_SIZE=8192
CONFIG_HDCP_QSEECOM=y
CONFIG_QSEECOM=y
CONFIG_UID_SYS_STATS=y
CONFIG_MEMORY_STATE_TIME=y
CONFIG_QPNP_MISC=y
# CONFIG_FPR_FPC=y
CONFIG_SCSI=y
@@ -434,6 +468,11 @@ CONFIG_PINCTRL_SLPI=y
CONFIG_GPIO_SYSFS=y
CONFIG_POWER_RESET_QCOM=y
CONFIG_QCOM_DLOAD_MODE=y
CONFIG_QCOM_FASTRPC=y
CONFIG_QCOM_ADSP_PIL=y
CONFIG_QCOM_Q6V5_ADSP=y
CONFIG_QCOM_Q6V5_MSS=y
CONFIG_QCOM_MSM_SMD=y
CONFIG_POWER_RESET_XGENE=y
CONFIG_POWER_RESET_SYSCON=y
CONFIG_QPNP_QG=y
@@ -519,9 +558,6 @@ CONFIG_MSM_VIDC_GOVERNORS=y
CONFIG_MSM_SDE_ROTATOR=y
CONFIG_MSM_SDE_ROTATOR_EVTLOG_DEBUG=y
CONFIG_MSM_NPU_V2=y
CONFIG_DVB_MPQ=m
CONFIG_DVB_MPQ_DEMUX=m
CONFIG_DVB_MPQ_SW=y
CONFIG_DRM=y
CONFIG_DRM_MSM_REGISTER_LOGGING=y
CONFIG_DRM_SDE_EVTLOG_DEBUG=y
@@ -591,7 +627,6 @@ CONFIG_MMC=y
CONFIG_MMC_PERF_PROFILING=y
CONFIG_MMC_BLOCK_MINORS=32
CONFIG_MMC_BLOCK_DEFERRED_RESUME=y
CONFIG_MMC_TEST=m
CONFIG_MMC_PARANOID_SD_INIT=y
CONFIG_MMC_CLKGATE=y
CONFIG_MMC_SDHCI=y
@@ -609,7 +644,6 @@ CONFIG_LEDS_QPNP_VIBRATOR_LDO=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_EDAC=y
CONFIG_EDAC_KRYO_ARM64=y
CONFIG_EDAC_KRYO_ARM64_PANIC_ON_UE=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_QPNP=y
CONFIG_DMADEVICES=y
@@ -618,7 +652,7 @@ CONFIG_UIO=y
CONFIG_UIO_MSM_SHAREDMEM=y
CONFIG_STAGING=y
CONFIG_ASHMEM=y
CONFIG_ANDROID_LOW_MEMORY_KILLER=y
CONFIG_ANDROID_LOW_MEMORY_KILLER=n
CONFIG_ION=y
CONFIG_ION_DEFER_FREE_NO_SCHED_IDLE=y
CONFIG_QCOM_GENI_SE=y
@@ -666,6 +700,9 @@ CONFIG_SM_NPUCC_ATOLL=y
CONFIG_SM_DEBUGCC_ATOLL=y
CONFIG_HWSPINLOCK=y
CONFIG_HWSPINLOCK_QCOM=y
# CONFIG_FSL_ERRATUM_A008585 is not set
# CONFIG_HISILICON_ERRATUM_161010101 is not set
# CONFIG_ARM64_ERRATUM_858921 is not set
CONFIG_QCOM_APCS_IPC=y
CONFIG_MSM_QMP=y
CONFIG_IOMMU_IO_PGTABLE_FAST=y
@@ -727,7 +764,7 @@ CONFIG_MSM_PM=y
CONFIG_QCOM_FSA4480_I2C=y
CONFIG_MEM_SHARE_QMI_SERVICE=y
# CONFIG_MSM_PERFORMANCE is not set
CONFIG_QMP_DEBUGFS_CLIENT=y
# CONFIG_QMP_DEBUGFS_CLIENT is not set
CONFIG_QCOM_SMP2P_SLEEPSTATE=y
CONFIG_QCOM_CDSP_RM=y
CONFIG_QCOM_CX_IPEAK=y
@@ -760,12 +797,6 @@ CONFIG_ANDROID_BINDERFS=y
CONFIG_QCOM_QFPROM=y
CONFIG_NVMEM_SPMI_SDAM=y
CONFIG_SENSORS_SSC=y
CONFIG_ESOC=y
CONFIG_ESOC_DEV=y
CONFIG_ESOC_CLIENT=y
CONFIG_ESOC_MDM_4x=y
CONFIG_ESOC_MDM_DRV=y
CONFIG_ESOC_MDM_DBG_ENG=y
CONFIG_MSM_TZ_LOG=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
@@ -801,44 +832,38 @@ CONFIG_UNICODE=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_INFO=y
CONFIG_OPTIMIZE_INLINING=y
CONFIG_PAGE_OWNER=y
# CONFIG_PAGE_OWNER is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_PANIC_TIMEOUT=5
# CONFIG_DEBUG_PREEMPT is not set
CONFIG_DEBUG_LIST=y
CONFIG_IPC_LOGGING=y
CONFIG_DEBUG_ALIGN_RODATA=y
CONFIG_CORESIGHT=y
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
CONFIG_CORESIGHT_DYNAMIC_REPLICATOR=y
CONFIG_CORESIGHT_STM=y
CONFIG_CORESIGHT_CTI=y
CONFIG_CORESIGHT_TPDA=y
CONFIG_CORESIGHT_TPDM=y
CONFIG_CORESIGHT_HWEVENT=y
CONFIG_CORESIGHT_DUMMY=y
CONFIG_CORESIGHT_REMOTE_ETM=y
CONFIG_CORESIGHT_REMOTE_ETM_DEFAULT_ENABLE=0
CONFIG_CORESIGHT_EVENT=y
CONFIG_SECURITY=y
CONFIG_SECURITY_NETWORK=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH=""
CONFIG_SECURITY_PERF_EVENTS_RESTRICT=y
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SMACK=y
CONFIG_INIT_STACK_ALL_ZERO=y
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=y
CONFIG_CRYPTO_XCBC=y
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_ANSI_CPRNG=y
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DEV_QCOM_MSM_QCE=y
CONFIG_CRYPTO_DEV_QCRYPTO=y
CONFIG_CRYPTO_DEV_QCEDEV=y
CONFIG_CRYPTO_DEV_QCOM_ICE=y
CONFIG_CRYPTO_SHA512_ARM64=y
CONFIG_ARM64_CRYPTO=y
CONFIG_CRYPTO_SHA1_ARM64_CE=y
CONFIG_CRYPTO_SHA2_ARM64_CE=y
CONFIG_CRYPTO_GHASH_ARM64_CE=y
CONFIG_CRYPTO_CRC32_ARM64_CE=y
CONFIG_CRYPTO_AES_ARM64_CE_CCM=y
CONFIG_CRYPTO_AES_ARM64_CE_BLK=y
CONFIG_CRYPTO_AES_ARM64_NEON_BLK=y
@@ -960,6 +985,7 @@ CONFIG_OPLUS_FEATURE_PMIC_MONITOR=y
#ifdef OPLUS_BUG_STABILITY
#zhangzongyu@BSP.Kernel.Stability, 2020/05/10, Add for dump device info
CONFIG_PSTORE=y
# CONFIG_PSTORE_ZLIB_COMPRESS is not set
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_PMSG=y
CONFIG_PSTORE_RAM=y
@@ -1052,3 +1078,52 @@ CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_DEVFREQ_BOOST=y
CONFIG_DEVFREQ_INPUT_BOOST_DURATION_MS=58
CONFIG_DEVFREQ_CPU_LLCC_DDR_BW_BOOST_FREQ=3879
# Enable WPA3 personal
CONFIG_SAE=y
# Dynamic FSYNC
CONFIG_DYNAMIC_FSYNC=y
# Wireguard
CONFIG_WIREGUARD=y
# Sbalance
CONFIG_IRQ_SBALANCE=y
CONFIG_IRQ_SBALANCE_POLL_MSEC=3000
CONFIG_IRQ_SBALANCE_THRESH=1024
CONFIG_SBALANCE_EXCLUDE_CPUS="5,7"
# EROFS
CONFIG_EROFS_FS=y
# Fast Charge USB
CONFIG_FORCE_FAST_CHARGE=y
### TCP
CONFIG_TCP_CONG_ADVANCED=y
# CONFIG_TCP_CONG_BIC is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_TCP_CONG_WESTWOOD=y
# CONFIG_TCP_CONG_HTCP is not set
# CONFIG_TCP_CONG_HSTCP is not set
# CONFIG_TCP_CONG_HYBLA is not set
# CONFIG_TCP_CONG_VEGAS is not set
# CONFIG_TCP_CONG_NV is not set
# CONFIG_TCP_CONG_SCALABLE is not set
# CONFIG_TCP_CONG_LP is not set
# CONFIG_TCP_CONG_VENO is not set
# CONFIG_TCP_CONG_YEAH is not set
# CONFIG_TCP_CONG_ILLINOIS is not set
# CONFIG_TCP_CONG_DCTCP is not set
# CONFIG_TCP_CONG_CDG is not set
CONFIG_TCP_CONG_BBR=y
# CONFIG_DEFAULT_CUBIC is not set
CONFIG_DEFAULT_WESTWOOD=y
# CONFIG_DEFAULT_BBR is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="westwood"
# CONFIG_TCP_MD5SIG is not set
# ANDROID_SIMPLE_LMK
CONFIG_ANDROID_SIMPLE_LMK=y

View File

@@ -39,6 +39,7 @@ CONFIG_BLK_DEV_INITRD=y
CONFIG_KALLSYMS_ALL=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_USERFAULTFD=y
CONFIG_EMBEDDED=y
# CONFIG_SLUB_DEBUG is not set
# CONFIG_COMPAT_BRK is not set
@@ -56,6 +57,7 @@ CONFIG_BLK_INLINE_ENCRYPTION=y
CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_CFQ_GROUP_IOSCHED=y
CONFIG_GKI_HACKS_TO_FIX=y
CONFIG_ARCH_QCOM=y
CONFIG_ARCH_ATOLL=y
CONFIG_ARCH_SDMMAGPIE=y
@@ -73,8 +75,8 @@ CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
CONFIG_MEMORY_HOTPLUG_MOVABLE_NODE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=32768
CONFIG_CMA=y
CONFIG_CMA_DEBUG=y
CONFIG_ZSMALLOC=y
CONFIG_BALANCE_ANON_FILE_RECLAIM=y
CONFIG_SECCOMP=y
@@ -109,6 +111,7 @@ CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM_USER=y
CONFIG_XFRM_INTERFACE=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
CONFIG_NET_KEY=y
CONFIG_INET=y
@@ -233,12 +236,14 @@ CONFIG_NET_SCHED=y
CONFIG_NET_SCH_HTB=y
CONFIG_NET_SCH_PRIO=y
CONFIG_NET_SCH_MULTIQ=y
CONFIG_NET_SCH_TBF=y
CONFIG_NET_SCH_INGRESS=y
CONFIG_NET_CLS_FW=y
CONFIG_NET_CLS_U32=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_FLOW=y
CONFIG_NET_CLS_BPF=y
CONFIG_NET_CLS_MATCHALL=y
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_CMP=y
CONFIG_NET_EMATCH_NBYTE=y
@@ -246,9 +251,11 @@ CONFIG_NET_EMATCH_U32=y
CONFIG_NET_EMATCH_META=y
CONFIG_NET_EMATCH_TEXT=y
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=y
CONFIG_NET_ACT_GACT=y
CONFIG_NET_ACT_MIRRED=y
CONFIG_NET_ACT_SKBEDIT=y
CONFIG_NET_ACT_BPF=y
CONFIG_QRTR=y
CONFIG_QRTR_SMD=y
CONFIG_BPF_JIT=y
@@ -268,7 +275,6 @@ CONFIG_NFC_PN553_DEVICES=y
#endif /*OPLUS_NFC_BRINGUP*/
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_REGMAP_WCD_IRQ=y
CONFIG_REGMAP_ALLOW_WRITE_DEBUGFS=y
CONFIG_DMA_CMA=y
CONFIG_ZRAM=y
CONFIG_ZRAM_DEDUP=y
@@ -279,7 +285,6 @@ CONFIG_BLK_DEV_RAM_SIZE=8192
CONFIG_HDCP_QSEECOM=y
CONFIG_QSEECOM=y
CONFIG_UID_SYS_STATS=y
CONFIG_MEMORY_STATE_TIME=y
CONFIG_QPNP_MISC=y
CONFIG_FPR_FPC=y
CONFIG_SCSI=y
@@ -301,6 +306,7 @@ CONFIG_DM_SNAPSHOT=y
CONFIG_DM_UEVENT=y
CONFIG_DM_VERITY=y
CONFIG_DM_VERITY_FEC=y
CONFIG_DM_BOW=y
CONFIG_NETDEVICES=y
CONFIG_BONDING=y
CONFIG_DUMMY=y
@@ -540,6 +546,8 @@ CONFIG_HID_MICROSOFT=y
CONFIG_HID_MULTITOUCH=y
CONFIG_HID_NINTENDO=y
CONFIG_HID_PLANTRONICS=y
CONFIG_HID_PLAYSTATION=y
CONFIG_PLAYSTATION_FF=y
CONFIG_HID_SONY=y
CONFIG_HID_QVR=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
@@ -603,7 +611,6 @@ CONFIG_LEDS_QPNP_VIBRATOR_LDO=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_EDAC=y
CONFIG_EDAC_KRYO_ARM64=y
CONFIG_EDAC_KRYO_ARM64_PANIC_ON_UE=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_QPNP=y
CONFIG_DMADEVICES=y
@@ -624,7 +631,6 @@ CONFIG_IPA3=y
CONFIG_IPA_WDI_UNIFIED_API=y
CONFIG_RMNET_IPA3=y
CONFIG_RNDIS_IPA=y
CONFIG_IPA_UT=y
CONFIG_MSM_11AD=m
CONFIG_QCOM_MDSS_PLL=y
CONFIG_SPMI_PMIC_CLKDIV=y
@@ -713,7 +719,6 @@ CONFIG_QTI_RPMH_API=y
CONFIG_QSEE_IPC_IRQ=y
CONFIG_QCOM_GLINK=y
CONFIG_QCOM_GLINK_PKT=y
CONFIG_QTI_RPM_STATS_LOG=y
CONFIG_MSM_CDSP_LOADER=y
CONFIG_QCOM_SMCINVOKE=y
CONFIG_MSM_EVENT_TIMER=y
@@ -721,7 +726,6 @@ CONFIG_MSM_PM=y
CONFIG_QCOM_FSA4480_I2C=y
CONFIG_MEM_SHARE_QMI_SERVICE=y
CONFIG_MSM_PERFORMANCE=y
CONFIG_QMP_DEBUGFS_CLIENT=y
CONFIG_QCOM_SMP2P_SLEEPSTATE=y
CONFIG_QCOM_CDSP_RM=y
CONFIG_QCOM_CX_IPEAK=y
@@ -760,7 +764,6 @@ CONFIG_ESOC_CLIENT=y
CONFIG_ESOC_MDM_4x=y
CONFIG_ESOC_MDM_DRV=y
CONFIG_ESOC_MDM_DBG_ENG=y
CONFIG_MSM_TZ_LOG=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
@@ -793,37 +796,30 @@ CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_INFO=y
CONFIG_PAGE_OWNER=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_PANIC_TIMEOUT=5
CONFIG_SCHEDSTATS=y
CONFIG_SCHED_STACK_END_CHECK=y
# CONFIG_DEBUG_PREEMPT is not set
CONFIG_DEBUG_LIST=y
CONFIG_IPC_LOGGING=y
CONFIG_BUG_ON_DATA_CORRUPTION=y
CONFIG_DEBUG_ALIGN_RODATA=y
CONFIG_CORESIGHT=y
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
CONFIG_CORESIGHT_DYNAMIC_REPLICATOR=y
CONFIG_CORESIGHT_STM=y
CONFIG_CORESIGHT_CTI=y
CONFIG_CORESIGHT_TPDA=y
CONFIG_CORESIGHT_TPDM=y
CONFIG_CORESIGHT_HWEVENT=y
CONFIG_CORESIGHT_DUMMY=y
CONFIG_CORESIGHT_REMOTE_ETM=y
CONFIG_CORESIGHT_REMOTE_ETM_DEFAULT_ENABLE=0
CONFIG_CORESIGHT_EVENT=y
CONFIG_SECURITY=y
CONFIG_SECURITY_NETWORK=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_FORTIFY_SOURCE=y
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH=""
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SMACK=y
CONFIG_INIT_STACK_ALL_ZERO=y
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=y
CONFIG_CRYPTO_XCBC=y
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_ANSI_CPRNG=y
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DEV_QCOM_MSM_QCE=y
CONFIG_CRYPTO_DEV_QCRYPTO=y
CONFIG_CRYPTO_DEV_QCEDEV=y

View File

@@ -41,6 +41,7 @@ CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_KALLSYMS_ALL=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_USERFAULTFD=y
CONFIG_EMBEDDED=y
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB_FREELIST_RANDOM=y

View File

@@ -38,6 +38,7 @@ CONFIG_BLK_DEV_INITRD=y
CONFIG_KALLSYMS_ALL=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_USERFAULTFD=y
CONFIG_EMBEDDED=y
# CONFIG_SLUB_DEBUG is not set
# CONFIG_COMPAT_BRK is not set
@@ -55,6 +56,7 @@ CONFIG_BLK_INLINE_ENCRYPTION=y
CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_CFQ_GROUP_IOSCHED=y
CONFIG_GKI_HACKS_TO_FIX=y
CONFIG_ARCH_QCOM=y
CONFIG_ARCH_SM6150=y
CONFIG_ARCH_SDMMAGPIE=y
@@ -64,8 +66,8 @@ CONFIG_SCHED_MC=y
CONFIG_NR_CPUS=8
CONFIG_PREEMPT=y
CONFIG_HZ_100=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=32768
CONFIG_CMA=y
CONFIG_CMA_DEBUGFS=y
CONFIG_ZSMALLOC=y
CONFIG_BALANCE_ANON_FILE_RECLAIM=y
CONFIG_SECCOMP=y
@@ -100,6 +102,7 @@ CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM_USER=y
CONFIG_XFRM_INTERFACE=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
CONFIG_NET_KEY=y
CONFIG_INET=y
@@ -224,12 +227,14 @@ CONFIG_NET_SCHED=y
CONFIG_NET_SCH_HTB=y
CONFIG_NET_SCH_PRIO=y
CONFIG_NET_SCH_MULTIQ=y
CONFIG_NET_SCH_TBF=y
CONFIG_NET_SCH_INGRESS=y
CONFIG_NET_CLS_FW=y
CONFIG_NET_CLS_U32=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_FLOW=y
CONFIG_NET_CLS_BPF=y
CONFIG_NET_CLS_MATCHALL=y
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_CMP=y
CONFIG_NET_EMATCH_NBYTE=y
@@ -237,9 +242,11 @@ CONFIG_NET_EMATCH_U32=y
CONFIG_NET_EMATCH_META=y
CONFIG_NET_EMATCH_TEXT=y
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=y
CONFIG_NET_ACT_GACT=y
CONFIG_NET_ACT_MIRRED=y
CONFIG_NET_ACT_SKBEDIT=y
CONFIG_NET_ACT_BPF=y
CONFIG_QRTR=y
CONFIG_QRTR_SMD=y
CONFIG_BPF_JIT=y
@@ -254,7 +261,6 @@ CONFIG_RFKILL=y
CONFIG_NFC_NQ=y
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_REGMAP_WCD_IRQ=y
CONFIG_REGMAP_ALLOW_WRITE_DEBUGFS=y
CONFIG_DMA_CMA=y
CONFIG_ZRAM=y
CONFIG_BLK_DEV_LOOP=y
@@ -264,7 +270,6 @@ CONFIG_BLK_DEV_RAM_SIZE=8192
CONFIG_HDCP_QSEECOM=y
CONFIG_QSEECOM=y
CONFIG_UID_SYS_STATS=y
CONFIG_MEMORY_STATE_TIME=y
CONFIG_QPNP_MISC=y
CONFIG_FPR_FPC=y
CONFIG_SCSI=y
@@ -471,6 +476,8 @@ CONFIG_HID_MICROSOFT=y
CONFIG_HID_MULTITOUCH=y
CONFIG_HID_NINTENDO=y
CONFIG_HID_PLANTRONICS=y
CONFIG_HID_PLAYSTATION=y
CONFIG_PLAYSTATION_FF=y
CONFIG_HID_SONY=y
CONFIG_HID_QVR=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
@@ -536,7 +543,6 @@ CONFIG_LEDS_QPNP_VIBRATOR_LDO=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_EDAC=y
CONFIG_EDAC_KRYO_ARM64=y
CONFIG_EDAC_KRYO_ARM64_PANIC_ON_UE=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_QPNP=y
CONFIG_DMADEVICES=y
@@ -557,7 +563,6 @@ CONFIG_IPA3=y
CONFIG_IPA_WDI_UNIFIED_API=y
CONFIG_RMNET_IPA3=y
CONFIG_RNDIS_IPA=y
CONFIG_IPA_UT=y
CONFIG_MSM_11AD=m
CONFIG_QCOM_MDSS_PLL=y
CONFIG_SPMI_PMIC_CLKDIV=y
@@ -639,7 +644,6 @@ CONFIG_QTI_RPMH_API=y
CONFIG_QSEE_IPC_IRQ=y
CONFIG_QCOM_GLINK=y
CONFIG_QCOM_GLINK_PKT=y
CONFIG_QTI_RPM_STATS_LOG=y
CONFIG_MSM_CDSP_LOADER=y
CONFIG_QCOM_SMCINVOKE=y
CONFIG_MSM_EVENT_TIMER=y
@@ -647,7 +651,6 @@ CONFIG_MSM_PM=y
CONFIG_QCOM_FSA4480_I2C=y
CONFIG_MEM_SHARE_QMI_SERVICE=y
CONFIG_MSM_PERFORMANCE=y
CONFIG_QMP_DEBUGFS_CLIENT=y
CONFIG_QCOM_SMP2P_SLEEPSTATE=y
CONFIG_QCOM_CDSP_RM=y
CONFIG_QCOM_CX_IPEAK=y
@@ -685,7 +688,6 @@ CONFIG_ESOC_CLIENT=y
CONFIG_ESOC_MDM_4x=y
CONFIG_ESOC_MDM_DRV=y
CONFIG_ESOC_MDM_DBG_ENG=y
CONFIG_MSM_TZ_LOG=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
@@ -718,37 +720,30 @@ CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_INFO=y
CONFIG_PAGE_OWNER=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_PANIC_TIMEOUT=5
CONFIG_SCHEDSTATS=y
CONFIG_SCHED_STACK_END_CHECK=y
# CONFIG_DEBUG_PREEMPT is not set
CONFIG_DEBUG_LIST=y
CONFIG_IPC_LOGGING=y
CONFIG_BUG_ON_DATA_CORRUPTION=y
CONFIG_DEBUG_ALIGN_RODATA=y
CONFIG_CORESIGHT=y
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
CONFIG_CORESIGHT_DYNAMIC_REPLICATOR=y
CONFIG_CORESIGHT_STM=y
CONFIG_CORESIGHT_CTI=y
CONFIG_CORESIGHT_TPDA=y
CONFIG_CORESIGHT_TPDM=y
CONFIG_CORESIGHT_HWEVENT=y
CONFIG_CORESIGHT_DUMMY=y
CONFIG_CORESIGHT_REMOTE_ETM=y
CONFIG_CORESIGHT_REMOTE_ETM_DEFAULT_ENABLE=0
CONFIG_CORESIGHT_EVENT=y
CONFIG_SECURITY=y
CONFIG_SECURITY_NETWORK=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_FORTIFY_SOURCE=y
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH=""
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SMACK=y
CONFIG_INIT_STACK_ALL_ZERO=y
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=y
CONFIG_CRYPTO_XCBC=y
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_ANSI_CPRNG=y
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DEV_QCOM_MSM_QCE=y
CONFIG_CRYPTO_DEV_QCRYPTO=y
CONFIG_CRYPTO_DEV_QCEDEV=y

View File

@@ -40,6 +40,7 @@ CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_KALLSYMS_ALL=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_USERFAULTFD=y
CONFIG_EMBEDDED=y
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB_FREELIST_RANDOM=y

View File

@@ -39,6 +39,7 @@ CONFIG_BLK_DEV_INITRD=y
CONFIG_KALLSYMS_ALL=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_USERFAULTFD=y
CONFIG_EMBEDDED=y
# CONFIG_SLUB_DEBUG is not set
# CONFIG_COMPAT_BRK is not set
@@ -57,6 +58,7 @@ CONFIG_BLK_INLINE_ENCRYPTION=y
CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_CFQ_GROUP_IOSCHED=y
CONFIG_GKI_HACKS_TO_FIX=y
CONFIG_ARCH_QCOM=y
CONFIG_ARCH_SM8150=y
CONFIG_PCI=y
@@ -70,6 +72,7 @@ CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
CONFIG_MEMORY_HOTPLUG_MOVABLE_NODE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=32768
CONFIG_CMA=y
CONFIG_ZSMALLOC=y
CONFIG_BALANCE_ANON_FILE_RECLAIM=y
@@ -106,6 +109,7 @@ CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM_USER=y
CONFIG_XFRM_INTERFACE=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
CONFIG_NET_KEY=y
CONFIG_INET=y
@@ -230,12 +234,14 @@ CONFIG_NET_SCHED=y
CONFIG_NET_SCH_HTB=y
CONFIG_NET_SCH_PRIO=y
CONFIG_NET_SCH_MULTIQ=y
CONFIG_NET_SCH_TBF=y
CONFIG_NET_SCH_INGRESS=y
CONFIG_NET_CLS_FW=y
CONFIG_NET_CLS_U32=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_FLOW=y
CONFIG_NET_CLS_BPF=y
CONFIG_NET_CLS_MATCHALL=y
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_CMP=y
CONFIG_NET_EMATCH_NBYTE=y
@@ -243,9 +249,11 @@ CONFIG_NET_EMATCH_U32=y
CONFIG_NET_EMATCH_META=y
CONFIG_NET_EMATCH_TEXT=y
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=y
CONFIG_NET_ACT_GACT=y
CONFIG_NET_ACT_MIRRED=y
CONFIG_NET_ACT_SKBEDIT=y
CONFIG_NET_ACT_BPF=y
CONFIG_NET_SWITCHDEV=y
CONFIG_QRTR=y
CONFIG_QRTR_SMD=y
@@ -269,7 +277,6 @@ CONFIG_NXP_P73_DEVICES=y
#endif /* OPLUS_FEATURE_NFC_BRINGUP */
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_REGMAP_ALLOW_WRITE_DEBUGFS=y
CONFIG_DMA_CMA=y
CONFIG_MHI_BUS=y
CONFIG_MHI_QCOM=y
@@ -283,7 +290,6 @@ CONFIG_BLK_DEV_RAM_SIZE=8192
CONFIG_HDCP_QSEECOM=y
CONFIG_QSEECOM=y
CONFIG_UID_SYS_STATS=y
CONFIG_MEMORY_STATE_TIME=y
CONFIG_OKL4_USER_VIRQ=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
@@ -556,7 +562,6 @@ CONFIG_LEDS_QTI_TRI_LED=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_EDAC=y
CONFIG_EDAC_KRYO_ARM64=y
CONFIG_EDAC_KRYO_ARM64_PANIC_ON_UE=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_QPNP=y
CONFIG_DMADEVICES=y
@@ -578,7 +583,6 @@ CONFIG_RMNET_IPA3=y
CONFIG_RNDIS_IPA=y
CONFIG_IPA3_MHI_PROXY=y
CONFIG_IPA3_MHI_PRIME_MANAGER=y
CONFIG_IPA_UT=y
CONFIG_MSM_11AD=m
CONFIG_SEEMP_CORE=y
CONFIG_IPA3_REGDUMP=y
@@ -656,7 +660,6 @@ CONFIG_QSEE_IPC_IRQ_BRIDGE=y
CONFIG_QCOM_GLINK=y
CONFIG_QCOM_GLINK_PKT=y
CONFIG_QCOM_QDSS_BRIDGE=y
CONFIG_QTI_RPM_STATS_LOG=y
CONFIG_MSM_CDSP_LOADER=y
CONFIG_QCOM_SMCINVOKE=y
CONFIG_MSM_EVENT_TIMER=y
@@ -670,7 +673,6 @@ CONFIG_QCOM_MAX20328_I2C=y
CONFIG_MEM_SHARE_QMI_SERVICE=y
CONFIG_RMNET_CTL=y
CONFIG_MSM_PERFORMANCE=y
CONFIG_QMP_DEBUGFS_CLIENT=y
CONFIG_QCOM_SMP2P_SLEEPSTATE=y
CONFIG_QCOM_CDSP_RM=y
CONFIG_QCOM_AOP_DDR_MESSAGING=y
@@ -709,7 +711,6 @@ CONFIG_ESOC_CLIENT=y
CONFIG_ESOC_MDM_4x=y
CONFIG_ESOC_MDM_DRV=y
CONFIG_ESOC_MDM_DBG_ENG=y
CONFIG_MSM_TZ_LOG=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
@@ -740,38 +741,30 @@ CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_INFO=y
CONFIG_PAGE_OWNER=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_PANIC_TIMEOUT=-1
CONFIG_SCHEDSTATS=y
CONFIG_SCHED_STACK_END_CHECK=y
# CONFIG_DEBUG_PREEMPT is not set
CONFIG_DEBUG_LIST=y
CONFIG_IPC_LOGGING=y
CONFIG_BUG_ON_DATA_CORRUPTION=y
CONFIG_DEBUG_ALIGN_RODATA=y
CONFIG_CORESIGHT=y
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
CONFIG_CORESIGHT_DYNAMIC_REPLICATOR=y
CONFIG_CORESIGHT_STM=y
CONFIG_CORESIGHT_CTI=y
CONFIG_CORESIGHT_TPDA=y
CONFIG_CORESIGHT_TPDM=y
CONFIG_CORESIGHT_HWEVENT=y
CONFIG_CORESIGHT_DUMMY=y
CONFIG_CORESIGHT_REMOTE_ETM=y
CONFIG_CORESIGHT_REMOTE_ETM_DEFAULT_ENABLE=0
CONFIG_CORESIGHT_TGU=y
CONFIG_CORESIGHT_EVENT=y
CONFIG_SECURITY=y
CONFIG_SECURITY_NETWORK=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_FORTIFY_SOURCE=y
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH=""
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SMACK=y
CONFIG_INIT_STACK_ALL_ZERO=y
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=y
CONFIG_CRYPTO_XCBC=y
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_ANSI_CPRNG=y
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DEV_QCOM_MSM_QCE=y
CONFIG_CRYPTO_DEV_QCRYPTO=y
CONFIG_CRYPTO_DEV_QCEDEV=y

View File

@@ -41,6 +41,7 @@ CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_KALLSYMS_ALL=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_USERFAULTFD=y
CONFIG_EMBEDDED=y
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB_FREELIST_RANDOM=y

View File

@@ -39,7 +39,9 @@ CONFIG_BLK_DEV_INITRD=y
CONFIG_KALLSYMS_ALL=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_USERFAULTFD=y
CONFIG_EMBEDDED=y
# CONFIG_SLUB_DEBUG is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB_FREELIST_RANDOM=y
CONFIG_SLAB_FREELIST_HARDENED=y
@@ -57,6 +59,7 @@ CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK=y
CONFIG_PARTITION_ADVANCED=y
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_CFQ_GROUP_IOSCHED=y
CONFIG_GKI_HACKS_TO_FIX=y
CONFIG_ARCH_QCOM=y
CONFIG_ARCH_TRINKET=y
CONFIG_PCI=y
@@ -69,8 +72,8 @@ CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
CONFIG_MEMORY_HOTPLUG_MOVABLE_NODE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=32768
CONFIG_CMA=y
CONFIG_CMA_DEBUGFS=y
CONFIG_ZSMALLOC=y
CONFIG_BALANCE_ANON_FILE_RECLAIM=y
CONFIG_SECCOMP=y
@@ -102,6 +105,7 @@ CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_XFRM_USER=y
CONFIG_XFRM_INTERFACE=y
CONFIG_XFRM_MIGRATE=y
CONFIG_XFRM_STATISTICS=y
CONFIG_NET_KEY=y
CONFIG_INET=y
@@ -226,12 +230,14 @@ CONFIG_NET_SCHED=y
CONFIG_NET_SCH_HTB=y
CONFIG_NET_SCH_PRIO=y
CONFIG_NET_SCH_MULTIQ=y
CONFIG_NET_SCH_TBF=y
CONFIG_NET_SCH_INGRESS=y
CONFIG_NET_CLS_FW=y
CONFIG_NET_CLS_U32=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_FLOW=y
CONFIG_NET_CLS_BPF=y
CONFIG_NET_CLS_MATCHALL=y
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_CMP=y
CONFIG_NET_EMATCH_NBYTE=y
@@ -239,9 +245,11 @@ CONFIG_NET_EMATCH_U32=y
CONFIG_NET_EMATCH_META=y
CONFIG_NET_EMATCH_TEXT=y
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=y
CONFIG_NET_ACT_GACT=y
CONFIG_NET_ACT_MIRRED=y
CONFIG_NET_ACT_SKBEDIT=y
CONFIG_NET_ACT_BPF=y
CONFIG_QRTR=y
CONFIG_QRTR_SMD=y
CONFIG_BPF_JIT=y
@@ -257,7 +265,6 @@ CONFIG_RFKILL=y
CONFIG_NFC_NQ=y
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_REGMAP_WCD_IRQ=y
CONFIG_REGMAP_ALLOW_WRITE_DEBUGFS=y
CONFIG_DMA_CMA=y
CONFIG_ZRAM=y
CONFIG_ZRAM_DEDUP=y
@@ -268,7 +275,6 @@ CONFIG_BLK_DEV_RAM_SIZE=8192
CONFIG_HDCP_QSEECOM=y
CONFIG_QSEECOM=y
CONFIG_UID_SYS_STATS=y
CONFIG_MEMORY_STATE_TIME=y
CONFIG_QPNP_MISC=y
CONFIG_FPR_FPC=y
CONFIG_SCSI=y
@@ -290,6 +296,7 @@ CONFIG_DM_SNAPSHOT=y
CONFIG_DM_UEVENT=y
CONFIG_DM_VERITY=y
CONFIG_DM_VERITY_FEC=y
CONFIG_DM_BOW=y
CONFIG_NETDEVICES=y
CONFIG_BONDING=y
CONFIG_DUMMY=y
@@ -461,6 +468,8 @@ CONFIG_HID_MICROSOFT=y
CONFIG_HID_MULTITOUCH=y
CONFIG_HID_NINTENDO=y
CONFIG_HID_PLANTRONICS=y
CONFIG_HID_PLAYSTATION=y
CONFIG_PLAYSTATION_FF=y
CONFIG_HID_SONY=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
CONFIG_USB_XHCI_HCD=y
@@ -539,7 +548,6 @@ CONFIG_IPA3=y
CONFIG_IPA_WDI_UNIFIED_API=y
CONFIG_RMNET_IPA3=y
CONFIG_RNDIS_IPA=y
CONFIG_IPA_UT=y
CONFIG_MSM_11AD=m
CONFIG_QCOM_MDSS_PLL=y
CONFIG_QCOM_CLK_SMD_RPM=y
@@ -601,7 +609,6 @@ CONFIG_QCOM_EARLY_RANDOM=y
CONFIG_QSEE_IPC_IRQ=y
CONFIG_QCOM_GLINK=y
CONFIG_QCOM_GLINK_PKT=y
CONFIG_QTI_RPM_STATS_LOG=y
CONFIG_MSM_CDSP_LOADER=y
CONFIG_QCOM_SMCINVOKE=y
CONFIG_MSM_EVENT_TIMER=y
@@ -610,7 +617,6 @@ CONFIG_MSM_PM=y
CONFIG_QCOM_FSA4480_I2C=y
CONFIG_MEM_SHARE_QMI_SERVICE=y
CONFIG_MSM_PERFORMANCE=y
CONFIG_QMP_DEBUGFS_CLIENT=y
CONFIG_QCOM_SMP2P_SLEEPSTATE=y
CONFIG_QCOM_CDSP_RM=y
CONFIG_QCOM_CX_IPEAK=y
@@ -641,7 +647,6 @@ CONFIG_ANDROID_BINDERFS=y
CONFIG_QCOM_QFPROM=y
CONFIG_NVMEM_SPMI_SDAM=y
CONFIG_SENSORS_SSC=y
CONFIG_MSM_TZ_LOG=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
@@ -668,38 +673,31 @@ CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_INFO=y
CONFIG_PAGE_OWNER=y
# CONFIG_SECTION_MISMATCH_WARN_ONLY is not set
CONFIG_MAGIC_SYSRQ=y
CONFIG_PANIC_TIMEOUT=5
CONFIG_SCHEDSTATS=y
CONFIG_SCHED_STACK_END_CHECK=y
# CONFIG_DEBUG_PREEMPT is not set
CONFIG_DEBUG_LIST=y
CONFIG_IPC_LOGGING=y
CONFIG_BUG_ON_DATA_CORRUPTION=y
CONFIG_DEBUG_ALIGN_RODATA=y
CONFIG_CORESIGHT=y
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
CONFIG_CORESIGHT_DYNAMIC_REPLICATOR=y
CONFIG_CORESIGHT_STM=y
CONFIG_CORESIGHT_CTI=y
CONFIG_CORESIGHT_TPDA=y
CONFIG_CORESIGHT_TPDM=y
CONFIG_CORESIGHT_HWEVENT=y
CONFIG_CORESIGHT_DUMMY=y
CONFIG_CORESIGHT_REMOTE_ETM=y
CONFIG_CORESIGHT_REMOTE_ETM_DEFAULT_ENABLE=0
CONFIG_CORESIGHT_EVENT=y
CONFIG_SECURITY=y
CONFIG_SECURITY_NETWORK=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_FORTIFY_SOURCE=y
CONFIG_STATIC_USERMODEHELPER=y
CONFIG_STATIC_USERMODEHELPER_PATH=""
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SMACK=y
CONFIG_INIT_STACK_ALL_ZERO=y
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=y
CONFIG_CRYPTO_XCBC=y
CONFIG_CRYPTO_TWOFISH=y
CONFIG_CRYPTO_ANSI_CPRNG=y
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_DEV_QCOM_MSM_QCE=y
CONFIG_CRYPTO_DEV_QCRYPTO=y
CONFIG_CRYPTO_DEV_QCEDEV=y

View File

@@ -41,6 +41,7 @@ CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_KALLSYMS_ALL=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_USERFAULTFD=y
CONFIG_EMBEDDED=y
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB_FREELIST_RANDOM=y

View File

@@ -359,6 +359,7 @@ static inline int pmd_protnone(pmd_t pmd)
#define pmd_present(pmd) pte_present(pmd_pte(pmd))
#define pmd_dirty(pmd) pte_dirty(pmd_pte(pmd))
#define pmd_young(pmd) pte_young(pmd_pte(pmd))
#define pmd_valid(pmd) pte_valid(pmd_pte(pmd))
#define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd)))
#define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd)))
#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd)))
@@ -383,6 +384,7 @@ static inline int pmd_protnone(pmd_t pmd)
#define pfn_pud(pfn,prot) (__pud(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
#define set_pmd_at(mm, addr, pmdp, pmd) set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd))
#define set_pud_at(mm, addr, pudp, pud) set_pte_at(mm, addr, (pte_t *)pudp, pud_pte(pud))
#define __pgprot_modify(prot,mask,bits) \
__pgprot((pgprot_val(prot) & ~(mask)) | (bits))
@@ -423,8 +425,11 @@ static inline bool pud_table(pud_t pud) { return true; }
static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
{
*pmdp = pmd;
dsb(ishst);
isb();
if (pmd_valid(pmd)) {
dsb(ishst);
isb();
}
}
static inline void pmd_clear(pmd_t *pmdp)
@@ -476,12 +481,16 @@ static inline void pte_unmap(pte_t *pte) { }
#define pud_none(pud) (!pud_val(pud))
#define pud_bad(pud) (!(pud_val(pud) & PUD_TABLE_BIT))
#define pud_present(pud) pte_present(pud_pte(pud))
#define pud_valid(pud) pte_valid(pud_pte(pud))
static inline void set_pud(pud_t *pudp, pud_t pud)
{
*pudp = pud;
dsb(ishst);
isb();
if (pud_valid(pud)) {
dsb(ishst);
isb();
}
}
static inline void pud_clear(pud_t *pudp)

View File

@@ -22,52 +22,47 @@
#include <linux/pagemap.h>
#include <linux/swap.h>
#ifdef CONFIG_HAVE_RCU_TABLE_FREE
#define tlb_remove_entry(tlb, entry) tlb_remove_table(tlb, entry)
static inline void __tlb_remove_table(void *_table)
{
free_page_and_swap_cache((struct page *)_table);
}
#else
#define tlb_remove_entry(tlb, entry) tlb_remove_page(tlb, entry)
#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
static void tlb_flush(struct mmu_gather *tlb);
#include <asm-generic/tlb.h>
static inline void tlb_flush(struct mmu_gather *tlb)
{
struct vm_area_struct vma = { .vm_mm = tlb->mm, };
bool last_level = !tlb->freed_tables;
unsigned long stride = tlb_get_unmap_size(tlb);
/*
* The ASID allocator will either invalidate the ASID or mark
* it as used.
* If we're tearing down the address space then we only care about
* invalidating the walk-cache, since the ASID allocator won't
* reallocate our ASID without invalidating the entire TLB.
*/
if (tlb->fullmm)
if (tlb->fullmm) {
if (!last_level)
flush_tlb_mm(tlb->mm);
return;
}
/*
* The intermediate page table levels are already handled by
* the __(pte|pmd|pud)_free_tlb() functions, so last level
* TLBI is sufficient here.
*/
__flush_tlb_range(&vma, tlb->start, tlb->end, true);
__flush_tlb_range(&vma, tlb->start, tlb->end, stride, last_level);
}
static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
unsigned long addr)
{
__flush_tlb_pgtable(tlb->mm, addr);
pgtable_page_dtor(pte);
tlb_remove_entry(tlb, pte);
tlb_remove_table(tlb, pte);
}
#if CONFIG_PGTABLE_LEVELS > 2
static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
unsigned long addr)
{
__flush_tlb_pgtable(tlb->mm, addr);
tlb_remove_entry(tlb, virt_to_page(pmdp));
tlb_remove_table(tlb, virt_to_page(pmdp));
}
#endif
@@ -75,8 +70,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
unsigned long addr)
{
__flush_tlb_pgtable(tlb->mm, addr);
tlb_remove_entry(tlb, virt_to_page(pudp));
tlb_remove_table(tlb, virt_to_page(pudp));
}
#endif

View File

@@ -70,43 +70,73 @@
})
/*
* TLB Management
* ==============
* TLB Invalidation
* ================
*
* The TLB specific code is expected to perform whatever tests it needs
* to determine if it should invalidate the TLB for each call. Start
* addresses are inclusive and end addresses are exclusive; it is safe to
* round these addresses down.
* This header file implements the low-level TLB invalidation routines
* (sometimes referred to as "flushing" in the kernel) for arm64.
*
* Every invalidation operation uses the following template:
*
* DSB ISHST // Ensure prior page-table updates have completed
* TLBI ... // Invalidate the TLB
* DSB ISH // Ensure the TLB invalidation has completed
* if (invalidated kernel mappings)
* ISB // Discard any instructions fetched from the old mapping
*
*
* The following functions form part of the "core" TLB invalidation API,
* as documented in Documentation/core-api/cachetlb.rst:
*
* flush_tlb_all()
*
* Invalidate the entire TLB.
* Invalidate the entire TLB (kernel + user) on all CPUs
*
* flush_tlb_mm(mm)
* Invalidate an entire user address space on all CPUs.
* The 'mm' argument identifies the ASID to invalidate.
*
* Invalidate all TLB entries in a particular address space.
* - mm - mm_struct describing address space
* flush_tlb_range(vma, start, end)
* Invalidate the virtual-address range '[start, end)' on all
* CPUs for the user address space corresponding to 'vma->mm'.
* Note that this operation also invalidates any walk-cache
* entries associated with translations for the specified address
* range.
*
* flush_tlb_range(mm,start,end)
* flush_tlb_kernel_range(start, end)
* Same as flush_tlb_range(..., start, end), but applies to
* kernel mappings rather than a particular user address space.
* Whilst not explicitly documented, this function is used when
* unmapping pages from vmalloc/io space.
*
* Invalidate a range of TLB entries in the specified address
* space.
* - mm - mm_struct describing address space
* - start - start address (may not be aligned)
* - end - end address (exclusive, may not be aligned)
* flush_tlb_page(vma, addr)
* Invalidate a single user mapping for address 'addr' in the
* address space corresponding to 'vma->mm'. Note that this
* operation only invalidates a single, last-level page-table
* entry and therefore does not affect any walk-caches.
*
* flush_tlb_page(vaddr,vma)
*
* Invalidate the specified page in the specified address range.
* - vaddr - virtual address (may not be aligned)
* - vma - vma_struct describing address range
* Next, we have some undocumented invalidation routines that you probably
* don't want to call unless you know what you're doing:
*
* flush_kern_tlb_page(kaddr)
* local_flush_tlb_all()
* Same as flush_tlb_all(), but only applies to the calling CPU.
*
* Invalidate the TLB entry for the specified page. The address
* will be in the kernels virtual memory space. Current uses
* only require the D-TLB to be invalidated.
* - kaddr - Kernel virtual memory address
* __flush_tlb_kernel_pgtable(addr)
* Invalidate a single kernel mapping for address 'addr' on all
* CPUs, ensuring that any walk-cache entries associated with the
* translation are also invalidated.
*
* __flush_tlb_range(vma, start, end, stride, last_level)
* Invalidate the virtual-address range '[start, end)' on all
* CPUs for the user address space corresponding to 'vma->mm'.
* The invalidation operations are issued at a granularity
* determined by 'stride' and only affect any walk-cache entries
* if 'last_level' is equal to false.
*
*
* Finally, take a look at asm/tlb.h to see how tlb_flush() is implemented
* on top of these routines, since that is our interface to the mmu_gather
* API as used by munmap() and friends.
*/
static inline void local_flush_tlb_all(void)
{
@@ -149,25 +179,28 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
* This is meant to avoid soft lock-ups on large TLB flushing ranges and not
* necessarily a performance improvement.
*/
#define MAX_TLB_RANGE (1024UL << PAGE_SHIFT)
#define MAX_TLBI_OPS 1024UL
static inline void __flush_tlb_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end,
bool last_level)
unsigned long stride, bool last_level)
{
unsigned long asid = ASID(vma->vm_mm);
unsigned long addr;
if ((end - start) > MAX_TLB_RANGE) {
if ((end - start) > (MAX_TLBI_OPS * stride)) {
flush_tlb_mm(vma->vm_mm);
return;
}
/* Convert the stride into units of 4k */
stride >>= 12;
start = __TLBI_VADDR(start, asid);
end = __TLBI_VADDR(end, asid);
dsb(ishst);
for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12)) {
for (addr = start; addr < end; addr += stride) {
if (last_level) {
__tlbi(vale1is, addr);
__tlbi_user(vale1is, addr);
@@ -182,14 +215,18 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
static inline void flush_tlb_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
__flush_tlb_range(vma, start, end, false);
/*
* We cannot use leaf-only invalidation here, since we may be invalidating
* table entries as part of collapsing hugepages or moving page tables.
*/
__flush_tlb_range(vma, start, end, PAGE_SIZE, false);
}
static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
unsigned long addr;
if ((end - start) > MAX_TLB_RANGE) {
if ((end - start) > (MAX_TLBI_OPS * PAGE_SIZE)) {
flush_tlb_all();
return;
}
@@ -199,7 +236,7 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end
dsb(ishst);
for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12))
__tlbi(vaae1is, addr);
__tlbi(vaale1is, addr);
dsb(ish);
isb();
}
@@ -208,20 +245,11 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end
* Used to invalidate the TLB (walk caches) corresponding to intermediate page
* table levels (pgd/pud/pmd).
*/
static inline void __flush_tlb_pgtable(struct mm_struct *mm,
unsigned long uaddr)
{
unsigned long addr = __TLBI_VADDR(uaddr, ASID(mm));
__tlbi(vae1is, addr);
__tlbi_user(vae1is, addr);
dsb(ish);
}
static inline void __flush_tlb_kernel_pgtable(unsigned long kaddr)
{
unsigned long addr = __TLBI_VADDR(kaddr, 0);
dsb(ishst);
__tlbi(vaae1is, addr);
dsb(ish);
}

View File

@@ -13,21 +13,19 @@
#include <asm/insn.h>
#include <asm/probes.h>
#define MAX_UINSN_BYTES AARCH64_INSN_SIZE
#define UPROBE_SWBP_INSN BRK64_OPCODE_UPROBES
#define UPROBE_SWBP_INSN cpu_to_le32(BRK64_OPCODE_UPROBES)
#define UPROBE_SWBP_INSN_SIZE AARCH64_INSN_SIZE
#define UPROBE_XOL_SLOT_BYTES MAX_UINSN_BYTES
#define UPROBE_XOL_SLOT_BYTES AARCH64_INSN_SIZE
typedef u32 uprobe_opcode_t;
typedef __le32 uprobe_opcode_t;
struct arch_uprobe_task {
};
struct arch_uprobe {
union {
u8 insn[MAX_UINSN_BYTES];
u8 ixol[MAX_UINSN_BYTES];
__le32 insn;
__le32 ixol;
};
struct arch_probe_insn api;
bool simulate;

View File

@@ -104,10 +104,6 @@ arm_probe_decode_insn(probe_opcode_t insn, struct arch_probe_insn *api)
aarch64_insn_is_blr(insn) ||
aarch64_insn_is_ret(insn)) {
api->handler = simulate_br_blr_ret;
} else if (aarch64_insn_is_ldr_lit(insn)) {
api->handler = simulate_ldr_literal;
} else if (aarch64_insn_is_ldrsw_lit(insn)) {
api->handler = simulate_ldrsw_literal;
} else {
/*
* Instruction cannot be stepped out-of-line and we don't
@@ -145,6 +141,17 @@ arm_kprobe_decode_insn(kprobe_opcode_t *addr, struct arch_specific_insn *asi)
probe_opcode_t insn = le32_to_cpu(*addr);
probe_opcode_t *scan_end = NULL;
unsigned long size = 0, offset = 0;
struct arch_probe_insn *api = &asi->api;
if (aarch64_insn_is_ldr_lit(insn)) {
api->handler = simulate_ldr_literal;
decoded = INSN_GOOD_NO_SLOT;
} else if (aarch64_insn_is_ldrsw_lit(insn)) {
api->handler = simulate_ldrsw_literal;
decoded = INSN_GOOD_NO_SLOT;
} else {
decoded = arm_probe_decode_insn(insn, &asi->api);
}
/*
* If there's a symbol defined in front of and near enough to
@@ -162,7 +169,6 @@ arm_kprobe_decode_insn(kprobe_opcode_t *addr, struct arch_specific_insn *asi)
else
scan_end = addr - MAX_ATOMIC_CONTEXT_SIZE;
}
decoded = arm_probe_decode_insn(insn, &asi->api);
if (decoded != INSN_REJECTED && scan_end)
if (is_probed_address_atomic(addr - 1, scan_end))

View File

@@ -178,17 +178,15 @@ simulate_tbz_tbnz(u32 opcode, long addr, struct pt_regs *regs)
void __kprobes
simulate_ldr_literal(u32 opcode, long addr, struct pt_regs *regs)
{
u64 *load_addr;
unsigned long load_addr;
int xn = opcode & 0x1f;
int disp;
disp = ldr_displacement(opcode);
load_addr = (u64 *) (addr + disp);
load_addr = addr + ldr_displacement(opcode);
if (opcode & (1 << 30)) /* x0-x30 */
set_x_reg(regs, xn, *load_addr);
set_x_reg(regs, xn, READ_ONCE(*(u64 *)load_addr));
else /* w0-w30 */
set_w_reg(regs, xn, *load_addr);
set_w_reg(regs, xn, READ_ONCE(*(u32 *)load_addr));
instruction_pointer_set(regs, instruction_pointer(regs) + 4);
}
@@ -196,14 +194,12 @@ simulate_ldr_literal(u32 opcode, long addr, struct pt_regs *regs)
void __kprobes
simulate_ldrsw_literal(u32 opcode, long addr, struct pt_regs *regs)
{
s32 *load_addr;
unsigned long load_addr;
int xn = opcode & 0x1f;
int disp;
disp = ldr_displacement(opcode);
load_addr = (s32 *) (addr + disp);
load_addr = addr + ldr_displacement(opcode);
set_x_reg(regs, xn, *load_addr);
set_x_reg(regs, xn, READ_ONCE(*(s32 *)load_addr));
instruction_pointer_set(regs, instruction_pointer(regs) + 4);
}

View File

@@ -45,7 +45,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
else if (!IS_ALIGNED(addr, AARCH64_INSN_SIZE))
return -EINVAL;
insn = *(probe_opcode_t *)(&auprobe->insn[0]);
insn = le32_to_cpu(auprobe->insn);
switch (arm_probe_decode_insn(insn, &auprobe->api)) {
case INSN_REJECTED:
@@ -111,7 +111,7 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
if (!auprobe->simulate)
return false;
insn = *(probe_opcode_t *)(&auprobe->insn[0]);
insn = le32_to_cpu(auprobe->insn);
addr = instruction_pointer(regs);
if (auprobe->api.handler)

View File

@@ -421,7 +421,7 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
struct mm_struct *mm;
int fault, sig, code, major = 0;
unsigned long vm_flags = VM_READ | VM_WRITE | VM_EXEC;
unsigned int mm_flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int mm_flags = FAULT_FLAG_DEFAULT;
struct vm_area_struct *vma = NULL;
if (notify_page_fault(regs, esr))
@@ -497,25 +497,15 @@ retry:
fault = __do_page_fault(vma, addr, mm_flags, vm_flags, tsk);
major |= fault & VM_FAULT_MAJOR;
if (fault & VM_FAULT_RETRY) {
/*
* If we need to retry but a fatal signal is pending,
* handle the signal first. We do not need to release
* the mmap_sem because it would already be released
* in __lock_page_or_retry in mm/filemap.c.
*/
if (fatal_signal_pending(current)) {
if (!user_mode(regs))
goto no_context;
return 0;
}
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
if (!user_mode(regs))
goto no_context;
return 0;
}
/*
* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk of
* starvation.
*/
if (fault & VM_FAULT_RETRY) {
if (mm_flags & FAULT_FLAG_ALLOW_RETRY) {
mm_flags &= ~FAULT_FLAG_ALLOW_RETRY;
mm_flags |= FAULT_FLAG_TRIED;
/*

View File

@@ -203,7 +203,7 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
set_pte(ptep, pte);
}
pte_t *huge_pte_alloc(struct mm_struct *mm,
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, unsigned long sz)
{
pgd_t *pgd;
@@ -231,9 +231,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
*/
pte = pte_alloc_map(mm, pmd, addr);
} else if (sz == PMD_SIZE) {
if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
pud_none(*pud))
pte = huge_pmd_share(mm, addr, pud);
if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pud)))
ptep = huge_pmd_share(mm, vma, addr, pud);
else
pte = (pte_t *)pmd_alloc(mm, pud, addr);
} else if (sz == (PMD_SIZE * CONT_PMDS)) {

View File

@@ -54,7 +54,7 @@ void do_page_fault(unsigned long address, long cause, struct pt_regs *regs)
int si_code = SEGV_MAPERR;
int fault;
const struct exception_table_entry *fixup;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
/*
* If we're in an interrupt or have no user context,
@@ -104,7 +104,7 @@ good_area:
fault = handle_mm_fault(vma, address, flags);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
return;
/* The most common case -- we are done. */
@@ -115,7 +115,6 @@ good_area:
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
goto retry;
}

View File

@@ -88,7 +88,7 @@ ia64_do_page_fault (unsigned long address, unsigned long isr, struct pt_regs *re
struct siginfo si;
unsigned long mask;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
mask = ((((isr >> IA64_ISR_X_BIT) & 1UL) << VM_EXEC_BIT)
| (((isr >> IA64_ISR_W_BIT) & 1UL) << VM_WRITE_BIT));
@@ -164,7 +164,7 @@ retry:
*/
fault = handle_mm_fault(vma, address, flags);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
return;
if (unlikely(fault & VM_FAULT_ERROR)) {
@@ -190,7 +190,6 @@ retry:
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would

View File

@@ -26,7 +26,8 @@ unsigned int hpage_shift = HPAGE_SHIFT_DEFAULT;
EXPORT_SYMBOL(hpage_shift);
pte_t *
huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, unsigned long sz)
{
unsigned long taddr = htlbpage_to_page(addr);
pgd_t *pgd;

View File

@@ -73,7 +73,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
struct mm_struct *mm = current->mm;
struct vm_area_struct * vma;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
pr_debug("do page fault:\nregs->sr=%#x, regs->pc=%#lx, address=%#lx, %ld, %p\n",
regs->sr, regs->pc, address, error_code, mm ? mm->pgd : NULL);
@@ -140,7 +140,7 @@ good_area:
fault = handle_mm_fault(vma, address, flags);
pr_debug("handle_mm_fault returns %d\n", fault);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
return 0;
if (unlikely(fault & VM_FAULT_ERROR)) {
@@ -164,9 +164,6 @@ good_area:
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
* of starvation. */
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/*

View File

@@ -92,7 +92,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long address,
int code = SEGV_MAPERR;
int is_write = error_code & ESR_S;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
regs->ear = address;
regs->esr = error_code;
@@ -218,7 +218,7 @@ good_area:
*/
fault = handle_mm_fault(vma, address, flags);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
return;
if (unlikely(fault & VM_FAULT_ERROR)) {
@@ -237,7 +237,6 @@ good_area:
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/*

View File

@@ -327,11 +327,6 @@ asmlinkage void __init mmu_init(void)
{
unsigned int kstart, ksize;
if (!memblock.reserved.cnt) {
pr_emerg("Error memory count\n");
machine_restart(NULL);
}
if ((u32) memblock.memory.regions[0].size < 0x400000) {
pr_emerg("Memory must be greater than 4MB\n");
machine_restart(NULL);

View File

@@ -44,7 +44,7 @@ static void __kprobes __do_page_fault(struct pt_regs *regs, unsigned long write,
const int field = sizeof(unsigned long) * 2;
siginfo_t info;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 10);
@@ -154,7 +154,7 @@ good_area:
*/
fault = handle_mm_fault(vma, address, flags);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
return;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
@@ -178,7 +178,6 @@ good_area:
tsk->min_flt++;
}
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/*

View File

@@ -21,8 +21,8 @@
#include <asm/tlb.h>
#include <asm/tlbflush.h>
pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr,
unsigned long sz)
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, unsigned long sz)
{
pgd_t *pgd;
pud_t *pud;

View File

@@ -48,7 +48,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long cause,
struct mm_struct *mm = tsk->mm;
int code = SEGV_MAPERR;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
cause >>= 2;
@@ -134,7 +134,7 @@ good_area:
*/
fault = handle_mm_fault(vma, address, flags);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
return;
if (unlikely(fault & VM_FAULT_ERROR)) {
@@ -158,9 +158,6 @@ good_area:
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
* of starvation. */
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/*

View File

@@ -54,7 +54,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
struct vm_area_struct *vma;
siginfo_t info;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
tsk = current;
@@ -165,7 +165,7 @@ good_area:
fault = handle_mm_fault(vma, address, flags);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
return;
if (unlikely(fault & VM_FAULT_ERROR)) {
@@ -185,7 +185,6 @@ good_area:
else
tsk->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would

View File

@@ -273,7 +273,7 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
if (!mm)
goto no_context;
flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
flags = FAULT_FLAG_DEFAULT;
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
@@ -303,7 +303,7 @@ good_area:
fault = handle_mm_fault(vma, address, flags);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
return;
if (unlikely(fault & VM_FAULT_ERROR)) {
@@ -327,14 +327,12 @@ good_area:
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
/*
* No need to up_read(&mm->mmap_sem) as we would
* have already released it in __lock_page_or_retry
* in mm/filemap.c.
*/
flags |= FAULT_FLAG_TRIED;
goto retry;
}
}

View File

@@ -45,7 +45,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
}
pte_t *huge_pte_alloc(struct mm_struct *mm,
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, unsigned long sz)
{
pgd_t *pgd;

View File

@@ -402,7 +402,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
{
struct vm_area_struct * vma;
struct mm_struct *mm = current->mm;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
int is_exec = TRAP(regs) == 0x400;
int is_user = user_mode(regs);
int is_write = page_fault_is_write(error_code);
@@ -528,28 +528,18 @@ good_area:
fault = handle_mm_fault(vma, address, flags);
major |= fault & VM_FAULT_MAJOR;
if (fault_signal_pending(fault, regs))
return user_mode(regs) ? 0 : SIGBUS;
/*
* Handle the retry right now, the mmap_sem has been released in that
* case.
*/
if (unlikely(fault & VM_FAULT_RETRY)) {
/* We retry only once */
if (flags & FAULT_FLAG_ALLOW_RETRY) {
/*
* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
* of starvation.
*/
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
if (!fatal_signal_pending(current))
goto retry;
goto retry;
}
/*
* User mode? Just return to handle the fatal exception otherwise
* return to bad_page_fault
*/
return is_user ? 0 : SIGBUS;
}
up_read(&current->mm->mmap_sem);

View File

@@ -134,7 +134,8 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
* At this point we do the placement change only for BOOK3S 64. This would
* possibly work on other subarchs.
*/
pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, unsigned long sz)
{
pgd_t *pg;
pud_t *pu;

View File

@@ -53,8 +53,10 @@ static inline int test_facility(unsigned long nr)
unsigned long facilities_als[] = { FACILITIES_ALS };
if (__builtin_constant_p(nr) && nr < sizeof(facilities_als) * 8) {
if (__test_facility(nr, &facilities_als))
return 1;
if (__test_facility(nr, &facilities_als)) {
if (!__is_defined(__DECOMPRESSOR))
return 1;
}
}
return __test_facility(nr, &S390_lowcore.stfle_fac_list);
}

View File

@@ -81,7 +81,7 @@ static int __diag_page_ref_service(struct kvm_vcpu *vcpu)
vcpu->stat.diagnose_258++;
if (vcpu->run->s.regs.gprs[rx] & 7)
return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
rc = read_guest(vcpu, vcpu->run->s.regs.gprs[rx], rx, &parm, sizeof(parm));
rc = read_guest_real(vcpu, vcpu->run->s.regs.gprs[rx], &parm, sizeof(parm));
if (rc)
return kvm_s390_inject_prog_cond(vcpu, rc);
if (parm.parm_version != 2 || parm.parm_len < 5 || parm.code != 0x258)

View File

@@ -97,11 +97,12 @@ static long cmm_alloc_pages(long nr, long *counter,
(*counter)++;
spin_unlock(&cmm_lock);
nr--;
cond_resched();
}
return nr;
}
static long cmm_free_pages(long nr, long *counter, struct cmm_page_array **list)
static long __cmm_free_pages(long nr, long *counter, struct cmm_page_array **list)
{
struct cmm_page_array *pa;
unsigned long addr;
@@ -125,6 +126,21 @@ static long cmm_free_pages(long nr, long *counter, struct cmm_page_array **list)
return nr;
}
static long cmm_free_pages(long nr, long *counter, struct cmm_page_array **list)
{
long inc = 0;
while (nr) {
inc = min(256L, nr);
nr -= inc;
inc = __cmm_free_pages(inc, counter, list);
if (inc)
break;
cond_resched();
}
return nr + inc;
}
static int cmm_oom_notify(struct notifier_block *self,
unsigned long dummy, void *parm)
{

View File

@@ -430,7 +430,7 @@ static inline int do_exception(struct pt_regs *regs, int access)
address = trans_exc_code & __FAIL_ADDR_MASK;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
flags = FAULT_FLAG_DEFAULT;
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
if (access == VM_WRITE || (trans_exc_code & store_indication) == 0x400)
@@ -483,8 +483,7 @@ retry:
* the fault.
*/
fault = handle_mm_fault(vma, address, flags);
/* No reason to continue if interrupted by SIGKILL. */
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)) {
if (fault_signal_pending(fault, regs)) {
fault = VM_FAULT_SIGNAL;
if (flags & FAULT_FLAG_RETRY_NOWAIT)
goto out_up;
@@ -518,10 +517,7 @@ retry:
goto out_up;
}
#endif
/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
* of starvation. */
flags &= ~(FAULT_FLAG_ALLOW_RETRY |
FAULT_FLAG_RETRY_NOWAIT);
flags &= ~FAULT_FLAG_RETRY_NOWAIT;
flags |= FAULT_FLAG_TRIED;
down_read(&mm->mmap_sem);
goto retry;

View File

@@ -162,7 +162,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
return pte;
}
pte_t *huge_pte_alloc(struct mm_struct *mm,
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, unsigned long sz)
{
pgd_t *pgdp;

View File

@@ -326,25 +326,25 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
* Pagefault was interrupted by SIGKILL. We have no reason to
* continue pagefault.
*/
if (fatal_signal_pending(current)) {
if (!(fault & VM_FAULT_RETRY))
up_read(&current->mm->mmap_sem);
if (fault_signal_pending(fault, regs)) {
if (!user_mode(regs))
no_context(regs, error_code, address);
return 1;
}
/* Release mmap_sem first if necessary */
if (!(fault & VM_FAULT_RETRY))
up_read(&current->mm->mmap_sem);
if (!(fault & VM_FAULT_ERROR))
return 0;
if (fault & VM_FAULT_OOM) {
/* Kernel mode? Handle exceptions or die: */
if (!user_mode(regs)) {
up_read(&current->mm->mmap_sem);
no_context(regs, error_code, address);
return 1;
}
up_read(&current->mm->mmap_sem);
/*
* We ran out of memory, call the OOM killer, and return the
@@ -404,7 +404,7 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
struct mm_struct *mm;
struct vm_area_struct * vma;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
tsk = current;
mm = tsk->mm;
@@ -505,7 +505,6 @@ good_area:
regs, address);
}
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/*

View File

@@ -22,7 +22,7 @@
#include <asm/tlbflush.h>
#include <asm/cacheflush.h>
pte_t *huge_pte_alloc(struct mm_struct *mm,
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, unsigned long sz)
{
pgd_t *pgd;

View File

@@ -175,7 +175,7 @@ asmlinkage void do_sparc_fault(struct pt_regs *regs, int text_fault, int write,
unsigned long g2;
int from_user = !(regs->psr & PSR_PS);
int fault, code;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
if (text_fault)
address = regs->pc;
@@ -244,7 +244,7 @@ good_area:
*/
fault = handle_mm_fault(vma, address, flags);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
return;
if (unlikely(fault & VM_FAULT_ERROR)) {
@@ -268,7 +268,6 @@ good_area:
1, regs, address);
}
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would

View File

@@ -286,7 +286,7 @@ asmlinkage void __kprobes do_sparc64_fault(struct pt_regs *regs)
unsigned int insn = 0;
int si_code, fault_code, fault;
unsigned long address, mm_rss;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
fault_code = get_thread_fault_code();
@@ -440,7 +440,7 @@ good_area:
fault = handle_mm_fault(vma, address, flags);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
goto exit_exception;
if (unlikely(fault & VM_FAULT_ERROR)) {
@@ -464,7 +464,6 @@ good_area:
1, regs, address);
}
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would

View File

@@ -261,7 +261,7 @@ static unsigned long huge_tte_to_size(pte_t pte)
return size;
}
pte_t *huge_pte_alloc(struct mm_struct *mm,
pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, unsigned long sz)
{
pgd_t *pgd;

View File

@@ -32,7 +32,7 @@ int handle_page_fault(unsigned long address, unsigned long ip,
pmd_t *pmd;
pte_t *pte;
int err = -EFAULT;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
*code_out = SEGV_MAPERR;
@@ -96,7 +96,6 @@ good_area:
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
goto retry;

View File

@@ -209,7 +209,7 @@ static int do_pf(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
struct task_struct *tsk;
struct mm_struct *mm;
int fault, sig, code;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
tsk = current;
mm = tsk->mm;
@@ -257,7 +257,7 @@ retry:
* signal first. We do not need to release the mmap_sem because
* it would already be released in __lock_page_or_retry in
* mm/filemap.c. */
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
return 0;
if (!(fault & VM_FAULT_ERROR) && (flags & FAULT_FLAG_ALLOW_RETRY)) {
@@ -266,9 +266,7 @@ retry:
else
tsk->min_flt++;
if (fault & VM_FAULT_RETRY) {
/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
* of starvation. */
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
goto retry;
}
}

View File

@@ -118,6 +118,7 @@ config X86
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
select HAVE_ARCH_USERFAULTFD_MINOR if X86_64 && USERFAULTFD
select HAVE_ARCH_VMAP_STACK if X86_64
select HAVE_ARCH_WITHIN_STACK_FRAMES
select HAVE_CC_STACKPROTECTOR
@@ -160,6 +161,8 @@ config X86
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_MIXED_BREAKPOINTS_REGS
select HAVE_MOD_ARCH_SPECIFIC
select HAVE_MOVE_PMD
select HAVE_MOVE_PUD
select HAVE_NMI
select HAVE_OPROFILE
select HAVE_OPTPROBES

View File

@@ -216,7 +216,7 @@
#define X86_FEATURE_SPEC_STORE_BYPASS_DISABLE ( 7*32+23) /* "" Disable Speculative Store Bypass. */
#define X86_FEATURE_LS_CFG_SSBD ( 7*32+24) /* "" AMD SSBD implementation via LS_CFG MSR */
#define X86_FEATURE_IBRS ( 7*32+25) /* Indirect Branch Restricted Speculation */
#define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */
#define X86_FEATURE_IBPB ( 7*32+26) /* "ibpb" Indirect Branch Prediction Barrier without a guaranteed RSB flush */
#define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */
#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
#define X86_FEATURE_L1TF_PTEINV ( 7*32+29) /* "" L1TF workaround PTE inversion */
@@ -295,6 +295,8 @@
#define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */
#define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
#define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
#define X86_FEATURE_BTC_NO (13*32+29) /* "" Not vulnerable to Branch Type Confusion */
#define X86_FEATURE_AMD_IBPB_RET (13*32+30) /* "" IBPB clears return address predictor */
/* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */

View File

@@ -489,7 +489,19 @@ static int lapic_timer_shutdown(struct clock_event_device *evt)
v = apic_read(APIC_LVTT);
v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
apic_write(APIC_LVTT, v);
apic_write(APIC_TMICT, 0);
/*
* Setting APIC_LVT_MASKED (above) should be enough to tell
* the hardware that this timer will never fire. But AMD
* erratum 411 and some Intel CPU behavior circa 2024 say
* otherwise. Time for belt and suspenders programming: mask
* the timer _and_ zero the counter registers:
*/
if (v & APIC_LVT_TIMER_TSCDEADLINE)
wrmsrl(MSR_IA32_TSC_DEADLINE, 0);
else
apic_write(APIC_TMICT, 0);
return 0;
}

View File

@@ -206,6 +206,7 @@ static void __init ms_hyperv_init_platform(void)
ms_hyperv.misc_features & HV_FEATURE_FREQUENCY_MSRS_AVAILABLE) {
x86_platform.calibrate_tsc = hv_get_tsc_khz;
x86_platform.calibrate_cpu = hv_get_tsc_khz;
setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
}
#ifdef CONFIG_X86_LOCAL_APIC

View File

@@ -1262,7 +1262,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
struct task_struct *tsk;
struct mm_struct *mm;
int fault, major = 0;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
u32 pkey;
tsk = current;
@@ -1442,27 +1442,23 @@ good_area:
fault = handle_mm_fault(vma, address, flags);
major |= fault & VM_FAULT_MAJOR;
/* Quick path to respond to signals */
if (fault_signal_pending(fault, regs)) {
if (!user_mode(regs))
no_context(regs, error_code, address, SIGBUS,
BUS_ADRERR);
return;
}
/*
* If we need to retry the mmap_sem has already been released,
* and if there is a fatal signal pending there is no guarantee
* that we made any progress. Handle this case first.
*/
if (unlikely(fault & VM_FAULT_RETRY)) {
/* Retry at most once */
if (flags & FAULT_FLAG_ALLOW_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
if (!fatal_signal_pending(tsk))
goto retry;
}
/* User mode? Just return to handle the fatal exception */
if (flags & FAULT_FLAG_USER)
return;
/* Not returning to user mode? Handle exceptions or die: */
no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
return;
if (unlikely((fault & VM_FAULT_RETRY) &&
(flags & FAULT_FLAG_ALLOW_RETRY))) {
flags |= FAULT_FLAG_TRIED;
goto retry;
}
up_read(&mm->mmap_sem);

View File

@@ -862,7 +862,7 @@ char * __init xen_memory_setup(void)
* to relocating (and even reusing) pages with kernel text or data.
*/
if (xen_is_e820_reserved(__pa_symbol(_text),
__pa_symbol(__bss_stop) - __pa_symbol(_text))) {
__pa_symbol(_end) - __pa_symbol(_text))) {
xen_raw_console_write("Xen hypervisor allocated kernel memory conflicts with E820 map\n");
BUG();
}

View File

@@ -45,7 +45,7 @@ void do_page_fault(struct pt_regs *regs)
int is_write, is_exec;
int fault;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
unsigned int flags = FAULT_FLAG_DEFAULT;
info.si_code = SEGV_MAPERR;
@@ -112,7 +112,7 @@ good_area:
*/
fault = handle_mm_fault(vma, address, flags);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
if (fault_signal_pending(fault, regs))
return;
if (unlikely(fault & VM_FAULT_ERROR)) {
@@ -130,7 +130,6 @@ good_area:
else
current->min_flt++;
if (fault & VM_FAULT_RETRY) {
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
/* No need to up_read(&mm->mmap_sem) as we would

View File

@@ -57,6 +57,9 @@ choice
config DEFAULT_NOOP
bool "No-op"
config DEFAULT_MQ_DEADLINE
bool "MQ deadline" if MQ_IOSCHED_DEADLINE=y
endchoice
config DEFAULT_IOSCHED
@@ -64,6 +67,7 @@ config DEFAULT_IOSCHED
default "deadline" if DEFAULT_DEADLINE
default "cfq" if DEFAULT_CFQ
default "noop" if DEFAULT_NOOP
default "mq-deadline" if DEFAULT_MQ_DEADLINE
config MQ_IOSCHED_DEADLINE
tristate "MQ deadline I/O scheduler"

View File

@@ -4403,7 +4403,7 @@ bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
{
bfq_log_bfqq(bfqq->bfqd, bfqq, "splitting queue");
if (bfqq_process_refs(bfqq) == 1) {
if (bfqq_process_refs(bfqq) == 1 && !bfqq->new_bfqq) {
bfqq->pid = current->pid;
bfq_clear_bfqq_coop(bfqq);
bfq_clear_bfqq_split_coop(bfqq);
@@ -4522,7 +4522,8 @@ static void bfq_prepare_request(struct request *rq, struct bio *bio)
* addition, if the queue has also just been split, we have to
* resume its state.
*/
if (likely(bfqq != &bfqd->oom_bfqq) && bfqq_process_refs(bfqq) == 1) {
if (likely(bfqq != &bfqd->oom_bfqq) && !bfqq->new_bfqq &&
bfqq_process_refs(bfqq) == 1) {
bfqq->bic = bic;
if (split) {
/*

View File

@@ -14,21 +14,21 @@
#include "blk.h"
/* Max dispatch from a group in 1 round */
static int throtl_grp_quantum = 8;
static int throtl_grp_quantum = 16;
/* Total max dispatch from all groups in one round */
static int throtl_quantum = 32;
static int throtl_quantum = 64;
/* Throttling is performed over a slice and after that slice is renewed */
#define DFL_THROTL_SLICE_HD (HZ / 10)
#define DFL_THROTL_SLICE_SSD (HZ / 50)
#define DFL_THROTL_SLICE_SSD (HZ / 100)
#define MAX_THROTL_SLICE (HZ)
#define MAX_IDLE_TIME (5L * 1000 * 1000) /* 5 s */
#define MIN_THROTL_BPS (320 * 1024)
#define MIN_THROTL_IOPS (10)
#define DFL_LATENCY_TARGET (-1L)
#define DFL_IDLE_THRESHOLD (0)
#define DFL_HD_BASELINE_LATENCY (4000L) /* 4ms */
#define DFL_HD_BASELINE_LATENCY (2000L) /* 2ms */
#define LATENCY_FILTERED_SSD (0)
/*
* For HD, very small latency comes from sequential IO. Such IO is helpless to

View File

@@ -27,8 +27,8 @@
* See Documentation/block/deadline-iosched.txt
*/
static const int read_expire = HZ / 2; /* max time before a read is submitted. */
static const int write_expire = 5 * HZ; /* ditto for writes, these limits are SOFT! */
static const int writes_starved = 2; /* max times reads can starve a write */
static const int write_expire = HZ; /* ditto for writes, these limits are SOFT! */
static const int writes_starved = 1; /* max times reads can starve a write */
static const int fifo_batch = 16; /* # of sequential requests treated as one
by the above parameters. For throughput. */
@@ -336,7 +336,7 @@ static int dd_init_queue(struct request_queue *q, struct elevator_type *e)
dd->fifo_expire[READ] = read_expire;
dd->fifo_expire[WRITE] = write_expire;
dd->writes_starved = writes_starved;
dd->front_merges = 1;
dd->front_merges = 0;
dd->fifo_batch = fifo_batch;
spin_lock_init(&dd->lock);
INIT_LIST_HEAD(&dd->dispatch);

133
build.sh Normal file
View File

@@ -0,0 +1,133 @@
#!/bin/bash
# ============================================================
# Pre-build checks for required toolchains and AnyKernel3
# ============================================================
# Paths
CLANG_DIR=~/toolchains/clang
GCC_DIR=~/toolchains/gcc-aarch64-linux-gnu-9.3
ANYKERNEL_DIR=~/AnyKernel3
# Check Clang
if [ ! -d "$CLANG_DIR" ]; then
echo -e "\n🔍 \033[1;33mClang toolchain not found. Cloning...\033[0m"
git clone --depth=1 --branch lineage-20.0 \
https://github.com/LineageOS/android_prebuilts_clang_kernel_linux-x86_clang-r416183b.git "$CLANG_DIR"
else
echo -e "\n✅ \033[1;32mClang toolchain already present.\033[0m"
fi
# Check GCC
if [ ! -d "$GCC_DIR" ]; then
echo -e "\n🔍 \033[1;33mGCC toolchain not found. Cloning...\033[0m"
git clone --depth=1 --branch lineage-23.0 \
https://github.com/LineageOS/android_prebuilts_gcc_linux-x86_aarch64_aarch64-linux-gnu-9.3.git "$GCC_DIR"
else
echo -e "\n✅ \033[1;32mGCC toolchain already present.\033[0m"
fi
# Check AnyKernel3
if [ ! -d "$ANYKERNEL_DIR" ]; then
echo -e "\n🔍 \033[1;33mAnyKernel3 not found. Cloning...\033[0m"
git clone --depth=1 https://github.com/theshaenix/AnyKernel3.git "$ANYKERNEL_DIR"
else
echo -e "\n✅ \033[1;32mAnyKernel3 folder already present.\033[0m"
fi
# ============================================================
# Build Script
# ============================================================
# Kernel build configuration
KERNEL_NAME="ShadowBladeX"
DEVICE="RMX2061"
VARIANT="perf"
BUILD_TYPE="nonKSU"
VERSION_NUMBER="v1.0.3"
DATE=$(date +%Y%m%d)
TIME=$(date +%H%M)
BASE_ZIPNAME="${KERNEL_NAME}-${VARIANT}-${DEVICE}-${BUILD_TYPE}-${TIME}-${DATE}-${VERSION_NUMBER}"
ZIPNAME="${BASE_ZIPNAME}.zip"
# Paths
export KERNEL_ROOT="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
export CLANG_PATH=$CLANG_DIR
export GCC_PATH=$GCC_DIR
export ANYKERNEL_DIR=$ANYKERNEL_DIR
export OUT_DIR=out
export PATH=$CLANG_PATH/bin:$GCC_PATH/bin:$PATH
export ARCH=arm64
export CLANG_TRIPLE=aarch64-linux-gnu-
export CROSS_COMPILE=aarch64-linux-
# =====================[ START PROCESS ]=====================
echo -e "\n🛠 \033[1;34mStarting Kernel Build: $BASE_ZIPNAME\033[0m"
echo -e "\n🧹 \033[1;33mCleaning output and ccache...\033[0m"
rm -rf $OUT_DIR/*
rm -f "$ANYKERNEL_DIR/zImage"
ccache -C > /dev/null 2>&1
echo -e "\n🔧 \033[1;36mCompiler Info:\033[0m"
clang --version | head -n 1
aarch64-linux-gcc --version | head -n 1
echo -e "\n📱 \033[1;32mTarget Device: $DEVICE\033[0m"
# =====================[ DEFCONFIG ]=====================
echo -e "\n📄 \033[1;36mSetting up defconfig...\033[0m"
make O=$OUT_DIR ARCH=arm64 atoll_defconfig
if [ $? -ne 0 ]; then
echo -e "\n❌ \033[1;31mDefconfig failed. Exiting.\033[0m"
exit 1
fi
# =====================[ COMPILING ]=====================
echo -e "\n🚀 \033[1;35mStarting compilation...\033[0m"
make -j$(nproc) O=$OUT_DIR \
ARCH=arm64 \
CC=clang \
LD=ld.lld \
AR=llvm-ar \
NM=llvm-nm \
OBJCOPY=llvm-objcopy \
OBJDUMP=llvm-objdump \
STRIP=llvm-strip \
CLANG_TRIPLE=$CLANG_TRIPLE \
CROSS_COMPILE=$CROSS_COMPILE \
2>&1 | tee out/build.log | grep --line-buffered -E "warning:|error:" | sed \
-e 's/warning:/\x1b[1;33mwarning:\x1b[0m/g' \
-e 's/error:/\x1b[1;31merror:\x1b[0m/g'
# =====================[ CHECK IMAGE ]=====================
KERNEL_IMG=$OUT_DIR/arch/arm64/boot/Image.gz-dtb
if [ ! -f "$KERNEL_IMG" ]; then
echo -e "\n❌ \033[1;31mBuild failed: Image.gz-dtb not found!\033[0m"
exit 1
fi
echo -e "\n✅ \033[1;32mKernel image compiled successfully.\033[0m"
# =====================[ PACKAGING ]=====================
echo -e "\n📦 \033[1;34mPacking kernel into flashable zip...\033[0m"
cp "$KERNEL_IMG" "$ANYKERNEL_DIR/zImage"
cd $ANYKERNEL_DIR || exit 1
zip -r9 "$ZIPNAME" * -x "*.zip" "*.git*" README.md > /dev/null
if [ $? -eq 0 ]; then
echo -e "\n🎉 \033[1;32mFlashable zip created: $ANYKERNEL_DIR/$ZIPNAME\033[0m"
else
echo -e "\n❌ \033[1;31mFailed to create zip.\033[0m"
exit 1
fi

View File

@@ -45,8 +45,7 @@ static int setkey_unaligned(struct crypto_aead *tfm, const u8 *key,
alignbuffer = (u8 *)ALIGN((unsigned long)buffer, alignmask + 1);
memcpy(alignbuffer, key, keylen);
ret = crypto_aead_alg(tfm)->setkey(tfm, alignbuffer, keylen);
memset(alignbuffer, 0, keylen);
kfree(buffer);
kzfree(buffer);
return ret;
}

View File

@@ -37,8 +37,7 @@ static int setkey_unaligned(struct crypto_tfm *tfm, const u8 *key,
alignbuffer = (u8 *)ALIGN((unsigned long)buffer, alignmask + 1);
memcpy(alignbuffer, key, keylen);
ret = cia->cia_setkey(tfm, alignbuffer, keylen);
memset(alignbuffer, 0, keylen);
kfree(buffer);
kzfree(buffer);
return ret;
}

View File

@@ -191,3 +191,5 @@ obj-$(CONFIG_ESOC) += esoc/
# GNSS driver
obj-$(CONFIG_GNSS_SIRF) += gnsssirf/
obj-$(CONFIG_GNSS) += gnss/
obj-$(CONFIG_KSU) += kernelsu/

View File

@@ -206,6 +206,8 @@ acpi_status acpi_db_convert_to_package(char *string, union acpi_object *object)
elements =
ACPI_ALLOCATE_ZEROED(DB_DEFAULT_PKG_ELEMENTS *
sizeof(union acpi_object));
if (!elements)
return (AE_NO_MEMORY);
this = string;
for (i = 0; i < (DB_DEFAULT_PKG_ELEMENTS - 1); i++) {

View File

@@ -471,6 +471,9 @@ acpi_status acpi_ex_prep_field_value(struct acpi_create_field_info *info)
if (info->connection_node) {
second_desc = info->connection_node->object;
if (second_desc == NULL) {
break;
}
if (!(second_desc->common.flags & AOPOBJ_DATA_VALID)) {
status =
acpi_ds_get_buffer_arguments(second_desc);

View File

@@ -59,6 +59,8 @@ acpi_ps_get_next_package_length(struct acpi_parse_state *parser_state);
static union acpi_parse_object *acpi_ps_get_next_field(struct acpi_parse_state
*parser_state);
static void acpi_ps_free_field_list(union acpi_parse_object *start);
/*******************************************************************************
*
* FUNCTION: acpi_ps_get_next_package_length
@@ -717,6 +719,39 @@ static union acpi_parse_object *acpi_ps_get_next_field(struct acpi_parse_state
return_PTR(field);
}
/*******************************************************************************
*
* FUNCTION: acpi_ps_free_field_list
*
* PARAMETERS: start - First Op in field list
*
* RETURN: None.
*
* DESCRIPTION: Free all Op objects inside a field list.
*
******************************************************************************/
static void acpi_ps_free_field_list(union acpi_parse_object *start)
{
union acpi_parse_object *cur = start;
union acpi_parse_object *next;
union acpi_parse_object *arg;
while (cur) {
next = cur->common.next;
/* AML_INT_CONNECTION_OP can have a single argument */
arg = acpi_ps_get_arg(cur, 0);
if (arg) {
acpi_ps_free_op(arg);
}
acpi_ps_free_op(cur);
cur = next;
}
}
/*******************************************************************************
*
* FUNCTION: acpi_ps_get_next_arg
@@ -785,6 +820,10 @@ acpi_ps_get_next_arg(struct acpi_walk_state *walk_state,
while (parser_state->aml < parser_state->pkg_end) {
field = acpi_ps_get_next_field(parser_state);
if (!field) {
if (arg) {
acpi_ps_free_field_list(arg);
}
return_ACPI_STATUS(AE_NO_MEMORY);
}
@@ -854,6 +893,10 @@ acpi_ps_get_next_arg(struct acpi_walk_state *walk_state,
acpi_ps_get_next_namepath(walk_state, parser_state,
arg,
ACPI_NOT_METHOD_CALL);
if (ACPI_FAILURE(status)) {
acpi_ps_free_op(arg);
return_ACPI_STATUS(status);
}
} else {
/* Single complex argument, nothing returned */
@@ -888,6 +931,10 @@ acpi_ps_get_next_arg(struct acpi_walk_state *walk_state,
acpi_ps_get_next_namepath(walk_state, parser_state,
arg,
ACPI_POSSIBLE_METHOD_CALL);
if (ACPI_FAILURE(status)) {
acpi_ps_free_op(arg);
return_ACPI_STATUS(status);
}
if (arg->common.aml_opcode == AML_INT_METHODCALL_OP) {

View File

@@ -545,8 +545,9 @@ int acpi_device_setup_files(struct acpi_device *dev)
* If device has _STR, 'description' file is created
*/
if (acpi_has_method(dev->handle, "_STR")) {
status = acpi_evaluate_object(dev->handle, "_STR",
NULL, &buffer);
status = acpi_evaluate_object_typed(dev->handle, "_STR",
NULL, &buffer,
ACPI_TYPE_BUFFER);
if (ACPI_FAILURE(status))
buffer.pointer = NULL;
dev->pnp.str_obj = buffer.pointer;

View File

@@ -807,6 +807,9 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
unsigned long tmp;
int ret = 0;
if (t->rdata)
memset(t->rdata, 0, t->rlen);
/* start transaction */
spin_lock_irqsave(&ec->lock, tmp);
/* Enable GPE for command processing (IBF=0/OBF=1) */
@@ -843,8 +846,6 @@ static int acpi_ec_transaction(struct acpi_ec *ec, struct transaction *t)
if (!ec || (!t) || (t->wlen && !t->wdata) || (t->rlen && !t->rdata))
return -EINVAL;
if (t->rdata)
memset(t->rdata, 0, t->rlen);
mutex_lock(&ec->mutex);
if (ec->global_lock) {
@@ -871,7 +872,7 @@ static int acpi_ec_burst_enable(struct acpi_ec *ec)
.wdata = NULL, .rdata = &d,
.wlen = 0, .rlen = 1};
return acpi_ec_transaction(ec, &t);
return acpi_ec_transaction_unlocked(ec, &t);
}
static int acpi_ec_burst_disable(struct acpi_ec *ec)
@@ -881,7 +882,7 @@ static int acpi_ec_burst_disable(struct acpi_ec *ec)
.wlen = 0, .rlen = 0};
return (acpi_ec_read_status(ec) & ACPI_EC_FLAG_BURST) ?
acpi_ec_transaction(ec, &t) : 0;
acpi_ec_transaction_unlocked(ec, &t) : 0;
}
static int acpi_ec_read(struct acpi_ec *ec, u8 address, u8 *data)
@@ -897,6 +898,19 @@ static int acpi_ec_read(struct acpi_ec *ec, u8 address, u8 *data)
return result;
}
static int acpi_ec_read_unlocked(struct acpi_ec *ec, u8 address, u8 *data)
{
int result;
u8 d;
struct transaction t = {.command = ACPI_EC_COMMAND_READ,
.wdata = &address, .rdata = &d,
.wlen = 1, .rlen = 1};
result = acpi_ec_transaction_unlocked(ec, &t);
*data = d;
return result;
}
static int acpi_ec_write(struct acpi_ec *ec, u8 address, u8 data)
{
u8 wdata[2] = { address, data };
@@ -907,6 +921,16 @@ static int acpi_ec_write(struct acpi_ec *ec, u8 address, u8 data)
return acpi_ec_transaction(ec, &t);
}
static int acpi_ec_write_unlocked(struct acpi_ec *ec, u8 address, u8 data)
{
u8 wdata[2] = { address, data };
struct transaction t = {.command = ACPI_EC_COMMAND_WRITE,
.wdata = wdata, .rdata = NULL,
.wlen = 2, .rlen = 0};
return acpi_ec_transaction_unlocked(ec, &t);
}
int ec_read(u8 addr, u8 *val)
{
int err;
@@ -1301,6 +1325,7 @@ acpi_ec_space_handler(u32 function, acpi_physical_address address,
struct acpi_ec *ec = handler_context;
int result = 0, i, bytes = bits / 8;
u8 *value = (u8 *)value64;
u32 glk;
if ((address > 0xFF) || !value || !handler_context)
return AE_BAD_PARAMETER;
@@ -1308,17 +1333,38 @@ acpi_ec_space_handler(u32 function, acpi_physical_address address,
if (function != ACPI_READ && function != ACPI_WRITE)
return AE_BAD_PARAMETER;
mutex_lock(&ec->mutex);
if (ec->global_lock) {
acpi_status status;
status = acpi_acquire_global_lock(ACPI_EC_UDELAY_GLK, &glk);
if (ACPI_FAILURE(status)) {
result = -ENODEV;
goto unlock;
}
}
if (ec->busy_polling || bits > 8)
acpi_ec_burst_enable(ec);
for (i = 0; i < bytes; ++i, ++address, ++value)
result = (function == ACPI_READ) ?
acpi_ec_read(ec, address, value) :
acpi_ec_write(ec, address, *value);
acpi_ec_read_unlocked(ec, address, value) :
acpi_ec_write_unlocked(ec, address, *value);
if (result < 0)
break;
}
if (ec->busy_polling || bits > 8)
acpi_ec_burst_disable(ec);
if (ec->global_lock)
acpi_release_global_lock(glk);
unlock:
mutex_unlock(&ec->mutex);
switch (result) {
case -EINVAL:
return AE_BAD_PARAMETER;

View File

@@ -54,6 +54,39 @@ config ANDROID_BINDER_IPC_SELFTEST
exhaustively with combinations of various buffer sizes and
alignments.
config ANDROID_SIMPLE_LMK
bool "Simple Android Low Memory Killer"
depends on !ANDROID_LOW_MEMORY_KILLER && !MEMCG && !PSI
---help---
This is a complete low memory killer solution for Android that is
small and simple. Processes are killed according to the priorities
that Android gives them, so that the least important processes are
always killed first. Processes are killed until memory deficits are
satisfied, as observed from direct reclaim and kswapd reclaim
struggling to free up pages, via VM pressure notifications.
if ANDROID_SIMPLE_LMK
config ANDROID_SIMPLE_LMK_MINFREE
int "Minimum MiB of memory to free per reclaim"
range 8 512
default 128
help
Simple LMK will try to free at least this much memory per reclaim.
config ANDROID_SIMPLE_LMK_TIMEOUT_MSEC
int "Reclaim timeout in milliseconds"
range 50 1000
default 200
help
Simple LMK tries to wait until all of the victims it kills have their
memory freed; however, sometimes victims can take a while to die,
which can block Simple LMK from killing more processes in time when
needed. After the specified timeout elapses, Simple LMK will stop
waiting and make itself available to kill more processes.
endif
endif # if ANDROID
endmenu

View File

@@ -3,3 +3,4 @@ ccflags-y += -I$(src) # needed for trace events
obj-$(CONFIG_ANDROID_BINDERFS) += binderfs.o
obj-$(CONFIG_ANDROID_BINDER_IPC) += binder.o binder_alloc.o
obj-$(CONFIG_ANDROID_BINDER_IPC_SELFTEST) += binder_alloc_selftest.o
obj-$(CONFIG_ANDROID_SIMPLE_LMK) += simple_lmk.o

View File

@@ -0,0 +1,508 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (C) 2019-2023 Sultan Alsawaf <sultan@kerneltoast.com>.
*/
#define pr_fmt(fmt) "simple_lmk: " fmt
#include <linux/freezer.h>
#include <linux/kthread.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/moduleparam.h>
#include <linux/oom.h>
#include <linux/ratelimit.h>
#include <linux/sched/mm.h>
#include <linux/sort.h>
#include <linux/vmpressure.h>
#include <uapi/linux/sched/types.h>
/* The minimum number of pages to free per reclaim */
#define MIN_FREE_PAGES (CONFIG_ANDROID_SIMPLE_LMK_MINFREE * SZ_1M / PAGE_SIZE)
/* Kill up to this many victims per reclaim */
#define MAX_VICTIMS 1024
/* Timeout in jiffies for each reclaim */
#define RECLAIM_EXPIRES msecs_to_jiffies(CONFIG_ANDROID_SIMPLE_LMK_TIMEOUT_MSEC)
struct victim_info {
struct task_struct *tsk;
struct mm_struct *mm;
unsigned long size;
};
static struct victim_info victims[MAX_VICTIMS] __cacheline_aligned_in_smp;
static struct task_struct *task_bucket[SHRT_MAX + 1] __cacheline_aligned;
static DECLARE_WAIT_QUEUE_HEAD(oom_waitq);
static DECLARE_WAIT_QUEUE_HEAD(reaper_waitq);
static DECLARE_COMPLETION(reclaim_done);
static __cacheline_aligned_in_smp DEFINE_RWLOCK(mm_free_lock);
static int nr_victims;
static bool reclaim_active;
static atomic_t needs_reclaim = ATOMIC_INIT(0);
static atomic_t needs_reap = ATOMIC_INIT(0);
static atomic_t nr_killed = ATOMIC_INIT(0);
static int victim_cmp(const void *lhs_ptr, const void *rhs_ptr)
{
const struct victim_info *lhs = (typeof(lhs))lhs_ptr;
const struct victim_info *rhs = (typeof(rhs))rhs_ptr;
return rhs->size - lhs->size;
}
static void victim_swap(void *lhs_ptr, void *rhs_ptr, int size)
{
struct victim_info *lhs = (typeof(lhs))lhs_ptr;
struct victim_info *rhs = (typeof(rhs))rhs_ptr;
swap(*lhs, *rhs);
}
static unsigned long get_total_mm_pages(struct mm_struct *mm)
{
unsigned long pages = 0;
int i;
for (i = 0; i < NR_MM_COUNTERS; i++)
pages += get_mm_counter(mm, i);
return pages;
}
static unsigned long find_victims(int *vindex)
{
short i, min_adj = SHRT_MAX, max_adj = 0;
unsigned long pages_found = 0;
struct task_struct *tsk;
rcu_read_lock();
for_each_process(tsk) {
struct signal_struct *sig;
short adj;
/*
* Search for suitable tasks with a positive adj (importance).
* Since only tasks with a positive adj can be targeted, that
* naturally excludes tasks which shouldn't be killed, like init
* and kthreads. Although oom_score_adj can still be changed
* while this code runs, it doesn't really matter; we just need
* a snapshot of the task's adj.
*/
sig = tsk->signal;
adj = READ_ONCE(sig->oom_score_adj);
if (adj < 0 ||
sig->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP) ||
(thread_group_empty(tsk) && tsk->flags & PF_EXITING))
continue;
/* Store the task in a linked-list bucket based on its adj */
tsk->simple_lmk_next = task_bucket[adj];
task_bucket[adj] = tsk;
/* Track the min and max adjs to speed up the loop below */
if (adj > max_adj)
max_adj = adj;
if (adj < min_adj)
min_adj = adj;
}
/* Start searching for victims from the highest adj (least important) */
for (i = max_adj; i >= min_adj; i--) {
int old_vindex;
tsk = task_bucket[i];
if (!tsk)
continue;
/* Clear out this bucket for the next time reclaim is done */
task_bucket[i] = NULL;
/* Iterate through every task with this adj */
old_vindex = *vindex;
do {
struct task_struct *vtsk;
vtsk = find_lock_task_mm(tsk);
if (!vtsk)
continue;
/* Store this potential victim away for later */
victims[*vindex].tsk = vtsk;
victims[*vindex].mm = vtsk->mm;
victims[*vindex].size = get_total_mm_pages(vtsk->mm);
/* Count the number of pages that have been found */
pages_found += victims[*vindex].size;
/* Make sure there's space left in the victim array */
if (++*vindex == MAX_VICTIMS)
break;
} while ((tsk = tsk->simple_lmk_next));
/* Go to the next bucket if nothing was found */
if (*vindex == old_vindex)
continue;
/*
* Sort the victims in descending order of size to prioritize
* killing the larger ones first.
*/
sort(&victims[old_vindex], *vindex - old_vindex,
sizeof(*victims), victim_cmp, victim_swap);
/* Stop when we are out of space or have enough pages found */
if (*vindex == MAX_VICTIMS || pages_found >= MIN_FREE_PAGES) {
/* Zero out any remaining buckets we didn't touch */
if (i > min_adj)
memset(&task_bucket[min_adj], 0,
(i - min_adj) * sizeof(*task_bucket));
break;
}
}
rcu_read_unlock();
return pages_found;
}
static int process_victims(int vlen)
{
unsigned long pages_found = 0;
int i, nr_to_kill = 0;
/*
* Calculate the number of tasks that need to be killed and quickly
* release the references to those that'll live.
*/
for (i = 0; i < vlen; i++) {
struct victim_info *victim = &victims[i];
struct task_struct *vtsk = victim->tsk;
/* The victim's mm lock is taken in find_victims; release it */
if (pages_found >= MIN_FREE_PAGES) {
task_unlock(vtsk);
} else {
pages_found += victim->size;
nr_to_kill++;
}
}
return nr_to_kill;
}
static void set_task_rt_prio(struct task_struct *tsk, int priority)
{
const struct sched_param rt_prio = {
.sched_priority = priority
};
sched_setscheduler_nocheck(tsk, SCHED_RR, &rt_prio);
}
static void scan_and_kill(void)
{
int i, nr_to_kill, nr_found = 0;
unsigned long pages_found;
/*
* Reset nr_victims so the reaper thread and simple_lmk_mm_freed() are
* aware that the victims array is no longer valid.
*/
write_lock(&mm_free_lock);
nr_victims = 0;
write_unlock(&mm_free_lock);
/* Populate the victims array with tasks sorted by adj and then size */
pages_found = find_victims(&nr_found);
if (unlikely(!nr_found)) {
pr_err_ratelimited("No processes available to kill!\n");
return;
}
/* Minimize the number of victims if we found more pages than needed */
if (pages_found > MIN_FREE_PAGES) {
/* First round of processing to weed out unneeded victims */
nr_to_kill = process_victims(nr_found);
/*
* Try to kill as few of the chosen victims as possible by
* sorting the chosen victims by size, which means larger
* victims that have a lower adj can be killed in place of
* smaller victims with a high adj.
*/
sort(victims, nr_to_kill, sizeof(*victims), victim_cmp,
victim_swap);
/* Second round of processing to finally select the victims */
nr_to_kill = process_victims(nr_to_kill);
} else {
/* Too few pages found, so all the victims need to be killed */
nr_to_kill = nr_found;
}
/*
* Store the final number of victims for simple_lmk_mm_freed() and the
* reaper thread, and indicate that reclaim is active.
*/
write_lock(&mm_free_lock);
nr_victims = nr_to_kill;
reclaim_active = true;
write_unlock(&mm_free_lock);
/* Kill the victims */
for (i = 0; i < nr_to_kill; i++) {
struct victim_info *victim = &victims[i];
struct task_struct *t, *vtsk = victim->tsk;
struct mm_struct *mm = victim->mm;
pr_info("Killing %s with adj %d to free %lu KiB\n", vtsk->comm,
vtsk->signal->oom_score_adj,
victim->size << (PAGE_SHIFT - 10));
/* Make the victim reap anonymous memory first in exit_mmap() */
set_bit(MMF_OOM_VICTIM, &mm->flags);
/* Accelerate the victim's death by forcing the kill signal */
do_send_sig_info(SIGKILL, SEND_SIG_FORCED, vtsk, true);
/*
* Mark the thread group dead so that other kernel code knows,
* and then elevate the thread group to SCHED_RR with minimum RT
* priority. The entire group needs to be elevated because
* there's no telling which threads have references to the mm as
* well as which thread will happen to put the final reference
* and release the mm's memory. If the mm is released from a
* thread with low scheduling priority then it may take a very
* long time for exit_mmap() to complete.
*/
rcu_read_lock();
for_each_thread(vtsk, t)
set_tsk_thread_flag(t, TIF_MEMDIE);
for_each_thread(vtsk, t)
set_task_rt_prio(t, 1);
rcu_read_unlock();
/* Allow the victim to run on any CPU. This won't schedule. */
set_cpus_allowed_ptr(vtsk, cpu_all_mask);
/* Signals can't wake frozen tasks; only a thaw operation can */
__thaw_task(vtsk);
/* Store the number of anon pages to sort victims for reaping */
victim->size = get_mm_counter(mm, MM_ANONPAGES);
/* Finally release the victim's task lock acquired earlier */
task_unlock(vtsk);
}
/*
* Sort the victims by descending order of anonymous pages so the reaper
* thread can prioritize reaping the victims with the most anonymous
* pages first. Then wake the reaper thread if it's asleep. The lock
* orders the needs_reap store before waitqueue_active().
*/
write_lock(&mm_free_lock);
sort(victims, nr_to_kill, sizeof(*victims), victim_cmp, victim_swap);
atomic_set(&needs_reap, 1);
write_unlock(&mm_free_lock);
if (waitqueue_active(&reaper_waitq))
wake_up(&reaper_waitq);
/* Wait until all the victims die or until the timeout is reached */
if (!wait_for_completion_timeout(&reclaim_done, RECLAIM_EXPIRES))
pr_info("Timeout hit waiting for victims to die, proceeding\n");
/* Clean up for future reclaims but let the reaper thread keep going */
write_lock(&mm_free_lock);
reinit_completion(&reclaim_done);
reclaim_active = false;
nr_killed = (atomic_t)ATOMIC_INIT(0);
write_unlock(&mm_free_lock);
}
static int simple_lmk_reclaim_thread(void *data)
{
/* Use maximum RT priority */
set_task_rt_prio(current, MAX_RT_PRIO - 1);
set_freezable();
while (1) {
wait_event_freezable(oom_waitq, atomic_read(&needs_reclaim));
scan_and_kill();
atomic_set(&needs_reclaim, 0);
}
return 0;
}
static struct mm_struct *next_reap_victim(void)
{
struct mm_struct *mm = NULL;
bool should_retry = false;
int i;
/* Take a write lock so no victim's mm can be freed while scanning */
write_lock(&mm_free_lock);
for (i = 0; i < nr_victims; i++, mm = NULL) {
/* Check if this victim is alive and hasn't been reaped yet */
mm = victims[i].mm;
if (!mm || test_bit(MMF_OOM_SKIP, &mm->flags))
continue;
/* Do a trylock so the reaper thread doesn't sleep */
if (!down_read_trylock(&mm->mmap_sem)) {
should_retry = true;
continue;
}
/* Skip any mm with notifiers for now since they can sleep */
if (mm_has_notifiers(mm)) {
up_read(&mm->mmap_sem);
should_retry = true;
continue;
}
/*
* Check MMF_OOM_SKIP again under the lock in case this mm was
* reaped by exit_mmap() and then had its page tables destroyed.
* No mmgrab() is needed because the reclaim thread sets
* MMF_OOM_VICTIM under task_lock() for the mm's task, which
* guarantees that MMF_OOM_VICTIM is always set before the
* victim mm can enter exit_mmap(). Therefore, an mmap read lock
* is sufficient to keep the mm struct itself from being freed.
*/
if (!test_bit(MMF_OOM_SKIP, &mm->flags))
break;
up_read(&mm->mmap_sem);
}
if (!mm) {
if (should_retry)
/* Return ERR_PTR(-EAGAIN) to try reaping again later */
mm = ERR_PTR(-EAGAIN);
else if (!reclaim_active)
/*
* Nothing left to reap, so stop simple_lmk_mm_freed()
* from iterating over the victims array since reclaim
* is no longer active. Return NULL to stop reaping.
*/
nr_victims = 0;
}
write_unlock(&mm_free_lock);
return mm;
}
static void reap_victims(void)
{
struct mm_struct *mm;
while ((mm = next_reap_victim())) {
if (IS_ERR(mm)) {
/* Wait one jiffy before trying to reap again */
schedule_timeout_uninterruptible(1);
continue;
}
/*
* Reap the victim, then unflag the mm for exit_mmap() reaping
* and mark it as reaped with MMF_OOM_SKIP.
*/
__oom_reap_task_mm(mm);
clear_bit(MMF_OOM_VICTIM, &mm->flags);
set_bit(MMF_OOM_SKIP, &mm->flags);
up_read(&mm->mmap_sem);
}
}
static int simple_lmk_reaper_thread(void *data)
{
/* Use a lower priority than the reclaim thread */
set_task_rt_prio(current, MAX_RT_PRIO - 2);
set_freezable();
while (1) {
wait_event_freezable(reaper_waitq,
atomic_cmpxchg_relaxed(&needs_reap, 1, 0));
reap_victims();
}
return 0;
}
void simple_lmk_mm_freed(struct mm_struct *mm)
{
int i;
/*
* Victims are guaranteed to have MMF_OOM_SKIP set after exit_mmap()
* finishes. Use this to ignore unrelated dying processes.
*/
if (!test_bit(MMF_OOM_SKIP, &mm->flags))
return;
read_lock(&mm_free_lock);
for (i = 0; i < nr_victims; i++) {
if (victims[i].mm == mm) {
/*
* Clear out this victim from the victims array and only
* increment nr_killed if reclaim is active. If reclaim
* isn't active, then clearing out the victim is done
* solely for the reaper thread to avoid freed victims.
*/
victims[i].mm = NULL;
if (reclaim_active &&
atomic_inc_return_relaxed(&nr_killed) == nr_victims)
complete(&reclaim_done);
break;
}
}
read_unlock(&mm_free_lock);
}
static int simple_lmk_vmpressure_cb(struct notifier_block *nb,
unsigned long pressure, void *data)
{
if (pressure == 100) {
atomic_set(&needs_reclaim, 1);
smp_mb__after_atomic();
if (waitqueue_active(&oom_waitq))
wake_up(&oom_waitq);
}
return NOTIFY_OK;
}
static struct notifier_block vmpressure_notif = {
.notifier_call = simple_lmk_vmpressure_cb,
.priority = INT_MAX
};
/* Initialize Simple LMK when lmkd in Android writes to the minfree parameter */
static int simple_lmk_init_set(const char *val, const struct kernel_param *kp)
{
static atomic_t init_done = ATOMIC_INIT(0);
struct task_struct *thread;
if (!atomic_cmpxchg(&init_done, 0, 1)) {
thread = kthread_run(simple_lmk_reaper_thread, NULL,
"simple_lmkd_reaper");
BUG_ON(IS_ERR(thread));
thread = kthread_run(simple_lmk_reclaim_thread, NULL,
"simple_lmkd");
BUG_ON(IS_ERR(thread));
BUG_ON(vmpressure_notifier_register(&vmpressure_notif));
}
return 0;
}
static const struct kernel_param_ops simple_lmk_init_ops = {
.set = simple_lmk_init_set
};
/* Needed to prevent Android from thinking there's no LMK and thus rebooting */
#undef MODULE_PARAM_PREFIX
#define MODULE_PARAM_PREFIX "lowmemorykiller."
module_param_cb(minfree, &simple_lmk_init_ops, NULL, 0200);

View File

@@ -144,7 +144,7 @@ static const struct pci_device_id sil_pci_tbl[] = {
static const struct sil_drivelist {
const char *product;
unsigned int quirk;
} sil_blacklist [] = {
} sil_quirks[] = {
{ "ST320012AS", SIL_QUIRK_MOD15WRITE },
{ "ST330013AS", SIL_QUIRK_MOD15WRITE },
{ "ST340017AS", SIL_QUIRK_MOD15WRITE },
@@ -617,8 +617,8 @@ static void sil_thaw(struct ata_port *ap)
* list, and apply the fixups to only the specific
* devices/hosts/firmwares that need it.
*
* 20040111 - Seagate drives affected by the Mod15Write bug are blacklisted
* The Maxtor quirk is in the blacklist, but I'm keeping the original
* 20040111 - Seagate drives affected by the Mod15Write bug are quirked
* The Maxtor quirk is in sil_quirks, but I'm keeping the original
* pessimistic fix for the following reasons...
* - There seems to be less info on it, only one device gleaned off the
* Windows driver, maybe only one is affected. More info would be greatly
@@ -637,9 +637,9 @@ static void sil_dev_config(struct ata_device *dev)
ata_id_c_string(dev->id, model_num, ATA_ID_PROD, sizeof(model_num));
for (n = 0; sil_blacklist[n].product; n++)
if (!strcmp(sil_blacklist[n].product, model_num)) {
quirks = sil_blacklist[n].quirk;
for (n = 0; sil_quirks[n].product; n++)
if (!strcmp(sil_quirks[n].product, model_num)) {
quirks = sil_quirks[n].quirk;
break;
}

View File

@@ -105,7 +105,8 @@ static ssize_t bus_attr_show(struct kobject *kobj, struct attribute *attr,
{
struct bus_attribute *bus_attr = to_bus_attr(attr);
struct subsys_private *subsys_priv = to_subsys_private(kobj);
ssize_t ret = 0;
/* return -EIO for reading a bus attribute without show() */
ssize_t ret = -EIO;
if (bus_attr->show)
ret = bus_attr->show(subsys_priv->bus, buf);
@@ -117,7 +118,8 @@ static ssize_t bus_attr_store(struct kobject *kobj, struct attribute *attr,
{
struct bus_attribute *bus_attr = to_bus_attr(attr);
struct subsys_private *subsys_priv = to_subsys_private(kobj);
ssize_t ret = 0;
/* return -EIO for writing a bus attribute without store() */
ssize_t ret = -EIO;
if (bus_attr->store)
ret = bus_attr->store(subsys_priv->bus, buf, count);

View File

@@ -989,8 +989,11 @@ static ssize_t uevent_show(struct device *dev, struct device_attribute *attr,
if (!env)
return -ENOMEM;
/* Synchronize with really_probe() */
device_lock(dev);
/* let the kset specific function add its keys */
retval = kset->uevent_ops->uevent(kset, &dev->kobj, env);
device_unlock(dev);
if (retval)
goto out;

View File

@@ -3534,10 +3534,12 @@ void drbd_uuid_new_current(struct drbd_device *device) __must_hold(local)
void drbd_uuid_set_bm(struct drbd_device *device, u64 val) __must_hold(local)
{
unsigned long flags;
if (device->ldev->md.uuid[UI_BITMAP] == 0 && val == 0)
return;
spin_lock_irqsave(&device->ldev->md.uuid_lock, flags);
if (device->ldev->md.uuid[UI_BITMAP] == 0 && val == 0) {
spin_unlock_irqrestore(&device->ldev->md.uuid_lock, flags);
return;
}
if (val == 0) {
drbd_uuid_move_history(device);
device->ldev->md.uuid[UI_HISTORY_START] = device->ldev->md.uuid[UI_BITMAP];

View File

@@ -888,7 +888,7 @@ is_valid_state(struct drbd_device *device, union drbd_state ns)
ns.disk == D_OUTDATED)
rv = SS_CONNECTED_OUTDATES;
else if ((ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T) &&
else if (nc && (ns.conn == C_VERIFY_S || ns.conn == C_VERIFY_T) &&
(nc->verify_alg[0] == 0))
rv = SS_NO_VERIFY_ALG;

View File

@@ -1844,7 +1844,8 @@ static int loop_add(struct loop_device **l, int i)
lo->tag_set.queue_depth = 128;
lo->tag_set.numa_node = NUMA_NO_NODE;
lo->tag_set.cmd_size = sizeof(struct loop_cmd);
lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE |
BLK_MQ_F_NO_SCHED;
lo->tag_set.driver_data = lo;
err = blk_mq_alloc_tag_set(&lo->tag_set);

View File

@@ -2,7 +2,7 @@
config ZRAM
tristate "Compressed RAM block device support"
depends on BLOCK && SYSFS && ZSMALLOC && CRYPTO
select CRYPTO_LZO
select CRYPTO_LZ4
default n
help
Creates virtual block devices called /dev/zramX (X = 0, 1, ...).

Some files were not shown because too many files have changed in this diff Show More