kernel_realme_salaa

Author	SHA1	Message	Date
Fangrui Song	dabb099b50	BACKPORT: x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem_64.S Commit `393f203f5f` ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions") added .weak directives to arch/x86/lib/mem_64.S instead of changing the existing ENTRY macros to WEAK. This can lead to the assembly snippet .weak memcpy ... .globl memcpy which will produce a STB_WEAK memcpy with GNU as but STB_GLOBAL memcpy with LLVM's integrated assembler before LLVM 12. LLVM 12 (since https://reviews.llvm.org/D90108) will error on such an overridden symbol binding. Commit ef1e03152cb0 ("x86/asm: Make some functions local") changed ENTRY in arch/x86/lib/memcpy_64.S to SYM_FUNC_START_LOCAL, which was ineffective due to the preceding .weak directive. Use the appropriate SYM_FUNC_START_WEAK instead. Fixes: `393f203f5f` ("x86_64: kasan: add interceptors for memset/memmove/memcpy functions") Fixes: ef1e03152cb0 ("x86/asm: Make some functions local") Reported-by: Sami Tolvanen <samitolvanen@google.com> Change-Id: I77420199abb62cacbed4de8d3b244f77e43a7f38 Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20201103012358.168682-1-maskray@google.com Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>	2026-01-20 13:45:20 +00:00
Al Viro	ffa7603c0a	UPSTREAM: define __poll_t, annotate constants Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>	2026-01-20 13:45:15 +00:00
Ansh	0f730d34f4	arch: arm64: configs: Enable bp3a required configs Signed-off-by: Ansh <singhansh64321@gmail.com>	2026-01-18 10:58:07 +00:00
StimLuks87	2451f4b356	salaa_defconfig: Enable CONFIG_POWERSUSPEND Signed-off-by: StimLuks87 <153687700+StimLuks87@users.noreply.github.com>	2026-01-18 10:58:07 +00:00
StimLuks87	97d31842b9	arch/arm64: dts/mediatek: Fix mt6785.dts. Signed-off-by: StimLuks87 <153687700+StimLuks87@users.noreply.github.com>	2026-01-18 10:58:06 +00:00
Randy Dunlap	e57999dd77	xattr: fix kernel-doc for mnt_userns and vfs xattr helpers Fix kernel-doc warnings in xattr.c: ../fs/xattr.c:257: warning: Function parameter or member 'mnt_userns' not described in '__vfs_setxattr_locked' ../fs/xattr.c:485: warning: Function parameter or member 'mnt_userns' not described in '__vfs_removexattr_locked' and fix one function whose kernel-doc was not in the correct format. Link: https://lore.kernel.org/r/20210216042929.8931-4-rdunlap@infradead.org Fixes: 71bc356f93a1 ("commoncap: handle idmapped mounts") Fixes: `b1ab7e4b2a` ("VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: David P. Quigley <dpquigl@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2026-01-18 10:57:58 +00:00
Heiko Carstens	47ea1cf491	BACKPORT: epoll: fix compat syscall wire up of epoll_pwait2 Commit b0a0c2615f6f ("epoll: wire up syscall epoll_pwait2") wired up the 64 bit syscall instead of the compat variant in a couple of places. Fixes: b0a0c2615f6f ("epoll: wire up syscall epoll_pwait2") Change-Id: Ida2267e953240c036d64bc61177ee72bb20f35a3 Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-01-18 10:57:51 +00:00
Willem de Bruijn	1e308c53b9	BACKPORT: epoll: wire up syscall epoll_pwait2 Split off from prev patch in the series that implements the syscall. Link: https://lkml.kernel.org/r/20201121144401.3727659-4-willemdebruijn.kernel@gmail.com Change-Id: I48dfae6f721b24ebc53de603e393289954a95908 Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-01-18 10:57:51 +00:00
Christian Brauner	21fa9d8148	BACKPORT: arch: wire-up close_range() This wires up the close_range() syscall into all arches at once. Suggested-by: Arnd Bergmann <arnd@arndb.de> Change-Id: Ib962b01f3a490c901b0e0f51e436df1a26887ca4 Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Jann Horn <jannh@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Dmitry V. Levin <ldv@altlinux.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Florian Weimer <fweimer@redhat.com> Cc: linux-api@vger.kernel.org Cc: linux-alpha@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-ia64@vger.kernel.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-mips@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: linux-arch@vger.kernel.org Cc: x86@kernel.org	2026-01-18 10:57:50 +00:00
StimLuks87	c6854d5892	kernel: Improve performance by targeting Cortex-A76 and Cortex-A55 cores specifically when building with Clang. Signed-off-by: StimLuks87 <san10031987san@gmail.com>	2026-01-18 10:57:43 +00:00
StimLuks87	0e9b915dfd	salaa_defconfig: Disable F2FS_FS_COMPRESSION and OPLUS_FEATURE_EROFS	2026-01-18 10:57:42 +00:00
StimLuks87	b417913b3f	salaa_defconfig: Enable 300 HZ	2026-01-18 10:57:42 +00:00
Sultan Alsawaf	0051e63fa6	cpu: Silence log spam when a CPU is brought up Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: celtare21 <celtare21@gmail.com> Signed-off-by: Forenche <prahul2003@gmail.com> Signed-off-by: LinkBoi00 <linkdevel@protonmail.com> Signed-off-by: prathamdby <134331217+prathamdby@users.noreply.github.com>	2026-01-18 10:57:41 +00:00
Sultan Alsawaf	872423fc61	treewide: Suppress overly verbose log spam This tames quite a bit of the log spam and makes dmesg readable. Uses work from Danny Lin <danny@kdrag0n.dev>. Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Saikrishna1504 <saikrishna26918@gmail.com>	2026-01-18 10:57:26 +00:00
kdrag0n	fe3a58c95f	arm64: debug: Disable self-hosted debug by default Signed-off-by: kdrag0n <dragon@khronodragon.com> Signed-off-by: celtare21 <celtare21@gmail.com> Signed-off-by: Sarthak Roy <sarthak.roy@s.amity.edu> Signed-off-by: Saikrishna1504 <saikrishna26918@gmail.com>	2026-01-18 10:57:24 +00:00
Soham Sen	617f58d6fa	treewide: Enable -03 optimization Signed-off-by: officialputuid <officialputuid@hack.id>	2026-01-18 10:57:18 +00:00
Adithya R	2274d80c0f	kbuild: Add support for LLVM's Polly optimizer This adds support for compiling the kernel with optimizations offered by LLVM's polyhedral loop optimizer known as Polly, which can improve performance by improving cache locality in loops. Note that LLVM is not compiled with Polly by default -- it must be enabled explicitly. [ghostrider-reborn] - Removed polly DCE as it's no longer supported Signed-off-by: officialputuid <officialputuid@hack.id>	2026-01-18 10:57:17 +00:00
StimLuks87	e58f4ff606	arch: arm64: include crypto sha512	2026-01-18 10:57:13 +00:00
StimLuks87	8cdd8abbbe	ARM64: Match filesystem features between all platforms * Enable EXT4/F2FS/Checkpoint related features on all platforms, end this discrimination! * Let's also stop selecting SDCARD_FS by default, if someone really wants it, they can enable it in their own defconfig.	2026-01-18 10:57:13 +00:00
bengris32	953cf06c57	ARM64: Implement dtbo.img building via CONFIG_BUILD_ARM64_DTB_OVERLAY_IMAGE * This commit is a combination of multiple commits, namely: - dtbo.img: build device tree overlay partition image (Woody Lin <woodylin@google.com>) - arm64: Add kernel level DTBO building logic (Ash Blake <ash.blake@tutanota.com>) * Two new config options are added, CONFIG_BUILD_ARM64_DTB_OVERLAY_IMAGE and CONFIG_BUILD_ARM64_DTB_OVERLAY_IMAGE_NAMES. The kernel config will specify a comma-seperated list of dtbs to include in the built dtbo.img, but the difference is that the dtbo.img packing is now integrated into the kernel build process instead of being performed by the Android build system. All dtbs specified in CONFIG_BUILD_ARM64_DTB_OVERLAY_IMAGE_NAMES will be packed into the resulting dtbo.img, and will always be added to targets when CONFIG_BUILD_ARM64_DTB_OVERLAY_IMAGE is enabled. Change-Id: I67d5dc1ce8c0d3c74bf4c7d9a4a9a464c4839999 Co-authored-by: Woody Lin <woodylin@google.com> Co-authored-by: Ash Blake <ash.blake@tutanota.com> Signed-off-by: bengris32 <bengris32@protonmail.ch>	2026-01-18 10:57:12 +00:00
StimLuks87	1a561ea028	dts: fix warnings clang building dtbo	2026-01-18 10:57:12 +00:00
Stim	3c9001754a	arm64: dts: fix kpoc shutdown issue	2026-01-18 10:57:12 +00:00
Peter Xu	1d8476cc67	BACKPORT: UPSTREAM: mm: allow VM_FAULT_RETRY for multiple times The idea comes from a discussion between Linus and Andrea [1]. Before this patch we only allow a page fault to retry once. We achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing handle_mm_fault() the second time. This was majorly used to avoid unexpected starvation of the system by looping over forever to handle the page fault on a single page. However that should hardly happen, and after all for each code path to return a VM_FAULT_RETRY we'll first wait for a condition (during which time we should possibly yield the cpu) to happen before VM_FAULT_RETRY is really returned. This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY. It means that the page fault handler now can retry the page fault for multiple times if necessary without the need to generate another page fault event. Meanwhile we still keep the FAULT_FLAG_TRIED flag so page fault handler can still identify whether a page fault is the first attempt or not. Then we'll have these combinations of fault flags (only considering ALLOW_RETRY flag and TRIED flag): - ALLOW_RETRY and !TRIED: this means the page fault allows to retry, and this is the first try - ALLOW_RETRY and TRIED: this means the page fault allows to retry, and this is not the first try - !ALLOW_RETRY and !TRIED: this means the page fault does not allow to retry at all - !ALLOW_RETRY and TRIED: this is forbidden and should never be used In existing code we have multiple places that has taken special care of the first condition above by checking against (fault_flags & FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect the first retry of a page fault by checking against both (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now even the 2nd try will have the ALLOW_RETRY set, then use that helper in all existing special paths. One example is in __lock_page_or_retry(), now we'll drop the mmap_sem only in the first attempt of page fault and we'll keep it in follow up retries, so old locking behavior will be retained. This will be a nice enhancement for current code [2] at the same time a supporting material for the future userfaultfd-writeprotect work, since in that work there will always be an explicit userfault writeprotect retry for protected pages, and if that cannot resolve the page fault (e.g., when userfaultfd-writeprotect is used in conjunction with swapped pages) then we'll possibly need a 3rd retry of the page fault. It might also benefit other potential users who will have similar requirement like userfault write-protection. GUP code is not touched yet and will be covered in follow up patch. Please read the thread below for more information. [1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/ [2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 4064b982706375025628094e51d11cf1a958a5d3) Bug: 176847924 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: If0378d8ccbfc54a574b91103a6dc76e446f5f12e	2026-01-18 10:56:46 +00:00
Peter Xu	dad9495a62	UPSTREAM: mm: introduce FAULT_FLAG_DEFAULT Although there're tons of arch-specific page fault handlers, most of them are still sharing the same initial value of the page fault flags. Say, merely all of the page fault handlers would allow the fault to be retried, and they also allow the fault to respond to SIGKILL. Let's define a default value for the fault flags to replace those initial page fault flags that were copied over. With this, it'll be far easier to introduce new fault flag that can be used by all the architectures instead of touching all the archs. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit dde1607248328cdb7570e3a252e8fb76b3411d66) Bug: 176847924 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I53fb69d7974571ed827cc7cc2a54af555c80643f	2026-01-18 10:56:46 +00:00
Peter Xu	e1d5350664	UPSTREAM: sh/mm: use helper fault_signal_pending() Let SH to use the new fault_signal_pending() helper. Here we'll need to move the up_read() out because that's actually needed as long as !RETRY cases. At the meantime we can drop all the rest of up_read()s now (which seems to be cleaner). Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160226.9550-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit fb027ada051a9e2d70a069b2aa62fb6f52100bbf) Bug: 176847924 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: Icd6511efc716bbc9cb78c6ff9c99e9dbe19d72d5	2026-01-18 10:56:45 +00:00
Peter Xu	b57836a9ed	UPSTREAM: powerpc/mm: use helper fault_signal_pending() Let powerpc code to use the new helper, by moving the signal handling earlier before the retry logic. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160222.9422-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit c9a0dad162014182867f81b28bb7a4b691d65595) Bug: 176847924 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: Ia4818fd5c35e85993ff87b1326a133334380868b	2026-01-18 10:56:45 +00:00
Peter Xu	95ed2f42f7	UPSTREAM: arm64/mm: use helper fault_signal_pending() Let the arm64 fault handling to use the new fault_signal_pending() helper, by moving the signal handling out of the retry logic. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220155927.9264-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit b502f038f2ffc97a60fefcc120a868aa46009060) Bug: 176847924 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I0168a43369eed5d3880a45d6784bae86ed5213be	2026-01-18 10:56:44 +00:00
Peter Xu	e874ac7df6	UPSTREAM: arc/mm: use helper fault_signal_pending() Let ARC to use the new helper fault_signal_pending() by moving the signal check out of the retry logic as standalone. This should also helps to simplify the code a bit. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220155843.9172-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 24a62cf41f670fcba90dfba4db2a59a22cc830d5) Bug: 176847924 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I8a014591633edc99715f73d6fba908c5d9ceeef3	2026-01-18 10:56:44 +00:00
Peter Xu	67770d231e	UPSTREAM: x86/mm: use helper fault_signal_pending() Let's move the fatal signal check even earlier so that we can directly use the new fault_signal_pending() in x86 mm code. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220155353.8676-5-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 39678191cd8988c811813baf4c97b43bf46094e4) Bug: 176847924 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: Icaf666ec5f9f83043396ae0c5b6a079e9c2349bc	2026-01-18 10:56:43 +00:00
Peter Xu	8090174133	BACKPORT: mm: introduce fault_signal_pending() For most architectures, we've got a quick path to detect fatal signal after a handle_mm_fault(). Introduce a helper for that quick path. It cleans the current codes a bit so we don't need to duplicate the same check across archs. More importantly, this will be an unified place that we handle the signal immediately right after an interrupted page fault, so it'll be much easier for us if we want to change the behavior of handling signals later on for all the archs. Note that currently only part of the archs are using this new helper, because some archs have their own way to handle signals. In the follow up patches, we'll try to apply this helper to all the rest of archs. Another note is that the "regs" parameter in the new helper is not used yet. It'll be used very soon. Now we kept it in this patch only to avoid touching all the archs again in the follow up patches. [peterx@redhat.com: fix sparse warnings] Link: http://lkml.kernel.org/r/20200311145921.GD479302@xz-x1 Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220155353.8676-4-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 4ef873226ceb9c7bf11a922caddc5698a24bcfaf) [Kalesh Singh: Resolve #include conflict in include/linux/sched/signal.h] Bug: 176847924 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: Ie9080d75c2cc22b1fd4b50755e9e61ad0ab677c2	2026-01-18 10:56:42 +00:00
Vineet Gupta	c3e3980af3	ARC: mm: do_page_fault refactor #8 : release mmap_sem sooner In case of successful page fault handling, this patch releases mmap_sem before updating the perf stat event for major/minor faults. So even though the contention reduction is NOT super high, it is still an improvement. There's an additional code size improvement as we only have 2 up_read() calls now. Note to myself: -------------- 1. Given the way it is done, we are forced to move @bad_area label earlier causing the various "goto bad_area" cases to hit perf stat code. - PERF_COUNT_SW_PAGE_FAULTS is NOW updated for access errors which is what arm/arm64 seem to be doing as well (with slightly different code) - PERF_COUNT_SW_PAGE_FAULTS_{MAJ,MIN} must NOT be updated for the error case which is guarded by now setting @fault initial value to VM_FAULT_ERROR which serves both cases when handle_mm_fault() returns error or is not called at all. 2. arm/arm64 use two homebrew fault flags VM_FAULT_BAD{MAP,MAPACCESS} which I was inclined to add too but seems not needed for ARC - given that we have everything is 1 function we can still use goto - we setup si_code at the right place (arm* do that in the end) - we init fault already to error value which guards entry into perf stats event update Cc: Peter Zijlstra <peterz@infradead.org> Change-Id: I35f2d8c2b5f16fdf14213c1167cec61e21938134 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2026-01-18 10:56:42 +00:00
Vineet Gupta	105a6d08e7	ARC: mm: do_page_fault refactor #7 : fold the various error handling - single up_read() call vs. 4 - so much easier on eyes Technically it seems like @bad_area label moved up, but even in old regime, it was a special case of delivering SIGSEGV unconditionally which we now do as well, although with checks. Also note that @fault needs to be initialized since we can land in @bad_area (which reads it) without setting it up with return value of handle_mm_fault() - failing to do so did bite us although as a side effect of different patch: see [1] [1]: http://lists.infradead.org/pipermail/linux-snps-arc/2019-May/005803.html Change-Id: I0964d9868efc0add93878ee57a54631604fcd42c Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2026-01-18 10:56:42 +00:00
Vineet Gupta	131a10c35a	ARC: mm: do_page_fault refactor #6 : error handlers to use same pattern - up_read - if !user_mode - whatever error handling Change-Id: I7056cdc08fe6e8106e4e5dfee9fdc98d6fa10b81 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2026-01-18 10:56:42 +00:00
Vineet Gupta	f54c17fc82	ARC: mm: do_page_fault refactor #5 : scoot no_context to end This is different than the rest of signal handling stuff No functional change Change-Id: I86a0d68e763ca8d1131021201a21a07e5d47a6bb Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2026-01-18 10:56:41 +00:00
Vineet Gupta	a897e037c8	ARC: mm: do_page_fault refactor #4 : consolidate retry related logic stats update code can now elide "retry" check and additional level of indentation since all retry handling is done ahead of it already Change-Id: If1816cf5b4a522774d67aabcd02188e1c98f0601 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2026-01-18 10:56:41 +00:00
Vineet Gupta	e38698a3bd	ARC: mm: do_page_fault refactor #3 : tidyup vma access permission code The coding pattern to NOT intialize variables at declaration time but rather near code which makes us eof them makes it much easier to grok the overall logic, specially when the init is not simply 0 or 1 Change-Id: I6d2d00d0ae258bcdee115b9a6bd7dec22e551a00 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2026-01-18 10:56:41 +00:00
Vineet Gupta	0632a6633c	ARC: mm: do_page_fault refactor #2 : remove short lived variable Compiler will do this anyways, still.. No functional change. Change-Id: Ie48e7058203b83b18fda78962bf0afdffb9fffbf Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2026-01-18 10:56:41 +00:00
Vineet Gupta	61ad80a92b	ARC: mm: do_page_fault refactor #1 : remove label @good_area Invert the condition for stack expansion. No functional change Change-Id: Ia502955f3d0a680dde1105559f38abd0d2d0db24 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2026-01-18 10:56:40 +00:00
Eugeniy Paltsev	9cca5500e3	ARC: mm: SIGSEGV userspace trying to access kernel virtual memory As of today if userspace process tries to access a kernel virtual addres (0x7000_0000 to 0x7ffff_ffff) such that a legit kernel mapping already exists, that process hangs instead of being killed with SIGSEGV Fix that by ensuring that do_page_fault() handles kenrel vaddr only if in kernel mode. And given this, we can also simplify the code a bit. Now a vmalloc fault implies kernel mode so its failure (for some reason) can reuse the @no_context label and we can remove @bad_area_nosemaphore. Reproduce user test for original problem: ------------------------>8----------------- #include <stdlib.h> #include <stdint.h> int main(int argc, char argv[]) { volatile uint32_t temp; temp = (uint32_t *)(0x70000000); } ------------------------>8----------------- Cc: <stable@vger.kernel.org> Change-Id: I871721a1a4edcdad0384e24f75ec67141c6f2ddc Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2026-01-18 10:56:40 +00:00
Vineet Gupta	8bca920094	ARC: mm: do_page_fault fixes #1 : relinquish mmap_sem if signal arrives while handle_mm_fault do_page_fault() forgot to relinquish mmap_sem if a signal came while handling handle_mm_fault() - due to say a ctl+c or oom etc. This would later cause a deadlock by acquiring it twice. This came to light when running libc testsuite tst-tls3-malloc test but is likely also the cause for prior seen LTP failures. Using lockdep clearly showed what the issue was. \| # while true; do ./tst-tls3-malloc ; done \| Didn't expect signal from child: got `Segmentation fault' \| ^C \| ============================================ \| WARNING: possible recursive locking detected \| 4.17.0+ #25 Not tainted \| -------------------------------------------- \| tst-tls3-malloc/510 is trying to acquire lock: \| 606c7728 (&mm->mmap_sem){++++}, at: __might_fault+0x28/0x5c \| \|but task is already holding lock: \|606c7728 (&mm->mmap_sem){++++}, at: do_page_fault+0x9c/0x2a0 \| \| other info that might help us debug this: \| Possible unsafe locking scenario: \| \| CPU0 \| ---- \| lock(&mm->mmap_sem); \| lock(&mm->mmap_sem); \| \| * DEADLOCK * \| ------------------------------------------------------------ What the change does is not obvious (note to myself) prior code was \| do_page_fault \| \| down_read() <-- lock taken \| handle_mm_fault <-- signal pending as this runs \| if fatal_signal_pending \| if VM_FAULT_ERROR \| up_read \| if user_mode \| return <-- lock still held, this was the BUG New code \| do_page_fault \| \| down_read() <-- lock taken \| handle_mm_fault <-- signal pending as this runs \| if fatal_signal_pending \| if VM_FAULT_RETRY \| return <-- not same case as above, but still OK since \| core mm already relinq lock for FAULT_RETRY \| ... \| \| < Now falls through for bug case above > \| \| up_read() <-- lock relinquished Cc: stable@vger.kernel.org Change-Id: I4d8c5ad338f86f349b3194c4c077ff03b45ae350 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2026-01-18 10:56:40 +00:00
Davidlohr Bueso	135a220457	arch/arc/mm/fault.c: remove caller signal_pending_branch predictions This is already done for us internally by the signal machinery. Link: http://lkml.kernel.org/r/20181116002713.8474-4-dave@stgolabs.net Change-Id: Id70c1f130d2e66a8edb1ace7b6896a758eab7318 Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-01-18 10:56:40 +00:00
Axel Rasmussen	07a6044a23	BACKPORT: FROMGIT: userfaultfd: add minor fault registration mode Patch series "userfaultfd: add minor fault handling", v9. Overview ======== This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS. When enabled (via the UFFDIO_API ioctl), this feature means that any hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will also get events for "minor" faults. By "minor" fault, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is, userspace resolves the fault by either a) doing nothing if the contents are already correct, or b) updating the underlying contents using the second, non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA, or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Use Case ======== Consider the use case of VM live migration (e.g. under QEMU/KVM): 1. While a VM is still running, we copy the contents of its memory to a target machine. The pages are populated on the target by writing to the non-UFFD mapping, using the setup described above. The VM is still running (and therefore its memory is likely changing), so this may be repeated several times, until we decide the target is "up to date enough". 2. We pause the VM on the source, and start executing on the target machine. During this gap, the VM's user(s) will see a pause, so it is desirable to minimize this window. 3. Between the last time any page was copied from the source to the target, and when the VM was paused, the contents of that page may have changed - and therefore the copy we have on the target machine is out of date. Although we can keep track of which pages are out of date, for VMs with large amounts of memory, it is "slow" to transfer this information to the target machine. We want to resume execution before such a transfer would complete. 4. So, the guest begins executing on the target machine. The first time it touches its memory (via the UFFD-registered mapping), userspace wants to intercept this fault. Userspace checks whether or not the page is up to date, and if not, copies the updated page from the source machine, via the non-UFFD mapping. Finally, whether a copy was performed or not, userspace issues a UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". We don't have to do all of the final updates on-demand. The userfaultfd manager can, in the background, also copy over updated pages once it receives the map of which pages are up-to-date or not. Interaction with Existing APIs ============================== Because this is a feature, a registered VMA could potentially receive both missing and minor faults. I spent some time thinking through how the existing API interacts with the new feature: UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault: - For non-shared memory or shmem, -EINVAL is returned. - For hugetlb, -EFAULT is returned. UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without modifications, the existing codepath assumes a new page needs to be allocated. This is okay, since userspace must have a second non-UFFD-registered mapping anyway, thus there isn't much reason to want to use these in any case (just memcpy or memset or similar). - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned. - If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case). - UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns -ENOENT in that case (regardless of the kind of fault). Future Work =========== This series only supports hugetlbfs. I have a second series in flight to support shmem as well, extending the functionality. This series is more mature than the shmem support at this point, and the functionality works fully on hugetlbfs, so this series can be merged first and then shmem support will follow. This patch (of 6): This feature allows userspace to intercept "minor" faults. By "minor" faults, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. This commit adds the new registration mode, and sets the relevant flag on the VMAs being registered. In the hugetlb fault path, if we find that we have huge_pte_none(), but find_lock_page() does indeed find an existing page, then we have a "minor" fault, and if the VMA has the userfaultfd registration flag, we call into userfaultfd to handle it. This is implemented as a new registration mode, instead of an API feature. This is because the alternative implementation has significant drawbacks [1]. However, doing it this was requires we allocate a VM_* flag for the new registration mode. On 32-bit systems, there are no unused bits, so this feature is only supported on architectures with CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in MINOR mode on 32-bit architectures, we return -EINVAL. [1] https://lore.kernel.org/patchwork/patch/1380226/ Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Steven Price <steven.price@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 82a150ec394f6b944e26786b907fc0deab5b2064 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1388132/ Conflicts: arch/x86/Kconfig fs/userfaultfd.c include/linux/userfaultfd_k.h include/uapi/linux/userfaultfd.h init/Kconfig mm/hugetlb.c (Lack of userfaultfd write-protect support in 5.4 lead to all conflicts. Resolved by carefully rebasing such that write-protect related code doesn't get added) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: I43b37272d531341439ceaa03213d0e2415e04688	2026-01-18 10:56:21 +00:00
Peter Xu	49e0db2269	BACKPORT: FROMGIT: hugetlb/userfaultfd: forbid huge pmd sharing when uffd enabled Huge pmd sharing could bring problem to userfaultfd. The thing is that userfaultfd is running its logic based on the special bits on page table entries, however the huge pmd sharing could potentially share page table entries for different address ranges. That could cause issues on either: - When sharing huge pmd page tables for an uffd write protected range, the newly mapped huge pmd range will also be write protected unexpectedly, or, - When we try to write protect a range of huge pmd shared range, we'll first do huge_pmd_unshare() in hugetlb_change_protection(), however that also means the UFFDIO_WRITEPROTECT could be silently skipped for the shared region, which could lead to data loss. Since at it, a few other things are done altogether: - Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because that's definitely something that arch code would like to use too - ARM64 currently directly check against CONFIG_ARCH_WANT_HUGE_PMD_SHARE when trying to share huge pmd. Switch to the want_pmd_share() helper. Since at it, move vma_shareable() from huge_pmd_share() into want_pmd_share(). Link: https://lkml.kernel.org/r/20210218231202.15426-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit ab6a0d00a63f92f1f0d220274fa989eb75c09f2b https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1382207/ Conflicts: include/linux/hugetlb.h mm/hugetlb.c (Manual rebase) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: Ie2dff7ab31600cae78914e3278be61516844394e	2026-01-18 10:56:20 +00:00
Peter Xu	0d8c1affb2	BACKPORT: FROMGIT: hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share() Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4. This series tries to disable huge pmd unshare of hugetlbfs backed memory for uffd-wp. Although uffd-wp of hugetlbfs is still during rfc stage, the idea of this series may be needed for multiple tasks (Axel's uffd minor fault series, and Mike's soft dirty series), so I picked it out from the larger series. This patch (of 4): It is a preparation work to be able to behave differently in the per architecture huge_pte_alloc() according to different VMA attributes. Pass it deeper into huge_pmd_share() so that we can avoid the find_vma() call. Link: https://lkml.kernel.org/r/20210218230633.15028-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210218230633.15028-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit b92dc1bfd52ecf338c024815a7c1d44e37a507a1 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1382205/ Conflicts: arch/sparc/mm/hugetlbpage.c mm/hugetlb.c mm/userfaultfd.c (1. manual rebase, 2. c0d0381ade79885c04a04c303284b040616b116e wasn't CP'ed. Rebased by appropriately updating huge_pte_alloc(), 3. manual rebase) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: I50db4e27f2951a5ee01b0dfa22c1ece34e79f881	2026-01-18 10:56:20 +00:00
Kalesh Singh	045e50b61f	UPSTREAM: x86: mremap speedup - Enable HAVE_MOVE_PUD HAVE_MOVE_PUD enables remapping pages at the PUD level if both the source and destination addresses are PUD-aligned. With HAVE_MOVE_PUD enabled it can be inferred that there is approximately a 13x improvement in performance on x86. (See data below). ------- Test Results --------- The following results were obtained using a 5.4 kernel, by remapping a PUD-aligned, 1GB sized region to a PUD-aligned destination. The results from 10 iterations of the test are given below: Total mremap times for 1GB data on x86. All times are in nanoseconds. Control HAVE_MOVE_PUD 180394 15089 235728 14056 238931 25741 187330 13838 241742 14187 177925 14778 182758 14728 160872 14418 205813 15107 245722 13998 205721.5 15594 <-- Mean time in nanoseconds A 1GB mremap completion time drops from ~205 microseconds to ~15 microseconds on x86. (~13x speed up). Link: https://lkml.kernel.org/r/20201014005320.2233162-6-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Gavin Shan <gshan@redhat.com> Cc: Hassan Naveed <hnaveed@wavecomp.com> Cc: Jia He <justin.he@arm.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kees Cook <keescook@chromium.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit be37c98d1134a8e068b52618c086dab6b34b9a2c) Bug: 151772539 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I7967951289885157ef3a487a6935abe3b860847f	2026-01-18 10:56:17 +00:00
Joel Fernandes (Google)	0770f42e07	mm: select HAVE_MOVE_PMD on x86 for faster mremap Moving page-tables at the PMD-level on x86 is known to be safe. Enable this option so that we can do fast mremap when possible. Link: http://lkml.kernel.org/r/20181108181201.88826-4-joelaf@google.com Change-Id: I05ffe342ac69c7f979debddc2e7aa66d90c11fc8 Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Michal Hocko <mhocko@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-01-18 10:56:17 +00:00
Kalesh Singh	2f1216cc5f	UPSTREAM: arm64: mremap speedup - enable HAVE_MOVE_PUD HAVE_MOVE_PUD enables remapping pages at the PUD level if both the source and destination addresses are PUD-aligned. With HAVE_MOVE_PUD enabled it can be inferred that there is approximately a 19x improvement in performance on arm64. (See data below). ------- Test Results --------- The following results were obtained using a 5.4 kernel, by remapping a PUD-aligned, 1GB sized region to a PUD-aligned destination. The results from 10 iterations of the test are given below: Total mremap times for 1GB data on arm64. All times are in nanoseconds. Control HAVE_MOVE_PUD 1247761 74271 1219896 46771 1094792 59687 1227760 48385 1043698 76666 1101771 50365 1159896 52500 1143594 75261 1025833 61354 1078125 48697 1134312.6 59395.7 <-- Mean time in nanoseconds A 1GB mremap completion time drops from ~1.1 milliseconds to ~59 microseconds on arm64. (~19x speed up). Link: https://lkml.kernel.org/r/20201014005320.2233162-5-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Gavin Shan <gshan@redhat.com> Cc: Hassan Naveed <hnaveed@wavecomp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kees Cook <keescook@chromium.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit f5308c896d5de211245a9dc73b4e530f75185dd5) Bug: 151772539 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I30590b7375dde84fe345f4920a06f6a2c0b5aa31	2026-01-18 10:56:17 +00:00
Kalesh Singh	a55d3384fb	BACKPORT: mm: speedup mremap on 1GB or larger regions Android needs to move large memory regions for garbage collection. The GC requires moving physical pages of multi-gigabyte heap using mremap. During this move, the application threads have to be paused for correctness. It is critical to keep this pause as short as possible to avoid jitters during user interaction. Optimize mremap for >= 1GB-sized regions by moving at the PUD/PGD level if the source and destination addresses are PUD-aligned. For CONFIG_PGTABLE_LEVELS == 3, moving at the PUD level in effect moves PGD entries, since the PUD entry is “folded back” onto the PGD entry. Add HAVE_MOVE_PUD so that architectures where moving at the PUD level isn't supported/tested can turn this off by not selecting the config. Link: https://lkml.kernel.org/r/20201014005320.2233162-4-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Gavin Shan <gshan@redhat.com> Cc: Hassan Naveed <hnaveed@wavecomp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kees Cook <keescook@chromium.org> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Minchan Kim <minchan@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit c49dd340180260c6239e453263a9a244da9a7c85) [Kalesh Singh: Resolve conflicts in mm/mremap.c] Bug: 151772539 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: Ia9b065f5059044815fd05f22bad33c484b2b2b73	2026-01-18 10:56:17 +00:00
Kalesh Singh	28f59e422e	UPSTREAM: arm64: mremap speedup - Enable HAVE_MOVE_PMD HAVE_MOVE_PMD enables remapping pages at the PMD level if both the source and destination addresses are PMD-aligned. HAVE_MOVE_PMD is already enabled on x86. The original patch [1] that introduced this config did not enable it on arm64 at the time because of performance issues with flushing the TLB on every PMD move. These issues have since been addressed in more recent releases with improvements to the arm64 TLB invalidation and core mmu_gather code as Will Deacon mentioned in [2]. >From the data below, it can be inferred that there is approximately 8x improvement in performance when HAVE_MOVE_PMD is enabled on arm64. --------- Test Results ---------- The following results were obtained on an arm64 device running a 5.4 kernel, by remapping a PMD-aligned, 1GB sized region to a PMD-aligned destination. The results from 10 iterations of the test are given below. All times are in nanoseconds. Control HAVE_MOVE_PMD 9220833 1247761 9002552 1219896 9254115 1094792 8725885 1227760 9308646 1043698 9001667 1101771 8793385 1159896 `8774636` 1143594 9553125 1025833 9374010 1078125 9100885.4 1134312.6 <-- Mean Time in nanoseconds Total mremap time for a 1GB sized PMD-aligned region drops from ~9.1 milliseconds to ~1.1 milliseconds. (~8x speedup). [1] https://lore.kernel.org/r/20181108181201.88826-3-joelaf@google.com [2] https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg140837.html Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20201014005320.2233162-3-kaleshsingh@google.com Link: https://lore.kernel.org/kvmarm/20181029102840.GC13965@arm.com/ Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit 45544eee96065cf183fbb937fe1f45a172b06f4e) Bug: 151772539 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: If0d97276cf8de1a5893e97444f2d961db05abea5	2026-01-18 10:56:16 +00:00
Joel Fernandes (Google)	081c6c1044	mm: speed up mremap by 20x on large regions Android needs to mremap large regions of memory during memory management related operations. The mremap system call can be really slow if THP is not enabled. The bottleneck is move_page_tables, which is copying each pte at a time, and can be really slow across a large map. Turning on THP may not be a viable option, and is not for us. This patch speeds up the performance for non-THP system by copying at the PMD level when possible. The speedup is an order of magnitude on x86 (~20x). On a 1GB mremap, the mremap completion times drops from 3.4-3.6 milliseconds to 144-160 microseconds. Before: Total mremap time for 1GB data: 3521942 nanoseconds. Total mremap time for 1GB data: 3449229 nanoseconds. Total mremap time for 1GB data: 3488230 nanoseconds. After: Total mremap time for 1GB data: 150279 nanoseconds. Total mremap time for 1GB data: 144665 nanoseconds. Total mremap time for 1GB data: 158708 nanoseconds. If THP is enabled the optimization is mostly skipped except in certain situations. [joel@joelfernandes.org: fix 'move_normal_pmd' unused function warning] Link: http://lkml.kernel.org/r/20181108224457.GB209347@google.com Link: http://lkml.kernel.org/r/20181108181201.88826-3-joelaf@google.com Change-Id: I9d8e2e356a78f05119f48e37344a2df62cb8f97e Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Michal Hocko <mhocko@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-01-18 10:56:16 +00:00

1 2 3 4 5 ...

144587 Commits