kernel_google_redbull

Evolution-X-Devices/kernel_google_redbull

Author	SHA1	Message	Date
Matthew Wilcox (Oracle)	a10d5ec4e9	UPSTREAM: sysctl: Convert to iter interfaces Using the read_iter/write_iter interfaces allows for in-kernel users to set sysctls without using set_fs(). Also, the buffer is a string, so give it the real type of 'char ', not void . [AV: Christoph's fixup folded in] Change-Id: I9925d71d0c27b6712467085ec446ec7fca8e6db4 Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-08 17:28:02 +03:00
Matthew Wilcox (Oracle)	0a7ddc93e7	UPSTREAM: Call sysctl_head_finish on error This error path returned directly instead of calling sysctl_head_finish(). Fixes: ef9d965bc8b6 ("sysctl: reject gigantic reads/write to sysctl files") Change-Id: Id2ce64e2a64dbf0deb96dec5059a4c55337d309b Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-08 17:28:02 +03:00
Christoph Hellwig	dc0a07dd5a	UPSTREAM: sysctl: reject gigantic reads/write to sysctl files Instead of triggering a WARN_ON deep down in the page allocator just give up early on allocations that are way larger than the usual sysctl values. Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Change-Id: Ic951a7d2adbdc39c1b035fd72b93aac96e57fe25 Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-08 17:28:02 +03:00
Yonghong Song	065d288ad7	UPSTREAM: bpf: Refactor to provide aux info to bpf_iter_init_seq_priv_t This patch refactored target bpf_iter_init_seq_priv_t callback function to accept additional information. This will be needed in later patches for map element targets since a particular map should be passed to traverse elements for that particular map. In the future, other information may be passed to target as well, e.g., pid, cgroup id, etc. to customize the iterator. Change-Id: Ie4194ce07d7cd10e1da29186ff8c286d3030a4f3 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com	2025-09-08 17:27:27 +03:00
Christoph Hellwig	975824c4e3	BACKPORT: mm: remove the pgprot argument to __vmalloc The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it. Change-Id: Iae5854c7005dec82942db58215d615a10bde1f31 Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv] Acked-by: Gao Xiang <xiang@kernel.org> [erofs] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Wei Liu <wei.liu@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-09-08 17:27:09 +03:00
Yonghong Song	8afe815432	BACKPORT: net: bpf: Add netlink and ipv6_route bpf_iter targets This patch added netlink and ipv6_route targets, using the same seq_ops (except show() and minor changes for stop()) for /proc/net/{netlink,ipv6_route}. The net namespace for these targets are the current net namespace at file open stage, similar to /proc/net/{netlink,ipv6_route} reference counting the net namespace at seq_file open stage. Since module is not supported for now, ipv6_route is supported only if the IPV6 is built-in, i.e., not compiled as a module. The restriction can be lifted once module is properly supported for bpf_iter. Change-Id: Id907ac146e81197abd185e0d6b7431d176dca355 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175910.2476329-1-yhs@fb.com	2025-09-08 17:26:54 +03:00
Christoph Hellwig	c5095805e9	BACKPORT: sysctl: pass kernel pointers to ->proc_handler Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Change-Id: Ic71fd778e4cea58adc51d634d9e53c1f9f90cdf2 Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-08 17:26:43 +03:00
Casey Schaufler	15b47f86b6	UPSTREAM: procfs: add smack subdir to attrs Back in 2007 I made what turned out to be a rather serious mistake in the implementation of the Smack security module. The SELinux module used an interface in /proc to manipulate the security context on processes. Rather than use a similar interface, I used the same interface. The AppArmor team did likewise. Now /proc/.../attr/current will tell you the security "context" of the process, but it will be different depending on the security module you're using. This patch provides a subdirectory in /proc/.../attr for Smack. Smack user space can use the "current" file in this subdirectory and never have to worry about getting SELinux attributes by mistake. Programs that use the old interface will continue to work (or fail, as the case may be) as before. The proposed S.A.R.A security module is dependent on the mechanism to create its own attr subdirectory. The original implementation is by Kees Cook. Change-Id: I3355161a87dbeeb1786a67f638b84aa1d0fc96f1 Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org>	2025-09-08 17:25:28 +03:00
Tejun Heo	81ff164362	UPSTREAM: cgroup, blkcg: Prepare some symbols for module and !CONFIG_CGROUP usages btrfs is going to use css_put() and wbc helpers to improve cgroup writeback support. Add dummy css_get() definition and export wbc helpers to prepare for module and !CONFIG_CGROUP builds. Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Change-Id: I2fef472a0ce04d6bb32213500f57c72045b13633 Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-08 17:25:26 +03:00
Carlos Neira	25b806e2af	UPSTREAM: fs/nsfs.c: Added ns_match ns_match returns true if the namespace inode and dev_t matches the ones provided by the caller. Change-Id: Icadd597a774526fc1b358dab42ddace872bb1fe5 Signed-off-by: Carlos Neira <cneirabustos@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200304204157.58695-2-cneirabustos@gmail.com	2025-09-08 17:25:19 +03:00
Aleksa Sarai	78fb14a106	UPSTREAM: nsfs: clean-up ns_get_path() signature to return int ns_get_path() and ns_get_path_cb() only ever return either NULL or an ERR_PTR. It is far more idiomatic to simply return an integer, and it makes all of the callers of ns_get_path() more straightforward to read. Fixes: `e149ed2b80` ("take the targets of /proc//ns/ symlinks to separate fs") Change-Id: I08237e28535fe06776f8e1156605dc74c3c2bd77 Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-08 17:24:53 +03:00
Al Viro	ff9cef1e5f	UPSTREAM: nsfs: unobfuscate 1) IS_ERR(p) && PTR_ERR(p) == -E... is spelled p == ERR_PTR(-E...) 2) yes, you can open-code do-while and sometimes there's even a good reason to do so. Not in this case, though. Change-Id: I123f5ca3b759da5dfe5e0cffcf5ac0b02a736f45 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-08 17:24:53 +03:00
Tejun Heo	f5f21af1aa	BACKPORT: kernfs: convert kernfs_node->id from union kernfs_node_id to u64 kernfs_node->id is currently a union kernfs_node_id which represents either a 32bit (ino, gen) pair or u64 value. I can't see much value in the usage of the union - all that's needed is a 64bit ID which the current code is already limited to. Using a union makes the code unnecessarily complicated and prevents using 64bit ino without adding practical benefits. This patch drops union kernfs_node_id and makes kernfs_node->id a u64. ino is stored in the lower 32bits and gen upper. Accessors - kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the ino and gen. This simplifies ID handling less cumbersome and will allow using 64bit inos on supported archs. This patch doesn't make any functional changes. Change-Id: I289fc21fdfd22b7c7cae73626665b0cb100a0c5f Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexei Starovoitov <ast@kernel.org>	2025-09-08 17:24:46 +03:00
Al Viro	9f6a8afb37	UPSTREAM: new inode method: ->free_inode() A lot of ->destroy_inode() instances end with call_rcu() of a callback that does RCU-delayed part of freeing. Introduce a new method for doing just that, with saner signature. Rules: ->destroy_inode ->free_inode f g immediate call of f(), RCU-delayed call of g() f NULL immediate call of f(), no RCU-delayed calls NULL g RCU-delayed call of g() NULL NULL RCU-delayed default freeing IOW, NULL ->free_inode gives the same behaviour as now. Note that NULL, NULL is equivalent to NULL, free_inode_nonrcu; we could mandate the latter form, but that would have very little benefit beyond making rules a bit more symmetric. It would break backwards compatibility, require extra boilerplate and expected semantics for (NULL, NULL) pair would have no use whatsoever... Change-Id: I828dd519b381a1f25013ec3d1eae0b6835eb9c26 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2025-09-08 17:20:25 +03:00
Andrey Ignatov	b45b87b444	UPSTREAM: bpf: Add file_pos field to bpf_sysctl ctx Add file_pos field to bpf_sysctl context to read and write sysctl file position at which sysctl is being accessed (read or written). The field can be used to e.g. override whole sysctl value on write to sysctl even when sys_write is called by user space with file_pos > 0. Or BPF program may reject such accesses. Change-Id: Ia2674d14b24b99c5081522f8b3025c52f6228bfa Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-08 17:20:20 +03:00
Andrey Ignatov	b0c10648da	BACKPORT: bpf: Introduce bpf_sysctl_{get,set}_new_value helpers Add helpers to work with new value being written to sysctl by user space. bpf_sysctl_get_new_value() copies value being written to sysctl into provided buffer. bpf_sysctl_set_new_value() overrides new value being written by user space with a one from provided buffer. Buffer should contain string representation of the value, similar to what can be seen in /proc/sys/. Both helpers can be used only on sysctl write. File position matters and can be managed by an interface that will be introduced separately. E.g. if user space calls sys_write to a file in /proc/sys/ at file position = X, where X > 0, then the value set by bpf_sysctl_set_new_value() will be written starting from X. If program wants to override whole value with specified buffer, file position has to be set to zero. Documentation for the new helpers is provided in bpf.h UAPI. Change-Id: Ibd8931b2ebb9b88563424141a66269ddb0d193b4 Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-08 17:20:20 +03:00
Andrey Ignatov	0c0435189c	BACKPORT: bpf: Sysctl hook Containerized applications may run as root and it may create problems for whole host. Specifically such applications may change a sysctl and affect applications in other containers. Furthermore in existing infrastructure it may not be possible to just completely disable writing to sysctl, instead such a process should be gradual with ability to log what sysctl are being changed by a container, investigate, limit the set of writable sysctl to currently used ones (so that new ones can not be changed) and eventually reduce this set to zero. The patch introduces new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type BPF_CGROUP_SYSCTL to solve these problems on cgroup basis. New program type has access to following minimal context: struct bpf_sysctl { __u32 write; }; Where @write indicates whether sysctl is being read (= 0) or written (= 1). Helpers to access sysctl name and value will be introduced separately. BPF_CGROUP_SYSCTL attach point is added to sysctl code right before passing control to ctl_table->proc_handler so that BPF program can either allow or deny access to sysctl. Suggested-by: Roman Gushchin <guro@fb.com> Change-Id: I1cefe648cac50683d02ca33cbbbd865bd4522dfe Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-08 17:20:18 +03:00
Lee Jones	9744c81f5f	ANDROID: Incremental fs: Allocate data buffer based on input request size Presently the data buffer used to return the per-UID timeout description is created based on information provided by the user. It is expected that the user populates a variable called 'timeouts_array_size' which is heavily scrutinised to ensure the value provided is appropriate i.e. smaller than the largest possible value but large enough to contain all of the data we wish to pass back. The issue is that the aforementioned scrutiny is imposed on a different variable to the one expected. Contrary to expectation, the data buffer is actually being allocated to the size specified in a variable named 'timeouts_array_size_out'. A variable originally designed to only contain the output information i.e. the size of the data actually copied to the user for consumption. This value is also user provided and is not given the same level of scrutiny as the former. The fix in this case is simple. Ignore 'timeouts_array_size_out' until it is time to populate (over-write) it ourselves and use 'timeouts_array_size' to shape the buffer as intended. Bug: 281547360 Change-Id: I95e12879a33a2355f9e4bc0ce2bfc3f229141aa8 Signed-off-by: Lee Jones <joneslee@google.com> (cherry picked from commit 5a4d20a3eb4e651f88ed2f1f08cee066639ca801)	2025-09-08 17:19:32 +03:00
Paul Lawrence	05587b0a55	ANDROID: incremental fs: Fix race between truncate and write last block Also fix race whereby multiple providers writinig the same block would actually write out the same block. Note that multiple_providers_test started failing when incfs was ported to 5.15, and these fixes are needed to make the test reliable Bug: 264703896 Test: incfs-test passes, specifically multiple_providers_test. Ran 100 times Change-Id: I05ad5b2b2f62cf218256222cecb79bbe9953bd97 Signed-off-by: Paul Lawrence <paullawrence@google.com>	2025-09-08 17:19:32 +03:00
Tadeusz Struk	a16ea56c31	ANDROID: incfs: Add check for ATTR_KILL_SUID and ATTR_MODE in incfs_setattr Add an explicite check for ATTR_KILL_SUID and ATTR_MODE in incfs_setattr. Both of these attributes can not be set at the same time, otherwise notify_change() function will check it and invoke BUG(), crashing the system. Bug: 243394930 Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Change-Id: I91080d68efbd62f1441e20a5c02feef3d1b06e4e	2025-09-08 17:19:31 +03:00
Tadeusz Struk	74af206849	ANDROID: incremental-fs: Add free options in incfs_mount After the revert of: "ANDROID: incremental-fs: remove index and incomplete dir on umount" the free call was missing. Add it back to prevent a leak. Bug: 217661925 Bug: 218732047 Bug: 219731048 Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Change-Id: I9761183720af61bee373accfb57d92ffc68a34b1	2025-09-08 17:19:31 +03:00
Tadeusz Struk	21b52668f6	ANDROID: incremental-fs: fix GPF in pending_reads_dispatch_ioctl It is possible that fget returns NULL. This needs to be handled correctly in ioctl_permit_fill. Bug: 212821226 Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Change-Id: Iec8be21982afeab6794b78ab1a542671c52acea2	2025-09-08 17:19:31 +03:00
Ondrej Mosnacek	367266779e	UPSTREAM: userfaultfd: open userfaultfds with O_RDONLY [ Upstream commit abec3d015fdfb7c63105c7e1c956188bf381aa55 ] Since userfaultfd doesn't implement a write operation, it is more appropriate to open it read-only. When userfaultfds are opened read-write like it is now, and such fd is passed from one process to another, SELinux will check both read and write permissions for the target process, even though it can't actually do any write operation on the fd later. Inspired by the following bug report, which has hit the SELinux scenario described above: https://bugzilla.redhat.com/show_bug.cgi?id=1974559 Reported-by: Robert O'Callahan <roc@ocallahan.org> Fixes: `86039bd3b4` ("userfaultfd: add new syscall to provide memory externalization") Change-Id: I447383108d9e73e3f0ff0456f6dfb1aa67ac7b3c Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-09-08 17:19:31 +03:00
Peter Xu	13640659c1	BACKPORT: userfaultfd: wp: declare _UFFDIO_WRITEPROTECT conditionally Only declare _UFFDIO_WRITEPROTECT if the user specified UFFDIO_REGISTER_MODE_WP and if all the checks passed. Then when the user registers regions with shmem/hugetlbfs we won't expose the new ioctl to them. Even with complete anonymous memory range, we'll only expose the new WP ioctl bit if the register mode has MODE_WP. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@fb.com> Link: http://lkml.kernel.org/r/20200220163112.11409-18-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 14819305e09fe4fda546f0dfa12134c8e5366616) [ Kalesh Singh - resolve conflicts in fs/userfaultfd.c ] Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reported-by: kernel test robot <lkp@intel.com> [1] [1] https://lore.kernel.org/r/202201170247.Cir3moOM-lkp@intel.com/ Bug: 160737021 Bug: 169683130 Change-Id: I4f205e642f5f0e5824a43303aab30626cce3ddcb	2025-09-08 17:19:29 +03:00
Orson Zhai	22ffd846d8	ANDROID: userfaultfd: Fix untag pointer in userfaultfd_continue() Fixes e71e2ace5721 ("userfaultfd: do not untag user pointers"). Above patch is ported from upstream into LTS branch then merged into android12-5.4. The original patch was going to fix all untag user pointers. The change to userfaultfd_continue() was cut off when being merged into LTS because the routine does not exist in LTS. But specially the routine has been cherry-picked into android12-5.4 with commit b69f713e60d0 ("BACKPORT: FROMGIT: userfaultfd: add UFFDIO_CONTINUE ioctl") by Lokesh Gidra <lokeshgidra@google.com> long ago. So add back the missing part of fixing here. Fixes: e71e2ace5721 ("userfaultfd: do not untag user pointers") Change-Id: I32da80c2e9517356daadf566a433e056f6bef08c Signed-off-by: Orson Zhai <orson.zhai@unisoc.com>	2025-09-08 17:19:29 +03:00
Axel Rasmussen	20709cfacf	BACKPORT: FROMGIT: userfaultfd/shmem: advertise shmem minor fault support Now that the feature is fully implemented (the faulting path hooks exist so userspace is notified, and the ioctl to resolve such faults is available), advertise this as a supported feature. Link: https://lkml.kernel.org/r/20210503180737.2487560-6-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Brian Geffon <bgeffon@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Wang Qing <wangqing@vivo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 37aa962fe33a562fd0ac21b68938023e12041fc3 https: //git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1420971/ Conflicts: Documentation/admin-guide/mm/userfaultfd.rst (Manual rebase) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 187930641 Change-Id: I5fbab3783ff8671c0a5aa4826aead2d63f5cbbf3	2025-09-08 17:19:28 +03:00
Axel Rasmussen	65d3a867f1	BACKPORT: FROMGIT: userfaultfd/shmem: support minor fault registration for shmem This patch allows shmem-backed VMAs to be registered for minor faults. Minor faults are appropriately relayed to userspace in the fault path, for VMAs with the relevant flag. This commit doesn't hook up the UFFDIO_CONTINUE ioctl for shmem-backed minor faults, though, so userspace doesn't yet have a way to resolve such faults. Because of this, we also don't yet advertise this as a supported feature. That will be done in a separate commit when the feature is fully implemented. Link: https://lkml.kernel.org/r/20210503180737.2487560-4-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Brian Geffon <bgeffon@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Wang Qing <wangqing@vivo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 6867a29320b7d178feb9786856e5ea2cf40f6d33 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1420970/ Conflicts: mm/shmem.c (Manual rebase) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 187930641 Change-Id: Ib8e3fe202feab7404742814f4250a0981065368c	2025-09-08 17:19:28 +03:00
Lokesh Gidra	6fa6518b18	Revert "BACKPORT: FROMGIT: userfaultfd: support minor fault handling for shmem" This reverts commit 0309b3f479b967acb644f99d214e2b25297a20b1 as an updated version of the patch-set will be merged later. Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 187930641 Change-Id: I765fe86a2dc0305482a0590c14143dee27840b8a	2025-09-08 17:19:28 +03:00
Axel Rasmussen	a7c5b47b14	BACKPORT: FROMGIT: userfaultfd: support minor fault handling for shmem Patch series "userfaultfd: support minor fault handling for shmem", v2. Overview ======== See my original series [1] for a detailed overview of minor fault handling in general. The feature in this series works exactly like the hugetblfs version (from userspace's perspective). I'm sending this as a separate series because: - The original minor fault handling series has a full set of R-Bs, and seems close to being merged. So, it seems reasonable to start looking at this next step, which extends the basic functionality. - shmem is different enough that this series may require some additional work before it's ready, and I don't want to delay the original series unnecessarily by bundling them together. Use Case ======== In some cases it is useful to have VM memory backed by tmpfs instead of hugetlbfs. So, this feature will be used to support the same VM live migration use case described in my original series. Additionally, Android folks (Lokesh Gidra <lokeshgidra@google.com>) hope to optimize the Android Runtime garbage collector using this feature: "The plan is to use userfaultfd for concurrently compacting the heap. With this feature, the heap can be shared-mapped at another location where the GC-thread(s) could continue the compaction operation without the need to invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads get faults on the heap, UFFDIO_CONTINUE can be used to resume execution. Furthermore, this feature enables updating references in the 'non-moving' portion of the heap efficiently. Without this feature, uneccessary page copying (ioctl(UFFDIO_COPY)) would be required." [1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen@google.com/T/#t This patch (of 5): Modify the userfaultfd register API to allow registering shmem VMAs in minor mode. Modify the shmem mcopy implementation to support UFFDIO_CONTINUE in order to resolve such faults. Combine the shmem mcopy handler functions into a single shmem_mcopy_atomic_pte, which takes a mode parameter. This matches how the hugetlbfs implementation is structured, and lets us remove a good chunk of boilerplate. Link: https://lkml.kernel.org/r/20210302000133.272579-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20210302000133.272579-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Wang Qing <wangqing@vivo.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 4cc6e15679966aa49afc5b114c3c83ba0ac39b05 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1388146/ Conflicts: include/linux/shmem_fs.h mm/shmem.c mm/userfaultfd.c (1. write-protect related conflicts, rebased manually 2. Enclose shmem_mcopy_atomic_pte() with CONFIG_USERFAULTFD to avoid compile errors when USERFAULTFD is not enabled.) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: Idcd822b2a124a089121b9ad8c65061f6979126ec	2025-09-08 17:19:27 +03:00
Axel Rasmussen	dbbe755a45	BACKPORT: FROMGIT: userfaultfd: add UFFDIO_CONTINUE ioctl This ioctl is how userspace ought to resolve "minor" userfaults. The idea is, userspace is notified that a minor fault has occurred. It might change the contents of the page using its second non-UFFD mapping, or not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in the minor fault case, we already have some pre-existing underlying page. Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping. We'd just use memcpy() or similar instead. It turns out hugetlb_mcopy_atomic_pte() already does very close to what we want, if an existing page is provided via `struct page *pagep`. We already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so just extend that design: add an enum for the three modes of operation, and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE case. (Basically, look up the existing page, and avoid adding the existing page to the page cache or calling set_page_huge_active() on it.) Link: https://lkml.kernel.org/r/20210301222728.176417-5-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 14ea86439abaf3423cd9b6712ed5ce8451d2d181 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1388136/ Conflicts: fs/userfaultfd.c include/linux/hugetlb.h include/linux/userfaultfd_k.h include/uapi/linux/userfaultfd.h mm/hugetlb.c mm/userfaultfd.c (1. 8f251a3d5ce3bdea73bd045ed35db64f32e0d0d9 is not cherry-picked yet so switched SetHPageMigratable() to set_active_huge_page() in mm/hugetlb.c, 2. Other files conflicts due to lack of write-protect userfaultfd support. Manually rebased accordingly 3. Included linux/mm.h in linux/userfaultfd_k.h for definitions of VM_UFFD_) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: I45b62959dcb1d343154cb831113a26e47e77c8af	2025-09-08 17:19:26 +03:00
Axel Rasmussen	7252042af5	BACKPORT: FROMGIT: userfaultfd: add minor fault registration mode Patch series "userfaultfd: add minor fault handling", v9. Overview ======== This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS. When enabled (via the UFFDIO_API ioctl), this feature means that any hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will also get events for "minor" faults. By "minor" fault, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is, userspace resolves the fault by either a) doing nothing if the contents are already correct, or b) updating the underlying contents using the second, non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA, or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Use Case ======== Consider the use case of VM live migration (e.g. under QEMU/KVM): 1. While a VM is still running, we copy the contents of its memory to a target machine. The pages are populated on the target by writing to the non-UFFD mapping, using the setup described above. The VM is still running (and therefore its memory is likely changing), so this may be repeated several times, until we decide the target is "up to date enough". 2. We pause the VM on the source, and start executing on the target machine. During this gap, the VM's user(s) will see a pause, so it is desirable to minimize this window. 3. Between the last time any page was copied from the source to the target, and when the VM was paused, the contents of that page may have changed - and therefore the copy we have on the target machine is out of date. Although we can keep track of which pages are out of date, for VMs with large amounts of memory, it is "slow" to transfer this information to the target machine. We want to resume execution before such a transfer would complete. 4. So, the guest begins executing on the target machine. The first time it touches its memory (via the UFFD-registered mapping), userspace wants to intercept this fault. Userspace checks whether or not the page is up to date, and if not, copies the updated page from the source machine, via the non-UFFD mapping. Finally, whether a copy was performed or not, userspace issues a UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". We don't have to do all of the final updates on-demand. The userfaultfd manager can, in the background, also copy over updated pages once it receives the map of which pages are up-to-date or not. Interaction with Existing APIs ============================== Because this is a feature, a registered VMA could potentially receive both missing and minor faults. I spent some time thinking through how the existing API interacts with the new feature: UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault: - For non-shared memory or shmem, -EINVAL is returned. - For hugetlb, -EFAULT is returned. UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without modifications, the existing codepath assumes a new page needs to be allocated. This is okay, since userspace must have a second non-UFFD-registered mapping anyway, thus there isn't much reason to want to use these in any case (just memcpy or memset or similar). - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned. - If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case). - UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns -ENOENT in that case (regardless of the kind of fault). Future Work =========== This series only supports hugetlbfs. I have a second series in flight to support shmem as well, extending the functionality. This series is more mature than the shmem support at this point, and the functionality works fully on hugetlbfs, so this series can be merged first and then shmem support will follow. This patch (of 6): This feature allows userspace to intercept "minor" faults. By "minor" faults, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. This commit adds the new registration mode, and sets the relevant flag on the VMAs being registered. In the hugetlb fault path, if we find that we have huge_pte_none(), but find_lock_page() does indeed find an existing page, then we have a "minor" fault, and if the VMA has the userfaultfd registration flag, we call into userfaultfd to handle it. This is implemented as a new registration mode, instead of an API feature. This is because the alternative implementation has significant drawbacks [1]. However, doing it this was requires we allocate a VM_* flag for the new registration mode. On 32-bit systems, there are no unused bits, so this feature is only supported on architectures with CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in MINOR mode on 32-bit architectures, we return -EINVAL. [1] https://lore.kernel.org/patchwork/patch/1380226/ Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Steven Price <steven.price@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 82a150ec394f6b944e26786b907fc0deab5b2064 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1388132/ Conflicts: arch/x86/Kconfig fs/userfaultfd.c include/linux/userfaultfd_k.h include/uapi/linux/userfaultfd.h init/Kconfig mm/hugetlb.c (Lack of userfaultfd write-protect support in 5.4 lead to all conflicts. Resolved by carefully rebasing such that write-protect related code doesn't get added) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: I43b37272d531341439ceaa03213d0e2415e04688	2025-09-08 17:19:25 +03:00
Peter Xu	6b91be8bf5	BACKPORT: FROMGIT: hugetlb/userfaultfd: unshare all pmds for hugetlbfs when register wp Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because userfaultfd-wp is always based on pgtable entries, so they cannot be shared. Walk the hugetlb range and unshare all such mappings if there is, right before UFFDIO_REGISTER will succeed and return to userspace. This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing is completely disabled for userfaultfd-wp registered range. Link: https://lkml.kernel.org/r/20210218231206.15524-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> (cherry picked from commit 267bda5c9993856b86f91a998df632b29cf517e2 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git akpm) Link: https://lore.kernel.org/patchwork/patch/1382208/ Conflicts: mm/hugetlb.c (CONFIG_CMA not CP'ed in this kernel. Manual rebase) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: I99d541ce45aaf924fa912f00dafa4caefe307755	2025-09-08 17:19:25 +03:00
Brian Geffon	fcdd07ace3	BACKPORT: FROMLIST: Revert "mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio" This reverts commit cd544fd1dc9293c6702fab6effa63dac1cc67e99. As discussed in [1] this commit was a no-op because the mapping type was checked in vma_to_resize before move_vma is ever called. This meant that vm_ops->mremap() would never be called on such mappings. Furthermore, we've since expanded support of MREMAP_DONTUNMAP to non-anonymous mappings, and these special mappings are still protected by the existing check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EINVAL. 1. https://lkml.org/lkml/2020/12/28/2340 Signed-off-by: Brian Geffon <bgeffon@google.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Link: https://lore.kernel.org/patchwork/patch/1401226/ Conflicts: include/linux/mm.h (Resolved minor conflict with manual rebase) Bug: 160737021 Bug: 169683130 Change-Id: I97d29e6a54cee07ba69d6bb880778ee1fea8ff7c	2025-09-08 17:19:24 +03:00
Peter Xu	26d75812d0	UPSTREAM: mm/userfaultfd: honor FAULT_FLAG_KILLABLE in fault path Userfaultfd fault path was by default killable even if the caller does not have FAULT_FLAG_KILLABLE. That makes sense before in that when with gup we don't have FAULT_FLAG_KILLABLE properly set before. Now after previous patch we've got FAULT_FLAG_KILLABLE applied even for gup code so it should also make sense to let userfaultfd to honor the FAULT_FLAG_KILLABLE. Because we're unconditionally setting FAULT_FLAG_KILLABLE in gup code right now, this patch should have no functional change. It also cleaned the code a little bit by introducing some helpers. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160300.9941-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 3e69ad081c18d138fc7fd0f1ceef3b055ab10549) Bug: 176847924 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I514975b9934e3ba2860661b0d463fd46410b7a90	2025-09-08 17:19:22 +03:00
Peter Xu	f32e98fcc8	BACKPORT: mm: introduce FAULT_FLAG_INTERRUPTIBLE handle_userfaultfd() is currently the only one place in the kernel page fault procedures that can respond to non-fatal userspace signals. It was trying to detect such an allowance by checking against USER & KILLABLE flags, which was "un-official". In this patch, we introduced a new flag (FAULT_FLAG_INTERRUPTIBLE) to show that the fault handler allows the fault procedure to respond even to non-fatal signals. Meanwhile, add this new flag to the default fault flags so that all the page fault handlers can benefit from the new flag. With that, replacing the userfault check to this one. Since the line is getting even longer, clean up the fault flags a bit too to ease TTY users. Although we've got a new flag and applied it, we shouldn't have any functional change with this patch so far. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220195348.16302-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit c270a7eedcf278304e05ebd2c96807487c97db61) Bug: 176847924 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I4abde9ce54aae56b0791884adf45210973f8a210	2025-09-08 17:19:21 +03:00
Peter Xu	ceb79c2328	UPSTREAM: userfaultfd: don't retake mmap_sem to emulate NOPAGE This patch removes the risk path in handle_userfault() then we will be sure that the callers of handle_mm_fault() will know that the VMAs might have changed. Meanwhile with previous patch we don't lose responsiveness as well since the core mm code now can handle the nonfatal userspace signals even if we return VM_FAULT_RETRY. Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: Jerome Glisse <jglisse@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160234.9646-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit ef429ee7409aa7cbe4c3c9e2df5dc6abedfab493) Bug: 176847924 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I8173824922cd6310341a367e622652a3d87437f1	2025-09-08 17:19:21 +03:00
Dmitry Safonov	256bbc01ca	UPSTREAM: mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio As kernel expect to see only one of such mappings, any further operations on the VMA-copy may be unexpected by the kernel. Maybe it's being on the safe side, but there doesn't seem to be any expected use-case for this, so restrict it now. Link: https://lkml.kernel.org/r/20201013013416.390574-4-dima@arista.com Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()") Signed-off-by: Dmitry Safonov <dima@arista.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit cd544fd1dc9293c6702fab6effa63dac1cc67e99) Bug: 176847609 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I9fc569e41bd7fc07c2a168c96d83eda0396c2305	2025-09-08 17:19:19 +03:00
Lukas Bulwahn	bbabadb96a	UPSTREAM: fs: anon_inodes: rephrase to appropriate kernel-doc Commit e7e832ce6fa7 ("fs: add LSM-supporting anon-inode interface") adds more kerneldoc description, but also a few new warnings on anon_inode_getfd_secure() due to missing parameter descriptions. Rephrase to appropriate kernel-doc for anon_inode_getfd_secure(). Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com> (cherry picked from commit 365982aba1f264dba26f0908700d62bfa046918c) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: Ie7837f21dfe28c03594ebc65fd293c00c57ba5c5	2025-09-08 17:19:16 +03:00
Daniel Colascione	78e90362c8	UPSTREAM: userfaultfd: use secure anon inodes for userfaultfd This change gives userfaultfd file descriptors a real security context, allowing policy to act on them. Signed-off-by: Daniel Colascione <dancol@google.com> [LG: Remove owner inode from userfaultfd_ctx] [LG: Use anon_inode_getfd_secure() in userfaultfd syscall] [LG: Use inode of file in userfaultfd_read() in resolve_userfault_fork()] Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Paul Moore <paul@paul-moore.com> (cherry picked from commit b537900f1598b67bcb8acac20da73c6e26ebbf99) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: Ifd3faca4058bd9e4c51767aa0246e1c53ad410d4	2025-09-08 17:19:16 +03:00
Daniel Colascione	5fa919f235	BACKPORT: fs: add LSM-supporting anon-inode interface This change adds a new function, anon_inode_getfd_secure, that creates anonymous-node file with individual non-S_PRIVATE inode to which security modules can apply policy. Existing callers continue using the original singleton-inode kind of anonymous-inode file. We can transition anonymous inode users to the new kind of anonymous inode in individual patches for the sake of bisection and review. The new function accepts an optional context_inode parameter that callers can use to provide additional contextual information to security modules. For example, in case of userfaultfd, the created inode is a 'logical child' of the context_inode (userfaultfd inode of the parent process) in the sense that it provides the security context required during creation of the child process' userfaultfd inode. Signed-off-by: Daniel Colascione <dancol@google.com> [LG: Delete obsolete comments to alloc_anon_inode()] [LG: Add context_inode description in comments to anon_inode_getfd_secure()] [LG: Remove definition of anon_inode_getfile_secure() as there are no callers] [LG: Make __anon_inode_getfile() static] [LG: Use correct error cast in __anon_inode_getfile()] [LG: Fix error handling in __anon_inode_getfile()] Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Paul Moore <paul@paul-moore.com> (cherry picked from commit e7e832ce6fa769f800cd7eaebdb0459ad31e0416) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: I88f1821243c58454ce445fd50fd804221e0bfc67	2025-09-08 17:19:16 +03:00
Lokesh Gidra	31da11da9a	UPSTREAM: userfaultfd: add user-mode only option to unprivileged_userfaultfd sysctl knob With this change, when the knob is set to 0, it allows unprivileged users to call userfaultfd, like when it is set to 1, but with the restriction that page faults from only user-mode can be handled. In this mode, an unprivileged user (without SYS_CAP_PTRACE capability) must pass UFFD_USER_MODE_ONLY to userfaultd or the API will fail with EPERM. This enables administrators to reduce the likelihood that an attacker with access to userfaultfd can delay faulting kernel code to widen timing windows for other exploits. The default value of this knob is changed to 0. This is required for correct functioning of pipe mutex. However, this will fail postcopy live migration, which will be unnoticeable to the VM guests. To avoid this, set 'vm.userfault = 1' in /sys/sysctl.conf. The main reason this change is desirable as in the short term is that the Android userland will behave as with the sysctl set to zero. So without this commit, any Linux binary using userfaultfd to manage its memory would behave differently if run within the Android userland. For more details, refer to Andrea's reply [1]. [1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/ Link: https://lkml.kernel.org/r/20201120030411.2690816-3-lokeshgidra@google.com Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Xu <peterx@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Daniel Colascione <dancol@dancol.org> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: <calin@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Shaohua Li <shli@fb.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nitin Gupta <nigupta@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Daniel Colascione <dancol@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit d0d4730ac2e404a5b0da9a87ef38c73e51cb1664) Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Bug: 160737021 Bug: 169683130 Change-Id: Ic46c0be47d6394d25bd3443ff524936fa568ab85	2025-09-08 17:19:15 +03:00
Lokesh Gidra	b5c951bf01	BACKPORT: userfaultfd: add UFFD_USER_MODE_ONLY Patch series "Control over userfaultfd kernel-fault handling", v6. This patch series is split from [1]. The other series enables SELinux support for userfaultfd file descriptors so that its creation and movement can be controlled. It has been demonstrated on various occasions that suspending kernel code execution for an arbitrary amount of time at any access to userspace memory (copy_from_user()/copy_to_user()/...) can be exploited to change the intended behavior of the kernel. For instance, handling page faults in kernel-mode using userfaultfd has been exploited in [2, 3]. Likewise, FUSE, which is similar to userfaultfd in this respect, has been exploited in [4, 5] for similar outcome. This small patch series adds a new flag to userfaultfd(2) that allows callers to give up the ability to handle kernel-mode faults with the resulting UFFD file object. It then adds a 'user-mode only' option to the unprivileged_userfaultfd sysctl knob to require unprivileged callers to use this new flag. The purpose of this new interface is to decrease the chance of an unprivileged userfaultfd user taking advantage of userfaultfd to enhance security vulnerabilities by lengthening the race window in kernel code. [1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/ [2] https://duasynt.com/blog/linux-kernel-heap-spray [3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit [4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html [5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808 This patch (of 2): userfaultfd handles page faults from both user and kernel code. Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting userfaultfd object refuse to handle faults from kernel mode, treating these faults as if SIGBUS were always raised, causing the kernel code to fail with EFAULT. A future patch adds a knob allowing administrators to give some processes the ability to create userfaultfd file objects only if they pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will exploit userfaultfd's ability to delay kernel page faults to open timing windows for future exploits. Link: https://lkml.kernel.org/r/20201120030411.2690816-1-lokeshgidra@google.com Link: https://lkml.kernel.org/r/20201120030411.2690816-2-lokeshgidra@google.com Signed-off-by: Daniel Colascione <dancol@google.com> Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <calin@google.com> Cc: Daniel Colascione <dancol@dancol.org> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nitin Gupta <nigupta@nvidia.com> Cc: Peter Xu <peterx@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shaohua Li <shli@fb.com> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 37cd0575b8510159992d279c530c05f872990b02) Conflicts: include/uapi/linux/userfaultfd.h (1. Removed uffdio_writeprotect struct part of 63b2d4174c4ad) Bug: 160737021 Bug: 169683130 Signed-off-by: Lokesh Gidra <lokeshgidra@google.com> Change-Id: Ic4cecce08956402f91f866ae08f1d4ec21d64289	2025-09-08 17:19:15 +03:00
Peter Xu	174081edd4	UPSTREAM: userfaultfd/sysctl: add vm.unprivileged_userfaultfd Userfaultfd can be misued to make it easier to exploit existing use-after-free (and similar) bugs that might otherwise only make a short window or race condition available. By using userfaultfd to stall a kernel thread, a malicious program can keep some state that it wrote, stable for an extended period, which it can then access using an existing exploit. While it doesn't cause the exploit itself, and while it's not the only thing that can stall a kernel thread when accessing a memory location, it's one of the few that never needs privilege. We can add a flag, allowing userfaultfd to be restricted, so that in general it won't be useable by arbitrary user programs, but in environments that require userfaultfd it can be turned back on. Add a global sysctl knob "vm.unprivileged_userfaultfd" to control whether userfaultfd is allowed by unprivileged users. When this is set to zero, only privileged users (root user, or users with the CAP_SYS_PTRACE capability) will be able to use the userfaultfd syscalls. Andrea said: : The only difference between the bpf sysctl and the userfaultfd sysctl : this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability : requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement, : because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE : already if it's doing other kind of tracking on processes runtime, in : addition of userfaultfd. In other words both syscalls works only for : root, when the two sysctl are opt-in set to 1. [dgilbert@redhat.com: changelog additions] [akpm@linux-foundation.org: documentation tweak, per Mike] Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com Change-Id: Iee5ab7cf04ec1b92b01d625f7e2d8a6308ee2fa0 Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Suggested-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Martin Cracauer <cracauer@cons.org> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-09-08 17:19:14 +03:00
Eric Biggers	111896b747	UPSTREAM: userfaultfd: convert userfaultfd_ctx::refcount to refcount_t Reference counters should use refcount_t rather than atomic_t, since the refcount_t implementation can prevent overflows, reducing the exploitability of reference leak bugs. userfaultfd_ctx::refcount is a reference counter with the usual semantics, so convert it to refcount_t. Note: I replaced the BUG() on incrementing a 0 refcount with just refcount_inc(), since part of the semantics of refcount_t is that that incrementing a 0 refcount is not allowed; with CONFIG_REFCOUNT_FULL, refcount_inc() already checks for it and warns. Link: http://lkml.kernel.org/r/20181115003916.63381-1-ebiggers@kernel.org Change-Id: I3c73bc521c7c22be1ec78f363d4990171cfbbbec Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-09-08 17:19:13 +03:00
Lance Roy	09c01efb6f	UPSTREAM: userfaultfd: Replace spin_is_locked() with lockdep lockdep_assert_held() is better suited to checking locking requirements, since it only checks if the current thread holds the lock regardless of whether someone else does. This is also a step towards possibly removing spin_is_locked(). Change-Id: I3580f68064c78aa6c63d3114a92743e72ab2e4a1 Signed-off-by: Lance Roy <ldr709@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <linux-fsdevel@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2025-09-08 17:19:12 +03:00
Jaegeuk Kim	f39bd55550	UPSTREAM: f2fs: set highest IO priority for checkpoint thread The checkpoint is the top priority thread which can stop all the filesystem operations. Let's make it RT priority. Reviewed-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> (cherry picked from commit 8a2d9f00d502e6ef68c6d52f0863856040ddd2db) Change-Id: I7ab430a7294fa9644512f0a32245646944446213 Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>	2025-09-08 16:57:12 +03:00
Chao Yu	71b5d2099a	UPSTREAM: f2fs: fix to update age extent in f2fs_do_zero_range() We should update age extent in f2fs_do_zero_range() like we did in f2fs_truncate_data_blocks_range(). Fixes: `dfcfcc6d16` ("f2fs: add block_age-based extent cache") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> (cherry picked from commit a84153f939808102dfa10904aa0f743e734a3e1d) Change-Id: Iab59f94f792dbab8da7ab597d5a219476370a646 Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>	2025-09-08 16:57:12 +03:00
Chao Yu	3e2248e2e2	UPSTREAM: f2fs: fix to update age extent correctly during truncation nr_free may be less than len, we should update age extent cache w/ range [fofs, len] rather than [fofs, nr_free]. Fixes: `dfcfcc6d16` ("f2fs: add block_age-based extent cache") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> (cherry picked from commit 8c0ed062ce27f6b7f0a568cb241e2b4dd2d9e6a6) Change-Id: I0a4546f84fd75db2f13857e4922520ab5885e845 Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>	2025-09-08 16:57:12 +03:00
qixiaoyu1	8eeb0c0be2	UPSTREAM: f2fs: add sysfs nodes to set last_age_weight Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com> Signed-off-by: xiongping1 <xiongping1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> (cherry picked from commit d23be468eada21c828058e0e8d60409eaec373ab) Change-Id: I4c0a5137cbb6c70dc8ebfe6e1dc7377ae20898fa Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>	2025-09-08 16:57:11 +03:00
qixiaoyu1	cfc402e944	UPSTREAM: f2fs: fix wrong calculation of block age Currently we wrongly calculate the new block age to old * LAST_AGE_WEIGHT / 100. Fix it to new * (100 - LAST_AGE_WEIGHT) / 100 + old * LAST_AGE_WEIGHT / 100. Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com> Signed-off-by: xiongping1 <xiongping1@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> (cherry picked from commit b03a41a495df35f8e8d25220878bd6b8472d9396) Change-Id: I989877caf2517245750e7ed809b2af71f8943f96 Signed-off-by: zhaoyuenan <amktiao030215@gmail.com>	2025-09-08 16:57:11 +03:00

1 2 3 4 5 ...

61001 Commits