55762 Commits

Author SHA1 Message Date
Kanishk
d3ad462391 f2fs: Lower reclaim segment and threshold
• Xiaomi's default segment reclamation value (4096), like that of other OEMs, is tuned for the
  heavy I/O patterns (background services, frequent writes) of MIUI, HyperOS, and other stock ROMs.
  For AOSP: 1. Lower overhead: smaller garbage collection (GC) batches
               reduce CPU/storage load.
            2. Better responsiveness: avoids long GC pauses during light
               usage.
            3. Battery savings: less aggressive reclaiming reduces
               power spikes.

Change-Id: 710531a8b778ce9bc190b87ffe22fa3a52e51499
Signed-off-by: Kanishk <kanishkthederp@gmail.com>
Signed-off-by: TogoFire <togofire@mailfence.com>
2026-01-04 11:55:52 +05:30
sidex15
1219b14df9 fs: implement susfs v1.5.12
- This is a heavily modified version of susfs v1.5.12
- It does not comply with the upstream official susfs v1.5.12
- sus_mount functionality still remains at v1.5.5, as backporting it to the latest version would result in a mount detection leak in some apps/detectors
- Increase susfs_open_redirect UID limit to <11000
- susfs magic mount support is still implemented and enabled
- sus_map is implemented and complies with the upstream v1.5.12 codebase

This commit requires a number of backported commits from v4.19 and v5.x to make sus_map work:

0a8cbf3725edbacc5f1ead33eeae7e4d78823b5a proc: less memory for /proc/*/map_files readdir
37ae2444584654f6785f2cc49181f05af788c9b2 mm: smaps: split PSS into components
49a5115e11350ee68f6a5fbd56b3e817bf9e5aac fs/task_mmu: add pkeys header
6f94042bed51121f8f28a5e572cda20c21fed2e1 mm/pkeys: Add an empty arch_pkeys_enabled()
bbd5aec12b32097a71dc6a0097194a18f3ee9a17 mm/pkeys, powerpc, x86: Provide an empty vma_pkey() in linux/pkeys.h
849ca8ce954d9dbb082dcf83c98af861e98e5635 mm: /proc/pid/smaps_rollup: convert to single value seq_file
6071a482c8e603be25895cc2cac5f0eab61c4051 mm: /proc/pid/smaps: factor out common stats printing
03fd2fbe9c40da8128cec5c69ef54755c0f38c6c mm: /proc/pid/smaps: factor out mem stats gathering
95f8be4c8a86a491a1c2ac9bfe470aef9e1baa8f mm: /proc/pid/*maps remove is_pid and related wrappers
27956d255e3b012372951dd6131e07c106d2daae procfs: add seq_put_hex_ll to speed up /proc/pid/maps
7f2847d02cdc4491b5ee6d4a0043854cbd6c7a1a proc: add seq_put_decimal_ull_width to speed up /proc/pid/smaps

For the KernelSU-side patches for this commit, you need sidex15's KernelSU-Next fork:
https://github.com/sidex15/KernelSU-Next/tree/n3x7g3n-kernel

Or, if you want to patch on your own, here is the susfs commit patch in KernelSU-Next:
13b1dfd6e2

Co-authored-by: simonpunk <simonpunk2016@gmail.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:52 +05:30
Alexey Dobriyan
22d95a657d proc: less memory for /proc/*/map_files readdir
dentry name can be evaluated later, right before calling into VFS.

Also, spend less time under ->mmap_sem.

Link: http://lkml.kernel.org/r/20171110163034.GA2534@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:52 +05:30
Luigi Semenzato
1a83991b7c mm: smaps: split PSS into components
Report separate components (anon, file, and shmem) for PSS in
smaps_rollup.

This helps understand and tune the memory manager behavior in consumer
devices, particularly mobile devices.  Many of them (e.g.  chromebooks and
Android-based devices) use zram for anon memory, and perform disk reads
for discarded file pages.  The difference in latency is large (e.g.
reading a single page from SSD is 30 times slower than decompressing a
zram page on one popular device), thus it is useful to know how much of
the PSS is anon vs.  file.

All the information is already present in /proc/pid/smaps, but much more
expensive to obtain because of the large size of that procfs entry.

This patch also removes a small code duplication in smaps_account, which
would have gotten worse otherwise.

Also updated Documentation/filesystems/proc.txt (the smaps section was a
bit stale, and I added a smaps_rollup section) and
Documentation/ABI/testing/procfs-smaps_rollup.
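
As a rough illustration of how the split fields might be consumed from
userspace, the sketch below parses smaps_rollup-style text and pulls out
the PSS components. The field names Pss_Anon, Pss_File and Pss_Shmem are
assumed to match what this patch prints; the helper name and the sample
values are made up for the example.

```python
def parse_pss_components(rollup_text):
    """Return {field: kB} for the Pss* lines of smaps_rollup-style text."""
    wanted = ("Pss", "Pss_Anon", "Pss_File", "Pss_Shmem")
    result = {}
    for line in rollup_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or name not in wanted:
            continue
        # Lines look like "Pss_Anon:   1234 kB"; keep the number only.
        result[name] = int(rest.split()[0])
    return result


# Made-up sample; on a live system the text would come from reading
# /proc/<pid>/smaps_rollup on a kernel that carries this patch.
SAMPLE = """\
00400000-7ffd3a5fe000 ---p 00000000 00:00 0      [rollup]
Rss:                3200 kB
Pss:                1500 kB
Pss_Anon:           1000 kB
Pss_File:            400 kB
Pss_Shmem:           100 kB
"""

pss = parse_pss_components(SAMPLE)
```

With the three components available, a tool can compare the anon share
(likely backed by zram) against the file share without walking the full
smaps output.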

[semenzato@chromium.org: v5]
  Link: http://lkml.kernel.org/r/20190626234333.44608-1-semenzato@chromium.org
Link: http://lkml.kernel.org/r/20190626180429.174569-1-semenzato@chromium.org
Signed-off-by: Luigi Semenzato <semenzato@chromium.org>
Acked-by: Yu Zhao <yuzhao@chromium.org>
Cc: Sonny Rao <sonnyrao@chromium.org>
Cc: Yu Zhao <yuzhao@chromium.org>
Cc: Brian Geffon <bgeffon@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:52 +05:30
sidex15
3a4367259c fs/task_mmu: add pkeys header
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:52 +05:30
Vlastimil Babka
1af9890e34 mm: /proc/pid/smaps_rollup: convert to single value seq_file
The /proc/pid/smaps_rollup file is currently implemented via the
m_start/m_next/m_stop seq_file iterators shared with the other maps files,
that iterate over vma's.  However, the rollup file doesn't print anything
for each vma, only accumulate the stats.

There are some issues with the current code as reported in [1] - the
accumulated stats can get skewed if seq_file start()/stop() op is called
multiple times, if show() is called multiple times, and after seeks to
non-zero position.

Patch [1] fixed those within existing design, but I believe it is
fundamentally wrong to expose the vma iterators to the seq_file mechanism
when smaps_rollup shows logically a single set of values for the whole
address space.

This patch thus refactors the code to provide a single "value" at offset
0, with vma iteration to gather the stats done internally.  This fixes the
situations where results are skewed, and simplifies the code, especially
in show_smap(), at the expense of somewhat less code reuse.
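
The skew described above can be pictured with a toy userspace model. This
is only a sketch, not the kernel code: the class and function names are
hypothetical, and each vma is reduced to a single size in kB.

```python
class PerVmaRollup:
    """Toy model of the old design: stats accumulate as a side effect of show()."""

    def __init__(self):
        self.total_kb = 0

    def show(self, vma_kb):
        # Every call adds to the running total, even if the iterator
        # revisits a vma after a restart or a seek.
        self.total_kb += vma_kb


def single_value_rollup(vmas_kb):
    """Toy model of the new design: walk all vmas internally, emit one value."""
    return sum(vmas_kb)


vmas_kb = [4, 8, 16]

old = PerVmaRollup()
for kb in vmas_kb:
    old.show(kb)
old.show(vmas_kb[-1])  # simulate show() being re-run for the last vma

skewed_total = old.total_kb
stable_total = single_value_rollup(vmas_kb)
```

In the model, the per-vma variant double-counts the revisited vma, while
the single-value variant returns the same total however often it is called.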

[1] https://marc.info/?l=linux-mm&m=151927723128134&w=2

[vbabka@suse.cz: use seq_file infrastructure]
  Link: http://lkml.kernel.org/r/bf4525b0-fd5b-4c4c-2cb3-adee3dd95a48@suse.cz
Link: http://lkml.kernel.org/r/20180723111933.15443-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Daniel Colascione <dancol@google.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:52 +05:30
Vlastimil Babka
93b0969b71 mm: /proc/pid/smaps: factor out common stats printing
To prepare for handling /proc/pid/smaps_rollup differently from
/proc/pid/smaps factor out from show_smap() printing the parts of output
that are common for both variants, which is the bulk of the gathered
memory stats.

[vbabka@suse.cz: add const, per Alexey]
  Link: http://lkml.kernel.org/r/b45f319f-cd04-337b-37f8-77f99786aa8a@suse.cz
Link: http://lkml.kernel.org/r/20180723111933.15443-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Colascione <dancol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:51 +05:30
Vlastimil Babka
7dbf2de9b8 mm: /proc/pid/smaps: factor out mem stats gathering
To prepare for handling /proc/pid/smaps_rollup differently from
/proc/pid/smaps factor out vma mem stats gathering from show_smap() - it
will be used by both.

Link: http://lkml.kernel.org/r/20180723111933.15443-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Colascione <dancol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:51 +05:30
Vlastimil Babka
1b91b3aa80 mm: /proc/pid/*maps remove is_pid and related wrappers
Patch series "cleanups and refactor of /proc/pid/smaps*".

The recent regression in /proc/pid/smaps made me look more into the code.
Especially the issues with smaps_rollup reported in [1] as explained in
Patch 4, which fixes them by refactoring the code.  Patches 2 and 3 are
preparations for that.  Patch 1 is me realizing that there's a lot of
boilerplate left from times where we tried (unsuccessfully) to mark thread
stacks in the output.

Originally I had also plans to rework the translation from
/proc/pid/*maps* file offsets to the internal structures.  Now the offset
means "vma number", which is not really stable (vma's can come and go
between read() calls) and there's an extra caching of last vma's address.
My idea was that offsets would be interpreted directly as addresses, which
would also allow meaningful seeks (see the ugly seek_to_smaps_entry() in
tools/testing/selftests/vm/mlock2.h).  However loff_t is (signed) long
long so that might be insufficient somewhere for the unsigned long
addresses.
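
The range concern can be checked with simple arithmetic. The sketch below
assumes a 64-bit system (64-bit unsigned long, 64-bit signed loff_t); the
helper name and the two example addresses are illustrative only.

```python
LOFF_T_MAX = 2**63 - 1        # largest value of a signed 64-bit loff_t
ULONG_MAX = 2**64 - 1         # largest 64-bit unsigned long address


def fits_in_loff_t(address):
    """True if the address can be used directly as a non-negative file offset."""
    return 0 <= address <= LOFF_T_MAX


user_addr = 0x0000007F12345000     # example address in the lower half
high_addr = 0xFFFFFFFFFF600000     # example address in the upper half

user_ok = fits_in_loff_t(user_addr)
high_ok = fits_in_loff_t(high_addr)
```

Any address with the top bit set falls outside the positive loff_t range,
which is the case the paragraph above is worried about.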

So the result is fixed issues with skewed /proc/pid/smaps_rollup results,
simpler smaps code, and a lot of unused code removed.

[1] https://marc.info/?l=linux-mm&m=151927723128134&w=2

This patch (of 4):

Commit b76437579d ("procfs: mark thread stack correctly in
proc/<pid>/maps") introduced differences between /proc/PID/maps and
/proc/PID/task/TID/maps to mark thread stacks properly, and this was
also done for smaps and numa_maps.  However it didn't work properly and
was ultimately removed by commit b18cb64ead ("fs/proc: Stop trying to
report thread stacks").

Now the is_pid parameter for the related show_*() functions is unused
and we can remove it together with wrapper functions and ops structures
that differ for PID and TID cases only in this parameter.

Link: http://lkml.kernel.org/r/20180723111933.15443-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Colascione <dancol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:51 +05:30
Andrei Vagin
0d91b5fa04 procfs: add seq_put_hex_ll to speed up /proc/pid/maps
seq_put_hex_ll() prints a number in hexadecimal notation and works
faster than seq_printf().
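
To show why a dedicated hex printer can skip the format-string parsing
that seq_printf() pays for, here is a userspace sketch of the idea. It is
not the kernel implementation: put_hex_ll() is a hypothetical Python
stand-in, and its delimiter/value/width parameters and zero padding are
assumptions made for the example.

```python
HEX_DIGITS = "0123456789abcdef"


def put_hex_ll(out, delimiter, value, width):
    """Append delimiter plus value as zero-padded lowercase hex to out (a list of str)."""
    if value < 0:
        raise ValueError("value must be unsigned")
    digits = []
    while True:
        # Emit one nibble at a time via a table lookup, lowest nibble first.
        digits.append(HEX_DIGITS[value & 0xF])
        value >>= 4
        if value == 0:
            break
    # Pad with zeros up to the minimum field width.
    while len(digits) < width:
        digits.append("0")
    out.append(delimiter)
    out.append("".join(reversed(digits)))


line = []
put_hex_ll(line, "", 0x7F1234000, 8)
put_hex_ll(line, "-", 0x7F1235000, 8)
maps_prefix = "".join(line)
```

The result matches what a "%08x-%08x" format would produce, without ever
interpreting a format string.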

== test.py
  num = 0
  with open("/proc/1/maps") as f:
          while num < 10000 :
                  data = f.read()
                  f.seek(0, 0)
                  num = num + 1
==

== Before patch ==
  $  time python test.py

  real	0m1.561s
  user	0m0.257s
  sys	0m1.302s

== After patch ==
  $ time python test.py

  real	0m0.986s
  user	0m0.279s
  sys	0m0.707s

$ perf -g record python test.py:

== Before patch ==
-   67.42%     2.82%  python   [kernel.kallsyms] [k] show_map_vma.isra.22
   - 64.60% show_map_vma.isra.22
      - 44.98% seq_printf
         - seq_vprintf
            - vsnprintf
               + 14.85% number
               + 12.22% format_decode
                 5.56% memcpy_erms
      + 15.06% seq_path
      + 4.42% seq_pad
   + 2.45% __GI___libc_read

== After patch ==
-   47.35%     3.38%  python   [kernel.kallsyms] [k] show_map_vma.isra.23
   - 43.97% show_map_vma.isra.23
      + 20.84% seq_path
      - 15.73% show_vma_header_prefix
           10.55% seq_put_hex_ll
         + 2.65% seq_put_decimal_ull
           0.95% seq_putc
      + 6.96% seq_pad
   + 2.94% __GI___libc_read

[avagin@openvz.org: use unsigned int instead of int where it is suitable]
  Link: http://lkml.kernel.org/r/20180214025619.4005-1-avagin@openvz.org
[avagin@openvz.org: v2]
  Link: http://lkml.kernel.org/r/20180117082050.25406-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/20180112185812.7710-1-avagin@openvz.org
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:51 +05:30
Andrei Vagin
73567266c9 proc: add seq_put_decimal_ull_width to speed up /proc/pid/smaps
seq_put_decimal_ull_w(m, str, val, width) prints a decimal number with a
specified minimal field width.

It is equivalent to seq_printf(m, "%s%*d", str, width, val), but it
works much faster.
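
The stated equivalence can be checked in userspace with a small sketch.
put_decimal_ull_width() below is a hypothetical Python stand-in, not the
kernel helper; it builds the digits by hand and pads with spaces, then
compares against the "%s%*d" format.

```python
def put_decimal_ull_width(delimiter, value, width):
    """Return delimiter followed by value right-aligned in at least `width` columns."""
    digits = []
    while True:
        # Peel off decimal digits, least significant first.
        digits.append(chr(ord("0") + value % 10))
        value //= 10
        if value == 0:
            break
    padding = " " * max(0, width - len(digits))
    return delimiter + padding + "".join(reversed(digits))


fast = put_decimal_ull_width("Rss: ", 1234, 8)
reference = "%s%*d" % ("Rss: ", 8, 1234)
```

Both strings come out identical; the hand-rolled path avoids parsing the
format string on every call.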

== test_smaps.py
  num = 0
  with open("/proc/1/smaps") as f:
          for x in xrange(10000):
                  data = f.read()
                  f.seek(0, 0)
==

== Before patch ==
  $ time python test_smaps.py
  real    0m4.593s
  user    0m0.398s
  sys     0m4.158s

== After patch ==
  $ time python test_smaps.py
  real    0m3.828s
  user    0m0.413s
  sys     0m3.408s

$ perf -g record python test_smaps.py
== Before patch ==
-   79.01%     3.36%  python   [kernel.kallsyms]    [k] show_smap.isra.33
   - 75.65% show_smap.isra.33
      + 48.85% seq_printf
      + 15.75% __walk_page_range
      + 9.70% show_map_vma.isra.23
        0.61% seq_puts

== After patch ==
-   75.51%     4.62%  python   [kernel.kallsyms]    [k] show_smap.isra.33
   - 70.88% show_smap.isra.33
      + 24.82% seq_put_decimal_ull_w
      + 19.78% __walk_page_range
      + 12.74% seq_printf
      + 11.08% show_map_vma.isra.23
      + 1.68% seq_puts

[akpm@linux-foundation.org: fix drivers/of/unittest.c build]
Link: http://lkml.kernel.org/r/20180212074931.7227-1-avagin@openvz.org
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:51 +05:30
Pranav Vashi
6d59646090 fs: kernelsu: Add scope-minimized manual hooks 1.4
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:51 +05:30
chiteroman
04a1e6280b BACKPORT: fs: path_umount for KernelSu
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:51 +05:30
Khazhismel Kumykov
b5b56112d5 fs: ext4: cond_resched in work-heavy group loops
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Forenche <prahul2003@gmail.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:50 +05:30
Minchan Kim
2df938fa41 mm: filter out compound page on per-process reclaim
We don't handle any compound page attached to page table
so filter it out.

[ 2639.685100] c2   8869 trying to isolate tail page
[ 2639.685102] c2   8869 ------------[ cut here ]------------
[ 2639.685108] c2   8869 WARNING: CPU: 2 PID: 8869 at /usr/local/google/home/minchan/nvme/bc-kernel/private/msm-google/mm/vmscan.c:1636 isolate_lru_page+0x238/0x248
[ 2639.685123] c2   8869 Modules linked in: sec_touch heatmap videobuf2_vmalloc videobuf2_memops snd_soc_sdm845 snd_soc_cs35l36 snd_soc_wcd_spi snd_soc_wcd934x snd_soc_wcd9xxx wcd_dsp_glink wcd_core pinctrl_wcd wlan(O)
[ 2639.685127] c2   8869 CPU: 2 PID: 8869 Comm: sh Tainted: G        W  O    4.9.124-665644-g6bb3e72d4673_audio-g3dce958 #19
[ 2639.685129] c2   8869 Hardware name: Google Inc. MSM sdm845 C1 DVT1.1 (DT)
[ 2639.685132] c2   8869 task: ffffffdd6a99e900 task.stack: ffffffdde8c74000
[ 2639.685135] c2   8869 PC is at isolate_lru_page+0x238/0x248
[ 2639.685138] c2   8869 LR is at isolate_lru_page+0x238/0x248
[ 2639.685141] c2   8869 pc : [<ffffff88631da1c0>] lr : [<ffffff88631da1c0>] pstate: 60400145
[ 2639.685142] c2   8869 sp : ffffffdde8c77b20
[ 2639.685145] c2   8869 x29: ffffffdde8c77b20 x28: ffffffddad771ed0
[ 2639.685148] c2   8869 x27: ffffffddde7882f8 x26: ffffffbf77c4fe80
[ 2639.685152] c2   8869 x25: ffffffdde8c77bd8 x24: ffffffdde9cd2840
[ 2639.685155] c2   8869 x23: 0000000000000000 x22: ffffffbf76b5dc70
[ 2639.685158] c2   8869 x21: ffffffdde8c77d08 x20: 00000000cc000000
[ 2639.685161] c2   8869 x19: ffffffbf77c4fe80 x18: 0000007a14e32000
[ 2639.685164] c2   8869 x17: 4000000000000000 x16: 0000000000000000
[ 2639.685167] c2   8869 x15: 0000000000000178 x14: 2be2cf1bbc25f200
[ 2639.685170] c2   8869 x13: ffffffdde8c77b20 x12: ffffffdde8c77b20
[ 2639.685173] c2   8869 x11: ffffffdde8c77b20 x10: ffffffdde8c77b20
[ 2639.685176] c2   8869 x9 : ffffffdde8c77ae0 x8 : 70206c6961742065
[ 2639.685179] c2   8869 x7 : 74616c6f7369206f x6 : ffffff886553cc88
[ 2639.685182] c2   8869 x5 : 0000000000000015 x4 : 0000000000000000
[ 2639.685185] c2   8869 x3 : 0000000000000140 x2 : 2be2cf1bbc25f200
[ 2639.685188] c2   8869 x1 : 2be2cf1bbc25f200 x0 : 000000000000001b

Bug: 119789589
Change-Id: Ib104c94722255c006e9fc78af24f913afdf59591
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:50 +05:30
Chao Yu
2bec656215 f2fs: reduce expensive checkpoint trigger frequency
[ Upstream commit aaf8c0b9ae042494cb4585883b15c1332de77840 ]

We may trigger checkpoints at high frequency in the case below:
1. mkdir /mnt/dir1; set dir1 encrypted
2. touch /mnt/file1; fsync /mnt/file1
3. mkdir /mnt/dir2; set dir2 encrypted
4. touch /mnt/file2; fsync /mnt/file2
...

Although the newly created dir and file are not related, due to
commit bbf156f7af ("f2fs: fix lost xattrs of directories") we
trigger a checkpoint whenever fsync() comes after a new encrypted dir
is created.

To avoid this performance regression, let's record an entry including
the directory's ino in a global cache whenever we update the
directory's xattr data, and then trigger a checkpoint only if the
xattr metadata of the target file's parent was updated.

This patch also covers the no-encryption case below:
1) parent is checkpointed
2) set_xattr(dir) w/ new xnid
3) create(file)
4) fsync(file)
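
The bookkeeping described above can be modelled in a few lines. This is a
toy sketch, not the f2fs code: the class, method names and inode numbers
are hypothetical, and clearing the records after a checkpoint is an
assumption made for the example.

```python
class CheckpointTracker:
    """Toy model: remember directories whose xattr metadata changed."""

    def __init__(self):
        self.xattr_dir_inos = set()

    def on_dir_xattr_update(self, dir_ino):
        # Record the directory's inode number when its xattr data is updated.
        self.xattr_dir_inos.add(dir_ino)

    def fsync_needs_checkpoint(self, parent_ino):
        # Only a file whose own parent was touched forces a checkpoint.
        return parent_ino in self.xattr_dir_inos

    def on_checkpoint(self):
        # Assumption for this sketch: a completed checkpoint clears the records.
        self.xattr_dir_inos.clear()


tracker = CheckpointTracker()
DIR1, MNT = 101, 2   # hypothetical inode numbers

tracker.on_dir_xattr_update(DIR1)                  # mkdir dir1; set dir1 encrypted
unrelated = tracker.fsync_needs_checkpoint(MNT)    # fsync /mnt/file1
related = tracker.fsync_needs_checkpoint(DIR1)     # fsync of a file inside dir1
```

In the model, the fsync of /mnt/file1 no longer pays for a checkpoint,
because its parent directory was never recorded.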

Change-Id: Id7c4c5b70c239458b74f92edca537dd844b0be6f
Fixes: bbf156f7af ("f2fs: fix lost xattrs of directories")
Reported-by: wangzijie <wangzijie1@honor.com>
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Tested-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reported-by: Yunlei He <heyunlei@hihonor.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:50 +05:30
Chao Yu
a56cc534c7 f2fs: remove unneeded check condition in __f2fs_setxattr()
[ Upstream commit bc3994ffa4cf23f55171943c713366132c3ff45d ]

The return value of write_all_xattrs() has already been checked, so remove
the unneeded check condition that follows.

Change-Id: Ib125bc228b2d3094c89b1ff1233188487892cd89
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: aaf8c0b9ae04 ("f2fs: reduce expensive checkpoint trigger frequency")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:49 +05:30
Chao Yu
013f8e0f31 f2fs: fix to update i_ctime in __f2fs_setxattr()
[ Upstream commit 8874ad7dae8d91d24cc87c545c0073b3b2da5688 ]

generic/728       - output mismatch (see /media/fstests/results//generic/728.out.bad)
    --- tests/generic/728.out	2023-07-19 07:10:48.362711407 +0000
    +++ /media/fstests/results//generic/728.out.bad	2023-07-19 08:39:57.000000000 +0000
     QA output created by 728
    +Expected ctime to change after setxattr.
    +Expected ctime to change after removexattr.
     Silence is golden
    ...
    (Run 'diff -u /media/fstests/tests/generic/728.out /media/fstests/results//generic/728.out.bad'  to see the entire diff)
generic/729        1s

It needs to update i_ctime after {set,remove}xattr, fix it.

Change-Id: Ia828aa1dafd9ab7023d0f3afa3c0664498df569a
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: aaf8c0b9ae04 ("f2fs: reduce expensive checkpoint trigger frequency")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:49 +05:30
Yonggil Song
8bcd049f65 f2fs: fix typo
[ Upstream commit d382e36970ecf8242921400db2afde15fb6ed49e ]

Fix typo in f2fs.h
Detected by Jaeyoon Choi

Change-Id: I03485c82920b63113f8824e32ada59f7d65dc5d9
Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: aaf8c0b9ae04 ("f2fs: reduce expensive checkpoint trigger frequency")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:49 +05:30
Chao Yu
3eac771517 f2fs: enhance to update i_mode and acl atomically in f2fs_setattr()
[ Upstream commit 17232e830afb800acdcc22ae8980bf9d330393ef ]

Previously, f2fs_setattr() did not update the S_ISUID|S_ISGID|S_ISVTX
bits atomically with the S_IRWXUGO bits and acl entries, so in the error
path chmod() could partially succeed. This patch makes the chmod() flow
atomic.

Change-Id: I98a2585fe960b56b8d1a930eb7cf23e0a47ed3a5
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Stable-dep-of: aaf8c0b9ae04 ("f2fs: reduce expensive checkpoint trigger frequency")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:49 +05:30
Arjan van de Ven
76bd2c2163 fs: ext4: fsync: optimize double-fsync() a bunch
There are cases where EXT4 is a bit too conservative about sending barriers
down to the disk; there are cases where the transaction in progress is not the
one that sent the barrier (in other words: the fsync is for a file for which
the IO happened some time ago and all data was already sent to the disk).

For that case, a better-performing tradeoff can be made on SSD devices (which
have the ability to flush their DRAM caches in a hurry on a power fail event)
where the barrier gets sent to the disk, but we don't need to wait for the
barrier to complete. Any consecutive IO will block on the barrier correctly.

Signed-off-by: Diab Neiroukh <lazerl0rd@thezest.dev>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:45 +05:30
Christoph Hellwig
5ec87c529c block: add a poll_fn callback to struct request_queue
That way we can also poll non blk-mq queues.  Mostly needed for
the NVMe multipath code, but could also be useful elsewhere.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:45 +05:30
Park Ju Hyung
88087ccee6 kernfs: Use kmem_cache pool for struct kernfs_open_node/file
These get allocated and freed millions of times on this kernel tree.

Use a dedicated kmem_cache pool and avoid costly dynamic memory allocations.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:43 +05:30
Park Ju Hyung
1bdea19fde sdcardfs: Use kmem_cache pool for struct sdcardfs_file_info
These get allocated and freed millions of times on this kernel tree.

Use a dedicated kmem_cache pool and avoid costly dynamic memory allocations.

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:43 +05:30
UtsavBalar1231
8cbe836c3b fs/proc/base: Allow AppCompactor to access reclaim
On AOSP ROMs, CachedAppOptimizer tries to access reclaim for allocation
stats, process id and name.
But CAF restricts userspace access to reclaim, resulting in the following
warning:

[   29.986900] ------------[ cut here ]------------
[   29.986921] WARNING: CPU: 1 PID: 1858 at
../../../../../../kernel/xiaomi/raphael/mm/vmscan.c:1685
isolate_lru_page+0x1e4/0x1ec
[   29.986925] Modules linked in:
[   29.986932] CPU: 1 PID: 1858 Comm: CachedAppOptimi Tainted: G S
4.14.210-IMMENSiTY//f950c60da2 #2
[   29.986934] Hardware name: Qualcomm Technologies, Inc. SM8150 V2
PM8150 RAPHAEL (DT)
[   29.986937] task: 0000000050b69203 task.stack: 000000007e9ca8b3
[   29.986939] pc : isolate_lru_page+0x1e4/0x1ec
[   29.986942] lr : isolate_lru_page+0x1e4/0x1ec
[   29.986943] sp : ffffff8024893a90 pstate : 60000145
[   29.986945] x29: ffffff8024893a90 x28: 0400000000000001
[   29.986948] x27: ffffff8024893c18 x26: 0000000000000000
[   29.986952] x25: 0000000000000000 x24: ffffffbf98cbbc40
[   29.986955] x23: ffffffe61b3f7b08 x22: 0000000000000000
[   29.986958] x21: ffffffe61b8010c0 x20: 00000007ffd61000
[   29.986962] x19: ffffffbf98cbbc40 x18: 0000007922434000
[   29.986965] x17: 0000000000000000 x16: 0000000000000001
[   29.986968] x15: ffffffffffffffff x14: 0000007c49f7bd22
[   29.986972] x13: 0000000000000004 x12: 0000000000000000
[   29.986975] x11: 0000000000000000 x10: ffffffffffffffff
[   29.986978] x9 : 7bee4e9ab8c00e00 x8 : 7bee4e9ab8c00e00
[   29.986981] x7 : 000000000000001b x6 : ffffff9586fd3b93
[   29.986985] x5 : ffffff80248937d8 x4 : 0000000000000000
[   29.986988] x3 : 0000000000000065 x2 : 000000000000001b
[   29.986991] x1 : 00000000000001c0 x0 : 000000000000001b
[   29.986995] PC: 0xffffff95849f8650:
[   29.986996] 8650  2a1603e2 94012ce6 2a1f03f3 f94002e8 5282c009
8b090100 944e0032 2a1303e0
[   29.987007] 8670  a9434ff4 a94257f6 f9400bf7 a8c47bfd d65f03c0
9000c620 9106b400 97fcc349
[   29.987017] 8690  d4210000 17ffff94 d102c3ff b0010f28 f9478508
a9067bfd 910183fd f9003bf9
[   29.987026] 86b0  a9085ff8 a90957f6 a90a4ff4 f81f83a8 52801808
52800049 5280018a 72a02808
[   29.987036] LR: 0xffffff95849f8650:
[   29.987037] 8650  2a1603e2 94012ce6 2a1f03f3 f94002e8 5282c009
8b090100 944e0032 2a1303e0
[   29.987047] 8670  a9434ff4 a94257f6 f9400bf7 a8c47bfd d65f03c0
9000c620 9106b400 97fcc349
[   29.987056] 8690  d4210000 17ffff94 d102c3ff b0010f28 f9478508
a9067bfd 910183fd f9003bf9
[   29.987066] 86b0  a9085ff8 a90957f6 a90a4ff4 f81f83a8 52801808
52800049 5280018a 72a02808
[   29.987076] SP: 0xffffff8024893a50:
[   29.987077] 3a50  849f8690 ffffff95 60000145 00000000 ffffffc8
ffffff80 b8c00e00 7bee4e9a
[   29.987086] 3a70  ffffffff 0000007f 849f8690 ffffff95 24893a90
ffffff80 849f8690 ffffff95
[   29.987096] 3a90  24893b00 ffffff80 84adfba8 ffffff95 1b3f7b08
ffffffe6 1b3f7b00 ffffffe6
[   29.987105] 3ab0  00000000 00000000 1b8010c0 ffffffe6 ffd61000
00000007 ffda0000 00000007
[   29.987115]
[   29.987116] Call trace:
[   29.987119] isolate_lru_page+0x1e4/0x1ec
[   29.987125] reclaim_pte_range+0x144/0x230
[   29.987129] __walk_page_range+0x120/0x224
[   29.987132] walk_page_range+0x4c/0x128
[   29.987135] reclaim_write+0x298/0x3a8
[   29.987140] __vfs_write+0x44/0x134
[   29.987142] vfs_write+0xe0/0x19c
[   29.987144] SyS_write+0x6c/0xcc
[   29.987148] el0_svc_naked+0x34/0x38
[   29.987150] ---[ end trace 17986baba9f80714 ]---
[   29.987155] trying to isolate tail page
[   29.987162] ------------[ cut here ]------------
[   29.987169] WARNING: CPU: 1 PID: 1858 at
../../../../../../kernel/xiaomi/raphael/mm/vmscan.c:1685
isolate_lru_page+0x1e4/0x1ec
[   29.987172] Modules linked in:
[   29.987176] CPU: 1 PID: 1858 Comm: CachedAppOptimi Tainted: G S
W       4.14.210-IMMENSiTY//f950c60da2 #2
[   29.987178] Hardware name: Qualcomm Technologies, Inc. SM8150 V2
PM8150 RAPHAEL (DT)
[   29.987179] task: 0000000050b69203 task.stack: 000000007e9ca8b3
[   29.987182] pc : isolate_lru_page+0x1e4/0x1ec
[   29.987184] lr : isolate_lru_page+0x1e4/0x1ec
[   29.987186] sp : ffffff8024893a90 pstate : 60000145
[   29.987187] x29: ffffff8024893a90 x28: 0400000000000001
[   29.987191] x27: ffffff8024893c18 x26: 0000000000000000
[   29.987194] x25: 0000000000000000 x24: ffffffbf98cbbc80
[   29.987197] x23: ffffffe61b3f7b10 x22: 0000000000000000
[   29.987200] x21: ffffffe61b8010c0 x20: 00000007ffd62000
[   29.987204] x19: ffffffbf98cbbc80 x18: 0000007922434000
[   29.987207] x17: 0000000000000000 x16: 0000000000000001
[   29.987210] x15: ffffffffffffffff x14: 0000007c49f7bd22
[   29.987213] x13: 0000000000000004 x12: 0000000000000000
[   29.987216] x11: 0000000000000000 x10: ffffffffffffffff
[   29.987220] x9 : 7bee4e9ab8c00e00 x8 : 7bee4e9ab8c00e00
[   29.987223] x7 : 000000000000001b x6 : ffffff9586fd3b93
[   29.987226] x5 : ffffff80248937d8 x4 : 0000000000000000
[   29.987229] x3 : 0000000000000065 x2 : 000000000000001b
[   29.987232] x1 : 00000000000001c0 x0 : 000000000000001b
[   29.987236] PC: 0xffffff95849f8650:
[   29.987237] 8650  2a1603e2 94012ce6 2a1f03f3 f94002e8 5282c009
8b090100 944e0032 2a1303e0
[   29.987246] 8670  a9434ff4 a94257f6 f9400bf7 a8c47bfd d65f03c0
9000c620 9106b400 97fcc349
[   29.987256] 8690  d4210000 17ffff94 d102c3ff b0010f28 f9478508
a9067bfd 910183fd f9003bf9
[   29.987265] 86b0  a9085ff8 a90957f6 a90a4ff4 f81f83a8 52801808
52800049 5280018a 72a02808
[   29.987275] LR: 0xffffff95849f8650:
[   29.987276] 8650  2a1603e2 94012ce6 2a1f03f3 f94002e8 5282c009
8b090100 944e0032 2a1303e0
[   29.987285] 8670  a9434ff4 a94257f6 f9400bf7 a8c47bfd d65f03c0
9000c620 9106b400 97fcc349
[   29.987295] 8690  d4210000 17ffff94 d102c3ff b0010f28 f9478508
a9067bfd 910183fd f9003bf9
[   29.987304] 86b0  a9085ff8 a90957f6 a90a4ff4 f81f83a8 52801808
52800049 5280018a 72a02808
[   29.987314] SP: 0xffffff8024893a50:
[   29.987315] 3a50  849f8690 ffffff95 60000145 00000000 ffffffc8
ffffff80 b8c00e00 7bee4e9a
[   29.987324] 3a70  ffffffff 0000007f 849f8690 ffffff95 24893a90
ffffff80 849f8690 ffffff95
[   29.987334] 3a90  24893b00 ffffff80 84adfba8 ffffff95 1b3f7b10
ffffffe6 1b3f7b00 ffffffe6
[   29.987343] 3ab0  00000000 00000000 1b8010c0 ffffffe6 ffd62000
00000007 ffda0000 00000007
[   29.987353]
[   29.987354] Call trace:
[   29.987357] isolate_lru_page+0x1e4/0x1ec
[   29.987359] reclaim_pte_range+0x144/0x230
[   29.987362] __walk_page_range+0x120/0x224
[   29.987365] walk_page_range+0x4c/0x128
[   29.987368] reclaim_write+0x298/0x3a8
[   29.987370] __vfs_write+0x44/0x134
[   29.987373] vfs_write+0xe0/0x19c
[   29.987375] SyS_write+0x6c/0xcc
[   29.987377] el0_svc_naked+0x34/0x38
[   29.987379] ---[ end trace 17986baba9f80715 ]---
[   29.987385] trying to isolate tail page
[   29.987391] ------------[ cut here ]------------

Pixel devices also use the same permissions for their per-process
reclaim
[*]18c2af05a5

Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Signed-off-by: pix106 <sbordenave@gmail.com>
Change-Id: I1d7138d343cf1f1230886bfd47efc3439f273b07
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:39 +05:30
Kees Cook
726c44b656 printk: Collapse shutdown types into a single dump reason
To turn the KMSG_DUMP_* reasons into a more ordered list, collapse
the redundant KMSG_DUMP_(RESTART|HALT|POWEROFF) reasons into
KMSG_DUMP_SHUTDOWN. The current users already don't meaningfully
distinguish between them, so there's no need to, as discussed here:
https://lore.kernel.org/lkml/CA+CK2bAPv5u1ih5y9t5FUnTyximtFCtDYXJCpuyjOyHNOkRdqw@mail.gmail.com/

Link: https://lore.kernel.org/lkml/20200515184434.8470-2-keescook@chromium.org/
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: Ib5a364bec6185552381200088c74dc80138ead23
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2026-01-04 11:55:38 +05:30
Christoph Hellwig
e377714e64 fs: explicitly unregister per-superblock BDIs
Add a new SB_I_ flag to mark superblocks that have an ephemeral bdi
associated with them, and unregister it when the superblock is shut
down.

Bug: 182815710
Link: https://lkml.kernel.org/r/20211021124441.668816-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I2c3e8c2edf975b44a0a52a45c7564385180a5cd5
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:38 +05:30
liuchao12
d5f2bc59ce f2fs: enable fstrim to issue discard while using discard option
MIUI-1428085

The discard thread can only process 8 requests at a time by default,
so fstrim needs to handle the remaining discard requests when the
discard option is in use.

Change-Id: I5eac38c34182607e8dceeb13273522b10ce02af8
Signed-off-by: liuchao12 <liuchao12@xiaomi.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:36 +05:30
xiongping1
c23d2bbe50 f2fs: add trim stop mechanism
MIUI-1428085

Change-Id: I7c910321b66c6877cbc5656b3b3e426557dc3314
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:36 +05:30
Shachar Raindel
db4693a8f8 f2fs: Fix deadlock between f2fs_quota_sync and block_operation
This deadlock is hitting Android users (Pixel 3/3a/4) with Magisk, due
to frequent umount/mount operations that trigger quota_sync, hitting
the race. See https://github.com/topjohnwu/Magisk/issues/3171 for
additional impact discussion.

In commit db6ec53b7e03, we added a semaphore to protect quota flags.
As part of this commit, we changed f2fs_quota_sync to call
f2fs_lock_op, in an attempt to prevent an AB/BA type deadlock with
quota_sem locking in block_operation.  However, rwsem in Linux is not
recursive. Therefore, the following deadlock can occur:

f2fs_quota_sync
down_read(cp_rwsem) // f2fs_lock_op
filemap_fdatawrite
f2fs_write_data_pages
...
                                   block_operation
                                   down_write(cp_rwsem) - marks rwsem as
                                                          "writer pending"
down_read_trylock(cp_rwsem) - fails as there is
                              a writer pending.
                              Code keeps on trying,
                              live-locking the filesystem.

We solve this by creating a new rwsem, used specifically to
synchronize this case, instead of attempting to reuse an existing
lock.
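
The failure mode can be illustrated with a toy, single-threaded model
of a writer-preferring rwsem. This is only a sketch: toy_rwsem and its
helpers are hypothetical names, not the kernel's rwsem implementation,
and the model assumes that a queued writer turns away all new readers.

```c
#include <stdbool.h>

/* Toy model of a writer-preferring rwsem (hypothetical, not kernel code). */
struct toy_rwsem {
    int readers;          /* number of current read holders */
    bool writer_pending;  /* a writer is queued behind the readers */
};

/* A new reader is refused while a writer is queued. */
static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
    if (sem->writer_pending)
        return false;
    sem->readers++;
    return true;
}

/* The writer cannot proceed while readers hold the lock; it queues instead. */
static bool toy_down_write_try(struct toy_rwsem *sem)
{
    if (sem->readers > 0) {
        sem->writer_pending = true;
        return false;
    }
    return true;
}

/*
 * Replays the interleaving from the diagram above. Returns true if the
 * writer is stuck behind the outer reader and the nested read attempt
 * fails too, i.e. neither side can make progress.
 */
static bool toy_replay_deadlock(void)
{
    struct toy_rwsem cp_rwsem = { 0, false };

    toy_down_read_trylock(&cp_rwsem);                   /* f2fs_lock_op */
    bool writer_ok = toy_down_write_try(&cp_rwsem);     /* block_operation */
    bool nested_ok = toy_down_read_trylock(&cp_rwsem);  /* write path */

    return !writer_ok && !nested_ok && cp_rwsem.readers == 1;
}
```

In this model toy_replay_deadlock() reports that the outer read lock is
still held while both the writer and the nested reader are refused,
which is the situation a dedicated rwsem avoids.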

Signed-off-by: Shachar Raindel <shacharr@gmail.com>

Fixes: db6ec53b7e03 f2fs: add a rw_sem to cover quota flag changes
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:36 +05:30
Panchajanya1999
cdbe60a952 f2fs/sysfs: Apply RO macro on gc_urgent_sleep_time
Following commit c23401e6e15f73150f45e67287be679e4deb58f4,
we need to protect this node from Android writing to it.

Change-Id: I19ee51f06c9e373acf886d83026ade290645e243
Signed-off-by: Panchajanya1999 <rsk52959@gmail.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:36 +05:30
Panchajanya1999
067be2ed52 f2fs/sysfs: Introduce a Read-Only attribute macro
Useful when we need to set a node RO to keep Android from overriding
the custom-set values.
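
A minimal sketch of the idea, assuming the macro only differs from its
read-write counterpart in the permission bits. toy_attr, TOY_RW_ATTR
and TOY_RO_ATTR are hypothetical names for illustration and are not the
macros added by this commit.

```c
/* Hypothetical stand-in for a sysfs attribute: just a name and a mode. */
struct toy_attr {
    const char *name;
    unsigned int mode;   /* octal permission bits */
};

/* Read-write node: the owner may write (0644). */
#define TOY_RW_ATTR(_name) \
    static const struct toy_attr toy_attr_##_name = { #_name, 0644 }

/* Read-only node: no write bits at all (0444). */
#define TOY_RO_ATTR(_name) \
    static const struct toy_attr toy_attr_##_name = { #_name, 0444 }

TOY_RW_ATTR(gc_min_sleep_time);
TOY_RO_ATTR(gc_urgent_sleep_time);

/* Nonzero if any write permission bit is set on the node. */
static int toy_attr_writable(const struct toy_attr *attr)
{
    return (attr->mode & 0222) != 0;
}
```

With these definitions, toy_attr_writable() is false for the node
declared through the read-only macro and true for the read-write one.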

Change-Id: Iad8cf81504d55b8ed75e6b5563f7cf397595ec1a
Signed-off-by: Panchajanya1999 <rsk52959@gmail.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:36 +05:30
Panchajanya1999
0019df866f f2fs/gc: Reduce GC thread urgent sleep time to 50ms
Android sets the value to 50ms via vold's IdleMaint service. 500ms is
too long for GC to collect all invalid segments in time, which results
in performance degradation.

On unencrypted devices, vold fails to set this value to 50ms, so
performance degrades over time.

Based on [1].

[1] https://github.com/topjohnwu/Magisk/pull/5462
Signed-off-by: Panchajanya1999 <rsk52959@gmail.com>
Change-Id: I80f2c29558393d726d5e696aaf285096c8108b23
Signed-off-by: Panchajanya1999 <rsk52959@gmail.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:36 +05:30
Park Ju Hyung
87edd7b03f f2fs: reduce timeout for uncongestion
On high fs utilization, congestion is hit quite frequently and waiting for a
whopping 20ms is too expensive, especially on critical paths.

Reduce it to an amount that is unlikely to affect UI rendering paths.

The new times are as follows:
  100 Hz  => 1 jiffy   (effective: 10 ms)
  250 Hz  => 2 jiffies (effective: 8 ms)
  300 Hz  => 2 jiffies (effective: 6 ms)
  1000 Hz => 6 jiffies (effective: 6 ms)
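
The table above is consistent with a 6 ms timeout rounded up to whole
jiffies. The sketch below reproduces that arithmetic; the 6 ms figure
is an assumption inferred from the table, and the helpers are
simplified stand-ins, not the kernel's msecs_to_jiffies().

```c
/* Round a millisecond timeout up to whole jiffies for a given tick rate. */
static unsigned int toy_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
    return (ms * hz + 999) / 1000;
}

/* Effective wait for a jiffy count, truncated to whole milliseconds. */
static unsigned int toy_jiffies_to_msecs(unsigned int jiffies, unsigned int hz)
{
    return jiffies * 1000 / hz;
}
```

For example, toy_msecs_to_jiffies(6, 250) gives 2 jiffies, and
toy_jiffies_to_msecs(2, 250) gives 8 ms. At 300 Hz the 2 jiffies are
about 6.7 ms, which the helper truncates to 6.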

Co-authored-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:36 +05:30
Park Ju Hyung
c5cb160678 f2fs: set ioprio of GC kthread to idle
GC should run as conservatively as possible to reduce latency spikes for the user.

Setting ioprio to the idle class lets the kernel schedule the GC thread's I/O
so that it does not affect other processes' I/O requests.
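
For reference, an I/O priority value packs the class into the upper
bits and the per-class level into the lower bits. The sketch below
mirrors that encoding with TOY_-prefixed names (hypothetical, chosen
for illustration). It only computes the value and does not show how
this commit applies it to the GC kthread.

```c
/* Mirrors the Linux ioprio encoding: class in the top bits, data below. */
#define TOY_IOPRIO_CLASS_SHIFT 13
#define TOY_IOPRIO_CLASS_BE    2   /* best-effort class */
#define TOY_IOPRIO_CLASS_IDLE  3   /* idle class */

/* Pack a priority class and a per-class level into one ioprio value. */
static int toy_ioprio_value(int prio_class, int data)
{
    return (prio_class << TOY_IOPRIO_CLASS_SHIFT) | data;
}

/* Extract the class back out of a packed ioprio value. */
static int toy_ioprio_class(int ioprio)
{
    return ioprio >> TOY_IOPRIO_CLASS_SHIFT;
}
```

Whether idle-class requests are actually deferred depends on the I/O
scheduler in use honoring the class.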

Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:36 +05:30
Jesse Chan
67f363d73c f2fs: Enlarge min_fsync_blocks to 20
In OPPO's kernel:
enlarge min_fsync_blocks to optimize performance
  - yanwu@TECH.Storage.FS.oF2FS, 2019/08/12

Huawei is also doing this in their production kernel.

If this optimization is good for them and shipped
with their devices, it should be good for us.

Signed-off-by: Jesse Chan <jc@linux.com>
Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:36 +05:30
arter97
0a43b38ed0 fs: default to noatime
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:35 +05:30
Alexander Winkowski
c4ef4c5773 fscrypt: Fix misleading indentation warning
Change-Id: Ia5c0f6d48fda9ced33291366868d9aa4102d46e5
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:29 +05:30
Juhyung Park
4f36422380 pstore: spoof dmesg-ramoops-0 as console-ramoops-0
Android wants console-ramoops-0

Change-Id: Iaed32518401d1e92d2e1deb4c969e8ec4857e98a
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2026-01-04 11:55:29 +05:30
Kees Cook
de6c665400 pstore: Select compression at runtime
To allow for easier build test coverage and run-time testing, this allows
multiple compression algorithms to be built into pstore. Still only one
is supported to operate at a time (which can be selected at build time
or at boot time, similar to how LSMs are selected).
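
A rough sketch of run-time selection, assuming the built-in backends
live in a NULL-terminated table that is searched by name. The table
contents and the toy_ names are hypothetical; the real set of backends
depends on the kernel configuration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend table; real entries depend on the kernel config. */
struct toy_zbackend {
    const char *name;
};

static const struct toy_zbackend toy_zbackends[] = {
    { "deflate" },
    { "lzo" },
    { "lz4" },
    { NULL },   /* sentinel */
};

/* Return the built-in backend matching the requested name, or NULL. */
static const struct toy_zbackend *toy_select_backend(const char *requested)
{
    const struct toy_zbackend *step;

    for (step = toy_zbackends; step->name; step++) {
        if (strcmp(requested, step->name) == 0)
            return step;
    }
    return NULL;
}
```

A name that was not built in yields NULL, so only one backend is ever
active and an unknown request can be rejected up front.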

Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I5956061c215db5d3d7846b11b399ab101feaceb9
[dereference23: Backport to 4.14]
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2026-01-04 11:55:29 +05:30
Kees Cook
5aa7564c20 pstore/ram: Introduce max_reason and convert dump_oops
Now that pstore_register() can correctly pass max_reason to the kmesg
dump facility, introduce a new "max_reason" module parameter and
"max-reason" Device Tree field.

The "dump_oops" module parameter and "dump-oops" Device
Tree field are now considered deprecated, but are now automatically
converted to their corresponding max_reason values when present, though
the new max_reason setting has precedence.
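
The precedence rule can be sketched as follows. The enum is assumed to
mirror the ordered KMSG_DUMP_* list, and the mapping of dump_oops onto
max_reason (nonzero means "up to oopses", zero means "panics only") as
well as the TOY_UNSET marker are assumptions made for illustration.

```c
/* Assumed to mirror the ordered KMSG_DUMP_* reasons. */
enum toy_kmsg_dump_reason {
    TOY_KMSG_DUMP_UNDEF,
    TOY_KMSG_DUMP_PANIC,
    TOY_KMSG_DUMP_OOPS,
    TOY_KMSG_DUMP_EMERG,
    TOY_KMSG_DUMP_SHUTDOWN,
    TOY_KMSG_DUMP_MAX
};

#define TOY_UNSET (-1)   /* hypothetical marker for "not specified" */

/*
 * Resolve the effective max_reason. An explicit max_reason wins;
 * otherwise a present dump_oops is converted to its assumed equivalent.
 */
static int toy_resolve_max_reason(int max_reason, int dump_oops)
{
    if (max_reason != TOY_UNSET)
        return max_reason;
    if (dump_oops != TOY_UNSET)
        return dump_oops ? TOY_KMSG_DUMP_OOPS : TOY_KMSG_DUMP_PANIC;
    return TOY_KMSG_DUMP_UNDEF;   /* neither given: leave it to the default */
}
```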

For struct ramoops_platform_data, the "dump_oops" member is entirely
replaced by a new "max_reason" member, with the only existing user
updated in place.

Additionally remove the "reason" filter logic from ramoops_pstore_write(),
as that is not specifically needed anymore, though technically
this is a change in behavior for any ramoops users also setting the
printk.always_kmsg_dump boot param, which will cause ramoops to behave as
if max_reason was set to KMSG_DUMP_MAX.

Co-developed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/lkml/20200515184434.8470-6-keescook@chromium.org/
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: If2ed5c5786a9c572aa1eb4683eca1a0b292bb143
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2026-01-04 11:55:29 +05:30
Pavel Tatashin
28f3310b7c pstore/platform: Pass max_reason to kmesg dump
Add a new member to struct pstore_info for passing information about
kmesg dump maximum reason. This allows a finer control of what kmesg
dumps are sent to pstore storage backends.

Those backends that do not explicitly set this field (keeping it equal to
0), get the default behavior: store only Oopses and Panics, or everything
if the printk.always_kmsg_dump boot param is set.
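
The default described above can be sketched as a small filter. The
enum is assumed to mirror the ordered KMSG_DUMP_* list, and
toy_should_dump() is a hypothetical helper, not the function this
commit touches.

```c
#include <stdbool.h>

/* Assumed to mirror the ordered KMSG_DUMP_* reasons. */
enum toy_kmsg_dump_reason {
    TOY_KMSG_DUMP_UNDEF,
    TOY_KMSG_DUMP_PANIC,
    TOY_KMSG_DUMP_OOPS,
    TOY_KMSG_DUMP_EMERG,
    TOY_KMSG_DUMP_SHUTDOWN,
    TOY_KMSG_DUMP_MAX
};

/*
 * Decide whether a backend should receive a dump. A max_reason of
 * TOY_KMSG_DUMP_UNDEF (0) means the backend did not set the field.
 */
static bool toy_should_dump(int reason, int max_reason, bool always_kmsg_dump)
{
    if (max_reason == TOY_KMSG_DUMP_UNDEF)
        max_reason = always_kmsg_dump ? TOY_KMSG_DUMP_MAX
                                      : TOY_KMSG_DUMP_OOPS;
    return reason <= max_reason;
}
```

Because the reasons are ordered, a single comparison is enough: a
backend that leaves the field at 0 gets panics and oopses, and
everything only when always_kmsg_dump is set.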

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/lkml/20200515184434.8470-5-keescook@chromium.org/
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I6bdb11d3b3d74624b4f7b3b3da5811bb9ef23608
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2026-01-04 11:55:29 +05:30
Kees Cook
9318dbf85a pstore/ram: Refactor DT size parsing
Refactor device tree size parsing routines to be able to pass a non-zero
default value for providing a configurable default for the coming
"max_reason" field. Also rename the helpers, since we're not always
parsing a size -- we're parsing a u32 and making sure it's not greater
than INT_MAX.
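
A sketch of the parsing contract described above: fall back to a
caller-supplied default when the property is absent, and reject values
above INT_MAX. toy_parse_u32() and its 'present'/'raw' arguments are
hypothetical stand-ins for the device tree lookup.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/*
 * 'present' says whether the property exists, 'raw' is its value if it
 * does. On success the result is stored in *value and 0 is returned.
 */
static int toy_parse_u32(int present, uint32_t raw, uint32_t default_value,
                         uint32_t *value)
{
    uint32_t val32 = present ? raw : default_value;

    /* Reject anything that would not fit in a signed int. */
    if (val32 > INT_MAX)
        return -EOVERFLOW;

    *value = val32;
    return 0;
}
```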

Link: https://lore.kernel.org/lkml/20200506211523.15077-4-keescook@chromium.org/
Link: https://lore.kernel.org/lkml/20200521205223.175957-1-tyhicks@linux.microsoft.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: Id8a97bd6375ea42cc97ed52b9af1b3c39b845b7b
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2026-01-04 11:55:29 +05:30
Kees Cook
6e78e7cee5 pstore/ram: Adjust module param permissions to reflect reality
A couple of module parameters had 0600 permissions, but changing them would
have no impact on ramoops, so switch these to 0400 to reflect reality.

Link: https://lore.kernel.org/lkml/20200506211523.15077-7-keescook@chromium.org/
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I6c8bd04b54033825042f55829817589a78671d61
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2026-01-04 11:55:29 +05:30
Kees Cook
45c7e51932 pstore/ram: Avoid needless alloc during header write
Since the header is a fixed small maximum size, just use a stack variable
to avoid memory allocation in the write path.
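
As an illustration of the approach, the header can be formatted into a
small caller-provided buffer on the stack. The "====sec.usec-C" layout
and toy_format_hdr() are assumptions for this sketch, not a copy of the
ramoops code.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a record header into a caller-supplied buffer and return its
 * length, or 0 if it did not fit. No heap allocation is involved.
 */
static size_t toy_format_hdr(char *hdr, size_t size, long long sec,
                             unsigned long usec, int compressed)
{
    int len = snprintf(hdr, size, "====%lld.%06lu-%c\n",
                       sec, usec, compressed ? 'C' : 'D');

    if (len < 0 || (size_t)len >= size)
        return 0;
    return (size_t)len;
}
```

A caller would declare something like char hdr[36] locally and pass
sizeof(hdr), so the write path never has to handle an allocation
failure for the header.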

Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I97974d792d079775d1e17dd47fa2135db99a69b2
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2026-01-04 11:55:29 +05:30
Yue Hu
ca6593a439 pstore/ram: Add kmsg hlen zero check to ramoops_pstore_write()
If ramoops_write_kmsg_hdr() produces a zero-length header, we will not
be able to read the dmesg record back later, since it will be treated
as an invalid header in ramoops_pstore_read(). So we should return an
error instead of executing the code that follows.

Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: Ic14e781085ef0359d21204d4b4ef9902b801e457
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2026-01-04 11:55:29 +05:30
Yue Hu
777ab4eb50 pstore/ram: Move initialization earlier
Since only a single ramoops area is allowed at a time, other probes
(like device tree) are meaningless and only waste CPU resources.
So let's check for being already initialized first.

Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I81905fc676487984e7ee18f6ea788100d1b60675
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2026-01-04 11:55:29 +05:30
Yue Hu
2741af3d0e pstore: Avoid writing records with zero size
Sometimes pstore_console_write() will write records with zero size
to the persistent RAM zone, which is unnecessary. It will only increase
resource consumption. Also adjust ramoops_write_kmsg_hdr() to have
the same logic if memory allocation fails.

Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: Ibf780aa7c1446c2a8c8520ba345621177d160383
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
2026-01-04 11:55:29 +05:30
Dave Kleikamp
9d006e1f02 AIO: Don't plug the I/O queue in do_io_submit()
Asynchronous I/O latency to a solid-state disk greatly increased between the 2.6.32 and 3.0 kernels.
By removing the plug from do_io_submit(), we observed a 34% improvement in the I/O latency.
Unfortunately, at this level, we don't know if the request is to
a rotating disk or not.

Change-Id: I7101df956473ed9fd5dcff18e473dd93b688a5c1
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: linux-aio@kvack.org
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
2026-01-04 11:55:27 +05:30
John Dias
33dbfcc0a1 fs: Improve eventpoll logging to stop indicting timerfd
timerfd doesn't create any wakelocks; eventpoll can, and is creating the
wakelocks we see called "[timerfd]".  eventpoll creates two kinds of
wakelocks: a single top-level lock associated with the eventpoll fd
itself, and one additional lock for each fd it is polling that needs such
a lock (e.g. those using EPOLLWAKEUP).  Current code names the per-fd
locks using the undecorated names of the fds' associated files (hence
"[timerfd]"), and is naming the top-level lock after the PID of the caller
and the name of the file behind the first fd for which a per-fd lock is
created.  To make things clearer, the top-level lock is now named using
the caller PID and an "epollfd" designation, while the per-fd locks are
also named with the caller's PID (to associate them with the top-level
lock) and their respective fds' file names.
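
The naming scheme can be sketched like this. The exact format strings
are hypothetical; only the ingredients (caller PID, an "epollfd" tag
for the top-level lock, the file name for per-fd locks) come from the
description above.

```c
#include <stddef.h>
#include <stdio.h>

/* Top-level wakelock: caller PID plus a fixed "epollfd" tag. */
static int toy_epoll_ws_name(char *buf, size_t size, int pid)
{
    return snprintf(buf, size, "epoll_%d_epollfd", pid);
}

/* Per-fd wakelock: caller PID plus the name of the polled file. */
static int toy_epitem_ws_name(char *buf, size_t size, int pid,
                              const char *file_name)
{
    return snprintf(buf, size, "epoll_%d_file:%s", pid, file_name);
}
```

Sharing the PID prefix is what lets the per-fd locks be associated with
their top-level lock when reading the wakeup source statistics.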

Port of fix already applied to previous 2 generations.  Note that this
set of changes does not fully solve the problem of eventpoll/timerfd
wakelock attribution to the original process, since most activity is
relayed through system_server, but it does at least ensure that different
eventpoll wakelocks - and their stats - are properly disambiguated.

Test: Ran on device and observed new wakelock naming in
/d/wakeup_sources and (file naming in) lsof output.
Bug: 116363986
Change-Id: I34bada5ddab04cf3830762c745f46bfcd1549cb8
Signed-off-by: John Dias <joaodias@google.com>
Signed-off-by: Kelly Rossmoyer <krossmo@google.com>
Signed-off-by: Miguel de Dios <migueldedios@google.com>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2026-01-04 11:55:27 +05:30