Commit Graph

56306 Commits

Author SHA1 Message Date
Pzqqt
2bc524a19b Revert "f2fs: avoid to check PG_error flag"
Suggestion from Tashar02 (commit comment 114679849 on 375754065c)

This reverts commit 375754065cdb21304bec51240d2fcb03246d4c79.
2023-08-09 18:23:15 -05:00
Qi Han
5fee669d91 f2fs: remove unnessary comment in __may_age_extent_tree
This comment makes no sense and is in the wrong place, so let's
remove it.

Signed-off-by: Qi Han <hanqi@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:10 -05:00
Daeho Jeong
beb9332138 f2fs: allocate node blocks for atomic write block replacement
When a node block is missing for atomic write block replacement, we need
to allocate it in advance of the replacement.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:09 -05:00
Daeho Jeong
3c442f1616 f2fs: use cow inode data when updating atomic write
We need to use the COW inode's data instead of the original inode's
when writing atomic-write files that have already been updated.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:09 -05:00
Jaegeuk Kim
b3f32248a8 f2fs: remove power-of-two limitation of zoned device
In f2fs, there's no reason to force a power-of-two (po2) zone size.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:08 -05:00
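A rough illustration of what dropping the po2 requirement implies: with a power-of-two zone size the zone index of a block can be computed with a shift, while an arbitrary zone size needs a division. This is a userspace sketch; both helper names are hypothetical and are not the f2fs functions the commit touches.

```c
#include <stdint.h>

/* Zone index when the zone size is a power of two:
 * shifting by log2(zone_blocks) is enough. */
static uint64_t zone_of_po2(uint64_t blkaddr, unsigned int log_zone_blocks)
{
	return blkaddr >> log_zone_blocks;
}

/* General form that also works for non-power-of-two zone sizes. */
static uint64_t zone_of(uint64_t blkaddr, uint64_t zone_blocks)
{
	return blkaddr / zone_blocks;
}
```

For 384-block zones, zone_of(1000, 384) yields 2, and no shift of 1000 produces that (>> 8 gives 3, >> 9 gives 1), which is why the division form is needed once po2 is no longer enforced.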
Yangtao Li
1752c293b7 f2fs: add has_enough_free_secs()
Replace !has_not_enough_free_secs() with has_enough_free_secs().
Also, avoid nested 'if' statements in f2fs_balance_fs().

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:08 -05:00
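A minimal sketch of the has_enough_free_secs() refactor, assuming a simplified, hypothetical space_info struct in place of struct f2fs_sb_info and ignoring the real function's freed/needed bookkeeping: the positive predicate wraps the negative one, and the balance function uses early returns instead of nested if statements.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the space counters f2fs consults. */
struct space_info {
	unsigned int free_secs;
	unsigned int reserved_secs;
};

/* Negative predicate: true when there are too few free sections. */
static bool has_not_enough_free_secs(const struct space_info *si,
				     unsigned int needed)
{
	return si->free_secs < si->reserved_secs + needed;
}

/* Positive wrapper, so callers no longer write !has_not_enough_...(). */
static bool has_enough_free_secs(const struct space_info *si,
				 unsigned int needed)
{
	return !has_not_enough_free_secs(si, needed);
}

/* Flattened control flow: returns true only when GC would run. */
static bool balance_fs(const struct space_info *si, bool need)
{
	if (!need)
		return false;
	if (has_enough_free_secs(si, 0))
		return false;
	/* ... foreground GC would run here ... */
	return true;
}
```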
Jaegeuk Kim
2cf573a2b0 f2fs: relax sanity check if checkpoint is corrupted
1. extent_cache
 - let's drop the largest extent_cache
2. invalidate_block
 - don't show the warnings

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:08 -05:00
Jaegeuk Kim
dca498eadf f2fs: refactor f2fs_gc to call checkpoint in urgent condition
The major change is to call a checkpoint in the FG_GC case when there is
not enough free space but some prefree segments exist.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:07 -05:00
Chao Yu
ca89efd9e7 f2fs: fix to call clear_page_private_reference in .{release,invalid}_folio
b763f3bedc2d ("f2fs: restructure f2fs page.private layout") missed
calling clear_page_private_reference() in .{release,invalid}_folio.
Fix it, though it's not a big deal, since folio_detach_private() is
called to clear all private info and the reference count in the page.

BTW, remove the page_private_reference() definition, as it is never used.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:07 -05:00
Yangtao Li
a102d61ee7 f2fs: remove bulk remove_proc_entry() and unnecessary kobject_del()
Convert to remove_proc_subtree() and drop the explicit kobject_del():
kobject_put() already removes the kobject automatically, as a
single-stage removal.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:06 -05:00
Chao Yu
eddeead846 f2fs: fix to check return value of inc_valid_block_count()
In __replace_atomic_write_block(), we missed checking the return value
of inc_valid_block_count(). In an extreme test case where the f2fs
image runs out of space, this can leave the SIT table and the total
valid block count inconsistent.

Cc: Daeho Jeong <daehojeong@google.com>
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:06 -05:00
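The error-propagation pattern behind this fix, sketched with hypothetical counters (the real code works on struct f2fs_sb_info and the SIT): the valid block count is bumped first, and the SIT-side update happens only if that succeeded, so running out of space cannot leave the two out of sync.

```c
#include <errno.h>

/* Hypothetical counters standing in for the SIT and the total count. */
struct counts {
	unsigned int valid_blocks;	/* total valid block count */
	unsigned int user_blocks;	/* capacity */
	unsigned int sit_valid;		/* what the SIT table records */
};

/* Fails with -ENOSPC instead of silently overflowing capacity. */
static int inc_valid_block_count(struct counts *c)
{
	if (c->valid_blocks >= c->user_blocks)
		return -ENOSPC;
	c->valid_blocks++;
	return 0;
}

/* Check the return value before touching the SIT side. */
static int replace_block(struct counts *c)
{
	int err = inc_valid_block_count(c);

	if (err)
		return err;
	c->sit_valid++;
	return 0;
}
```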
Chao Yu
d8b52b6586 f2fs: fix to check return value of f2fs_do_truncate_blocks()
Otherwise, if truncation of cow_inode fails, the remaining data may
pollute the current atomic write transaction.

Cc: Daeho Jeong <daehojeong@google.com>
Fixes: a46bebd502fe ("f2fs: synchronize atomic write aborts")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:05 -05:00
Jaegeuk Kim
34f6b35bfd f2fs: fix potential corruption when moving a directory
F2FS has the same issue as ext4_rename(), which causes a crash revealed
by xfstests/generic/707.

See also commit 0813299c586b ("ext4: Fix possible corruption when moving a directory")

CC: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:05 -05:00
Yohan Joung
f314e24e22 f2fs: add radix_tree_preload_end in error case
To prevent an excessive increase in the preemption count, add
radix_tree_preload_end() on the retry path.

Signed-off-by: Yohan Joung <yohan.joung@sk.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:04 -05:00
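A userspace model of why the retry path needs radix_tree_preload_end(): preload() and preload_end() below are hypothetical stand-ins that just increment and decrement a counter, mimicking how the real preload leaves preemption disabled until the matching end call. Every goto retry must first undo its own preload.

```c
/* Hypothetical model of the preemption counter. */
static int preempt_count;

static void preload(void)     { preempt_count++; }	/* like radix_tree_preload() */
static void preload_end(void) { preempt_count--; }	/* like radix_tree_preload_end() */

/* Fails a given number of times before succeeding. Without the
 * preload_end() on the retry path, preempt_count would grow by one
 * for every retry. */
static void insert_with_retry(int failures_before_success)
{
	int attempts = 0;

retry:
	preload();
	if (attempts++ < failures_before_success) {
		preload_end();	/* balance this pass before retrying */
		goto retry;
	}
	/* ... insertion succeeded ... */
	preload_end();
}
```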
Chao Yu
6bf3f3c937 f2fs: fix to recover quota data correctly
With the -O quota mkfs option, xfstests generic/417 fails because fsck
detects data corruption on quota inodes.

[ASSERT] (fsck_chk_quota_files:2051)  --> Quota file is missing or invalid quota file content found.

The root cause is that there is a hole in which f2fs doesn't hold the
quota inodes, so all recovered quota data is dropped because the
SBI_POR_DOING flag is set:
- f2fs_fill_super
 - f2fs_recover_orphan_inodes
  - f2fs_enable_quota_files
  - f2fs_quota_off_umount
<--- quota inodes were dropped --->
 - f2fs_recover_fsync_data
  - f2fs_enable_quota_files
  - f2fs_quota_off_umount

This patch tries to eliminate the hole by holding the quota inodes
during the entire recovery flow, as below:
- f2fs_fill_super
 - f2fs_recover_quota_begin
 - f2fs_recover_orphan_inodes
 - f2fs_recover_fsync_data
 - f2fs_recover_quota_end

Then, recovered quota data can be persisted after SBI_POR_DOING
is cleared.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:04 -05:00
Chao Yu
cc18bbd7c3 f2fs: fix to check readonly condition correctly
In the case below, a multi-device image can be mounted with the rw
option even though one of the secondary devices is set as ro, and a
later update causes a panic. Introduce f2fs_dev_is_readonly() and use
it in f2fs_remount() to check the rw status of all devices, avoiding
such an inconsistent mount status.

mkfs.f2fs -c /dev/zram1 /dev/zram0 -f
blockdev --setro /dev/zram1
mount -t f2fs /dev/zram0 /mnt/f2fs
mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only.
mount -t f2fs -o remount,rw /mnt/f2fs
dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=8192

kernel BUG at fs/f2fs/inline.c:258!
RIP: 0010:f2fs_write_inline_data+0x23e/0x2d0 [f2fs]
Call Trace:
  f2fs_write_single_data_page+0x26b/0x9f0 [f2fs]
  f2fs_write_cache_pages+0x389/0xa60 [f2fs]
  __f2fs_write_data_pages+0x26b/0x2d0 [f2fs]
  f2fs_write_data_pages+0x2e/0x40 [f2fs]
  do_writepages+0xd3/0x1b0
  __writeback_single_inode+0x5b/0x420
  writeback_sb_inodes+0x236/0x5a0
  __writeback_inodes_wb+0x56/0xf0
  wb_writeback+0x2a3/0x490
  wb_do_writeback+0x2b2/0x330
  wb_workfn+0x6a/0x260
  process_one_work+0x270/0x5e0
  worker_thread+0x52/0x3e0
  kthread+0xf4/0x120
  ret_from_fork+0x29/0x50

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:04 -05:00
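The idea behind f2fs_dev_is_readonly() in the multi-device case can be sketched as a scan over the member devices: the image is writable only if every device is. The struct dev table and any_dev_readonly() below are hypothetical, not the f2fs data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device table: only the read-only flag matters here. */
struct dev {
	const char *path;
	bool ro;
};

/* True if any member device is read-only. */
static bool any_dev_readonly(const struct dev *devs, size_t ndevs)
{
	for (size_t i = 0; i < ndevs; i++)
		if (devs[i].ro)
			return true;
	return false;
}
```

In the reproduction above, /dev/zram1 would be the read-only entry, so a remount to rw should be refused.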
Chao Yu
302b6e80e6 f2fs: fix to keep consistent i_gc_rwsem lock order
The lock order of i_gc_rwsem[WRITE] and i_gc_rwsem[READ] is reversed
between gc_data_segment() and f2fs_dio_write_iter(); fix this to keep a
consistent lock order:
1. lock i_gc_rwsem[WRITE]
2. lock i_gc_rwsem[READ]

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:03 -05:00
Chao Yu
935e8fcfbc f2fs: fix to drop all dirty pages during umount() if cp_error is set
xfstest generic/361 reports a bug as below:

f2fs_bug_on(sbi, sbi->fsync_node_num);

kernel BUG at fs/f2fs/super.c:1627!
RIP: 0010:f2fs_put_super+0x3a8/0x3b0
Call Trace:
 generic_shutdown_super+0x8c/0x1b0
 kill_block_super+0x2b/0x60
 kill_f2fs_super+0x87/0x110
 deactivate_locked_super+0x39/0x80
 deactivate_super+0x46/0x50
 cleanup_mnt+0x109/0x170
 __cleanup_mnt+0x16/0x20
 task_work_run+0x65/0xa0
 exit_to_user_mode_prepare+0x175/0x190
 syscall_exit_to_user_mode+0x25/0x50
 do_syscall_64+0x4c/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

During umount(), if cp_error is set, f2fs_wait_on_all_pages() should
keep waiting until all F2FS_WB_CP_DATA pages have been written back;
otherwise, fsync_node_num can still be non-zero after
f2fs_wait_on_all_pages() returns, triggering this bug.

In this case, to avoid an endless loop in f2fs_wait_on_all_pages(), all
dirty pages need to be dropped rather than redirtied.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:02 -05:00
Chao Yu
10e94f5868 f2fs: fix to avoid use-after-free for cached IPU bio
xfstest generic/019 reports a bug:

kernel BUG at mm/filemap.c:1619!
RIP: 0010:folio_end_writeback+0x8a/0x90
Call Trace:
 end_page_writeback+0x1c/0x60
 f2fs_write_end_io+0x199/0x420
 bio_endio+0x104/0x180
 submit_bio_noacct+0xa5/0x510
 submit_bio+0x48/0x80
 f2fs_submit_write_bio+0x35/0x300
 f2fs_submit_merged_ipu_write+0x2a0/0x2b0
 f2fs_write_single_data_page+0x838/0x8b0
 f2fs_write_cache_pages+0x379/0xa30
 f2fs_write_data_pages+0x30c/0x340
 do_writepages+0xd8/0x1b0
 __writeback_single_inode+0x44/0x370
 writeback_sb_inodes+0x233/0x4d0
 __writeback_inodes_wb+0x56/0xf0
 wb_writeback+0x1dd/0x2d0
 wb_workfn+0x367/0x4a0
 process_one_work+0x21d/0x430
 worker_thread+0x4e/0x3c0
 kthread+0x103/0x130
 ret_from_fork+0x2c/0x50

The root cause: after cp_error is set, f2fs_write_single_data_page()
calls f2fs_submit_merged_ipu_write() to flush the cached IPU bio, but
f2fs_submit_merged_ipu_write() does not check the validity of its @bio
parameter. As a result, it may submit a random cached bio that belongs
to another I/O context, causing a use-after-free. Fix it by adding an
additional validity check.

Fixes: 0b20fcec8651 ("f2fs: cache global IPU bio")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:02 -05:00
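A simplified model of the validity check, assuming a single globally cached bio (the real IPU bio cache and the f2fs_submit_merged_ipu_write() signature differ). The cached bio is submitted only when the caller actually passed a bio and it is the one in the cache; otherwise the call does nothing, so another context's bio is never submitted by mistake.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one cached IPU bio shared by several writers. */
struct bio {
	int id;
};

static struct bio *cached_ipu_bio;

/* Returns true if the caller's bio was submitted. */
static bool submit_cached_ipu_bio(struct bio **bio)
{
	/* The validity check: no bio from the caller, or not the cached one. */
	if (!bio || !*bio || *bio != cached_ipu_bio)
		return false;
	cached_ipu_bio = NULL;	/* "submit" it and drop it from the cache */
	*bio = NULL;
	return true;
}
```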
Chao Yu
82df0ca4ab f2fs: remove unneeded in-memory i_crtime copy
i_crtime never changes after inode creation, so there is no need to
copy it into f2fs_inode_info.i_disk_time[3] or to monitor it for changes
when deciding whether to update the inode page. Remove the related code.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:02 -05:00
Chao Yu
e3ae2a300c f2fs: use f2fs_hw_is_readonly() instead of bdev_read_only()
Since f2fs supports the multi-device feature, it should use
f2fs_hw_is_readonly() rather than bdev_read_only() to check the devices'
rw status; fix it.

Meanwhile, remove the f2fs_hw_is_readonly() check from:
- f2fs_write_checkpoint()
- f2fs_convert_inline_inode()
since both already check f2fs_readonly(), and if f2fs' devices are
read-only, f2fs_readonly() must be true.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:01 -05:00
Yangtao Li
c779454fdc f2fs: merge lz4hc_compress_pages() to lz4_compress_pages()
Remove unnecessary lz4hc_compress_pages().

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:01 -05:00
Yangtao Li
51434d9c7c f2fs: convert to use sysfs_emit
Let's use sysfs_emit.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:00 -05:00
Yangtao Li
a347049b54 f2fs: set default compress option only when sb_has_compression
If the compress feature is not enabled, there is no need to set
compress-related parameters.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:00 -05:00
Yonggil Song
91ea12e172 f2fs: Fix system crash due to lack of free space in LFS
When f2fs tries to checkpoint during foreground GC in LFS mode, a system
crash occurs due to lack of free space if the amount of dirty node and
dentry pages generated by data migration exceeds the free space.
The reproduction sequence is as follows.

 - 20GiB capacity block device (null_blk)
 - format and mount with LFS mode
 - create a file and write 20,000MiB
 - 4k random write on full range of the file

 RIP: 0010:new_curseg+0x48a/0x510 [f2fs]
 Code: 55 e7 f5 89 c0 48 0f af c3 48 8b 5d c0 48 c1 e8 20 83 c0 01 89 43 6c 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b f0 41 80 4f 48 04 45 85 f6 0f 84 ba fd ff ff e9 ef fe ff ff
 RSP: 0018:ffff977bc397b218 EFLAGS: 00010246
 RAX: 00000000000027b9 RBX: 0000000000000000 RCX: 00000000000027c0
 RDX: 0000000000000000 RSI: 00000000000027b9 RDI: ffff8c25ab4e74f8
 RBP: ffff977bc397b268 R08: 00000000000027b9 R09: ffff8c29e4a34b40
 R10: 0000000000000001 R11: ffff977bc397b0d8 R12: 0000000000000000
 R13: ffff8c25b4dd81a0 R14: 0000000000000000 R15: ffff8c2f667f9000
 FS: 0000000000000000(0000) GS:ffff8c344ec80000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000000c00055d000 CR3: 0000000e30810003 CR4: 00000000003706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
 <TASK>
 allocate_segment_by_default+0x9c/0x110 [f2fs]
 f2fs_allocate_data_block+0x243/0xa30 [f2fs]
 ? __mod_lruvec_page_state+0xa0/0x150
 do_write_page+0x80/0x160 [f2fs]
 f2fs_do_write_node_page+0x32/0x50 [f2fs]
 __write_node_page+0x339/0x730 [f2fs]
 f2fs_sync_node_pages+0x5a6/0x780 [f2fs]
 block_operations+0x257/0x340 [f2fs]
 f2fs_write_checkpoint+0x102/0x1050 [f2fs]
 f2fs_gc+0x27c/0x630 [f2fs]
 ? folio_mark_dirty+0x36/0x70
 f2fs_balance_fs+0x16f/0x180 [f2fs]

This patch adds a check for whether there are enough free sections
before checkpointing during GC.

Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
[Jaegeuk Kim: code clean-up]
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:17:00 -05:00
Yangtao Li
031c3ecb0e f2fs: remove struct victim_selection default_v_ops
There is only a single instance of these ops, and Jaegeuk pointed out that:

    Originally this was intended to give a chance to provide other
    allocation option. Anyway, it seems quite hard to do it anymore.

So remove the indirection and call f2fs_get_victim() directly.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:59 -05:00
Qilin Tan
e5539fa090 f2fs: fix iostat lock protection
Make iostat_lock IRQ-safe to avoid a potential deadlock.

Deadlock scenario:
f2fs_attr_store
  -> f2fs_sbi_store
  -> _sbi_store
  -> spin_lock(sbi->iostat_lock)
    <interrupt request>
    -> scsi_end_request
    -> bio_endio
    -> f2fs_dio_read_end_io
    -> f2fs_update_iostat
    -> spin_lock_irqsave(sbi->iostat_lock)  ===> deadlock here

Fixes: 61803e984307 ("f2fs: fix iostat related lock protection")
Fixes: a1e09b03e6f5 ("f2fs: use iomap for direct I/O")
Signed-off-by: Qilin Tan <qilin.tan@mediatek.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:59 -05:00
Yohan Joung
2c164d0f51 f2fs: fix align check for npo2
Fix the alignment check so that it is also correct for non-power-of-two
(npo2) zone sizes.

Signed-off-by: Yohan Joung <yohan.joung@sk.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:58 -05:00
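Why a mask-based alignment check breaks for npo2 sizes: x & (n - 1) equals x % n only when n is a power of two. A minimal sketch with hypothetical helper names, not the code the commit changes:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask-based check: only correct when zone_blocks is a power of two. */
static bool aligned_mask(uint64_t addr, uint64_t zone_blocks)
{
	return (addr & (zone_blocks - 1)) == 0;
}

/* Modulo-based check: correct for any non-zero zone size. */
static bool aligned_mod(uint64_t addr, uint64_t zone_blocks)
{
	return addr % zone_blocks == 0;
}
```

With 384-block zones, aligned_mask(384, 384) wrongly reports misalignment because 384 & 383 == 256, while aligned_mod() gives the correct answer.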
Yangtao Li
ea2f5ef48b f2fs: add compression feature check for all compress mount opt
Opt_compress_chksum, Opt_compress_mode and Opt_compress_cache
lack the necessary check of whether the image supports compression;
add it.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:58 -05:00
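The shape of the added check, sketched with hypothetical option ids: compress-related options only take effect when the superblock advertises compression. Whether f2fs ignores such an option or fails the mount is not stated in the commit message, so this sketch only reports whether the option applies.

```c
#include <stdbool.h>

/* Hypothetical option ids; only the compress-related ones matter. */
enum compress_opt {
	OPT_COMPRESS_CHKSUM,
	OPT_COMPRESS_MODE,
	OPT_COMPRESS_CACHE,
	OPT_OTHER,
};

/* Compress options apply only if the image supports compression. */
static bool opt_applies(enum compress_opt o, bool sb_has_compression)
{
	switch (o) {
	case OPT_COMPRESS_CHKSUM:
	case OPT_COMPRESS_MODE:
	case OPT_COMPRESS_CACHE:
		return sb_has_compression;
	default:
		return true;
	}
}
```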
Yangtao Li
20cf46fa76 f2fs: convert is_extension_exist() to return bool type
is_extension_exist() only returns two values, 0 or 1, so there is
no need to use an int return type.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:57 -05:00
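A minimal sketch of a bool-returning extension matcher. The signature and matching rule here are hypothetical (the real is_extension_exist() differs); the point is only that a yes/no predicate reads more clearly with bool than with int.

```c
#include <stdbool.h>
#include <string.h>

/* True if @name has a non-empty base followed by ".<ext>". */
static bool is_extension_exist(const char *name, const char *ext)
{
	size_t nlen = strlen(name), elen = strlen(ext);

	if (nlen <= elen + 1)
		return false;
	return name[nlen - elen - 1] == '.' &&
	       strcmp(name + nlen - elen, ext) == 0;
}
```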
Jaegeuk Kim
bccf0eed44 f2fs: fix scheduling while atomic in decompression path
[   16.945668][    C0] Call trace:
[   16.945678][    C0]  dump_backtrace+0x110/0x204
[   16.945706][    C0]  dump_stack_lvl+0x84/0xbc
[   16.945735][    C0]  __schedule_bug+0xb8/0x1ac
[   16.945756][    C0]  __schedule+0x724/0xbdc
[   16.945778][    C0]  schedule+0x154/0x258
[   16.945793][    C0]  bit_wait_io+0x48/0xa4
[   16.945808][    C0]  out_of_line_wait_on_bit+0x114/0x198
[   16.945824][    C0]  __sync_dirty_buffer+0x1f8/0x2e8
[   16.945853][    C0]  __f2fs_commit_super+0x140/0x1f4
[   16.945881][    C0]  f2fs_commit_super+0x110/0x28c
[   16.945898][    C0]  f2fs_handle_error+0x1f4/0x2f4
[   16.945917][    C0]  f2fs_decompress_cluster+0xc4/0x450
[   16.945942][    C0]  f2fs_end_read_compressed_page+0xc0/0xfc
[   16.945959][    C0]  f2fs_handle_step_decompress+0x118/0x1cc
[   16.945978][    C0]  f2fs_read_end_io+0x168/0x2b0
[   16.945993][    C0]  bio_endio+0x25c/0x2c8
[   16.946015][    C0]  dm_io_dec_pending+0x3e8/0x57c
[   16.946052][    C0]  clone_endio+0x134/0x254
[   16.946069][    C0]  bio_endio+0x25c/0x2c8
[   16.946084][    C0]  blk_update_request+0x1d4/0x478
[   16.946103][    C0]  scsi_end_request+0x38/0x4cc
[   16.946129][    C0]  scsi_io_completion+0x94/0x184
[   16.946147][    C0]  scsi_finish_command+0xe8/0x154
[   16.946164][    C0]  scsi_complete+0x90/0x1d8
[   16.946181][    C0]  blk_done_softirq+0xa4/0x11c
[   16.946198][    C0]  _stext+0x184/0x614
[   16.946214][    C0]  __irq_exit_rcu+0x78/0x144
[   16.946234][    C0]  handle_domain_irq+0xd4/0x154
[   16.946260][    C0]  gic_handle_irq.33881+0x5c/0x27c
[   16.946281][    C0]  call_on_irq_stack+0x40/0x70
[   16.946298][    C0]  do_interrupt_handler+0x48/0xa4
[   16.946313][    C0]  el1_interrupt+0x38/0x68
[   16.946346][    C0]  el1h_64_irq_handler+0x20/0x30
[   16.946362][    C0]  el1h_64_irq+0x78/0x7c
[   16.946377][    C0]  finish_task_switch+0xc8/0x3d8
[   16.946394][    C0]  __schedule+0x600/0xbdc
[   16.946408][    C0]  preempt_schedule_common+0x34/0x5c
[   16.946423][    C0]  preempt_schedule+0x44/0x48
[   16.946438][    C0]  process_one_work+0x30c/0x550
[   16.946456][    C0]  worker_thread+0x414/0x8bc
[   16.946472][    C0]  kthread+0x16c/0x1e0
[   16.946486][    C0]  ret_from_fork+0x10/0x20

Fixes: bff139b49d9f ("f2fs: handle decompress only post processing in softirq")
Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:57 -05:00
Yangtao Li
aa5c13b1c8 f2fs: compress: fix to call f2fs_wait_on_page_writeback() in f2fs_write_raw_pages()
BUG_ON() will be triggered when writing files concurrently,
because the same page is written back multiple times.

1597 void folio_end_writeback(struct folio *folio)
1598 {
		......
1618     if (!__folio_end_writeback(folio))
1619         BUG();
		......
1625 }

kernel BUG at mm/filemap.c:1619!
Call Trace:
 <TASK>
 f2fs_write_end_io+0x1a0/0x370
 blk_update_request+0x6c/0x410
 blk_mq_end_request+0x15/0x130
 blk_complete_reqs+0x3c/0x50
 __do_softirq+0xb8/0x29b
 ? sort_range+0x20/0x20
 run_ksoftirqd+0x19/0x20
 smpboot_thread_fn+0x10b/0x1d0
 kthread+0xde/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30
 </TASK>

Below is the concurrency scenario:

[Process A]		[Process B]		[Process C]
f2fs_write_raw_pages()
  - redirty_page_for_writepage()
  - unlock_page()
			f2fs_do_write_data_page()
			  - lock_page()
			  - clear_page_dirty_for_io()
			  - set_page_writeback() [1st writeback]
			    .....
			    - unlock_page()

						generic_perform_write()
						  - f2fs_write_begin()
						    - wait_for_stable_page()

						  - f2fs_write_end()
						    - set_page_dirty()

  - lock_page()
    - f2fs_do_write_data_page()
      - set_page_writeback() [2nd writeback]

This problem was introduced by the previous commit 7377e853967b ("f2fs:
compress: fix potential deadlock of compress file"). All page locks were
released in f2fs_write_raw_pages(), but the subsequent write path ignored
whether the page was still under writeback. Fix it by waiting for
writeback of the page to complete before writing.

Cc: Christoph Hellwig <hch@lst.de>
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Fixes: 7377e853967b ("f2fs: compress: fix potential deadlock of compress file")
Signed-off-by: Qi Han <hanqi@vivo.com>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:57 -05:00
Yangtao Li
349f6235a6 f2fs: remove else in f2fs_write_cache_pages()
As Christoph Hellwig pointed out:

	Please avoid the else by doing the goto in the branch.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:56 -05:00
Jaegeuk Kim
738acd196e f2fs: apply zone capacity to all zone type
Managing the zone capacity per zone type breaks the GC assumption, and
the current logic complains about a valid block count mismatch.
Apply the zone capacity to all zone types, if specified.

Fixes: de881df97768 ("f2fs: support zone capacity less than zone size")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:56 -05:00
Yangtao Li
584afd5e5b f2fs: fix to handle filemap_fdatawrite() error in f2fs_ioc_decompress_file/f2fs_ioc_compress_file
It seems inappropriate that the current logic does not handle
filemap_fdatawrite() errors, so let's fix it.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:56 -05:00
Yangtao Li
cac2b44089 f2fs: convert to MAX_SBI_FLAG instead of 32 in stat_show()
BTW, reduce the s_flag array size and make s_flag constant.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:55 -05:00
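Sizing the array by the enum's terminating MAX_SBI_FLAG value instead of a literal 32, and making it const, might look like this; the enum here is a shortened, hypothetical subset of the real sbi flags.

```c
#include <stddef.h>

/* Shortened, illustrative subset of the sbi flag enum. */
enum sbi_flag {
	SBI_IS_DIRTY,
	SBI_IS_CLOSE,
	SBI_NEED_FSCK,
	MAX_SBI_FLAG,
};

/* Exactly MAX_SBI_FLAG entries, and read-only. */
static const char *const s_flag[MAX_SBI_FLAG] = {
	[SBI_IS_DIRTY]  = " fs_dirty",
	[SBI_IS_CLOSE]  = " closing",
	[SBI_NEED_FSCK] = " need_fsck",
};
```

Adding a flag to the enum then grows the array automatically, and the designated initializers keep each string tied to its flag.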
Yonggil Song
0c32655946 f2fs: Fix discard bug on zoned block devices with 2MiB zone size
When using f2fs on a zoned block device with a 2MiB zone size, I/O errors
occur because f2fs tries to write data to a zone that has not been reset.

The cause is that f2fs tries to discard multiple zones at once. This is
caused by a condition in f2fs_clear_prefree_segments() that does not check
for zoned block devices when setting the discard range. This leads to
invalid reset commands and write pointer mismatches.

This patch makes f2fs reset one zone at a time on zoned block devices
with a 2MiB zone size.

Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:55 -05:00
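A sketch of resetting one zone per command, assuming a zone-aligned start and a hypothetical reset callback (NULL means a dry run). With 4KiB blocks, a 2MiB zone is 512 blocks, so a 2048-block prefree range becomes four resets instead of one oversized command.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback that issues one zone reset. */
typedef void (*reset_fn)(uint64_t start, uint64_t len);

/* Split [start, start + len) into per-zone resets so no single command
 * spans more than one zone. Returns the number of resets issued. */
static unsigned int reset_range(uint64_t start, uint64_t len,
				uint64_t zone_blocks, reset_fn reset)
{
	unsigned int n = 0;

	while (len) {
		uint64_t chunk = len < zone_blocks ? len : zone_blocks;

		if (reset)
			reset(start, chunk);
		start += chunk;
		len -= chunk;
		n++;
	}
	return n;
}
```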
Jaegeuk Kim
1172976688 f2fs: remove entire rb_entry sharing
This is the last part of removing the memory sharing of rb_tree in extent_cache.

This should also fix arm32 memory alignment issue.

[struct extent_node]               [struct rb_entry]
[0] struct rb_node rb_node;        [0] struct rb_node rb_node;
  union {                              union {
    struct {                             struct {
[16]  unsigned int fofs;           [12]    unsigned int ofs;
      unsigned int len;                    unsigned int len;
                                         };
                                         unsigned long long key;
                                       } __packed;

Cc: <stable@vger.kernel.org>
Fixes: 13054c548a ("f2fs: introduce infra macro and data structure of rb-tree extent cache")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:54 -05:00
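The misalignment described in these rb_entry patches can be reproduced in userspace with GCC/Clang attributes. In this hypothetical sketch, a 12-byte array stands in for struct rb_node on arm32 (three 4-byte words), and aligned(8) models the 8-byte alignment that unsigned long long gets there. The packed union places the shared fields at offset 12, while the naturally aligned user struct places its fields at offset 16, so reading one through the other picks up the wrong bytes.

```c
#include <stddef.h>

/* 12 bytes: the size of struct rb_node on arm32. */
struct fake_rb_node {
	char bytes[12];
};

/* Shape of the old shared struct rb_entry: the union is packed. */
struct rb_entry_like {
	struct fake_rb_node rb_node;
	union __attribute__((packed)) {
		struct { unsigned int ofs, len; } s;
		unsigned long long key;
	} u;
};

/* Shape of a user struct whose union is 8-byte aligned. */
struct extent_node_like {
	struct fake_rb_node rb_node;
	union __attribute__((aligned(8))) {
		struct { unsigned int fofs, len; } s;
		unsigned long long pad;
	} u;
};
```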
Jaegeuk Kim
79486233a3 f2fs: factor out discard_cmd usage from general rb_tree use
This is the second part of removing the mixed use of rb_tree between
discard_cmd and extent_cache.

This should also fix arm32 memory alignment issue caused by shared rb_entry.

[struct discard_cmd]               [struct rb_entry]
[0] struct rb_node rb_node;        [0] struct rb_node rb_node;
  union {                              union {
    struct {                             struct {
[16]  block_t lstart;              [12]    unsigned int ofs;
      block_t len;                         unsigned int len;
                                         };
                                         unsigned long long key;
                                       } __packed;

Cc: <stable@vger.kernel.org>
Fixes: 004b686218 ("f2fs: use rb-tree to track pending discard commands")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:54 -05:00
Jaegeuk Kim
7d7d68b2d0 f2fs: factor out victim_entry usage from general rb_tree use
Let's reduce the complexity of mixed use of rb_tree in victim_entry from
extent_cache and discard_cmd.

This should fix arm32 memory alignment issue caused by shared rb_entry.

[struct victim_entry]              [struct rb_entry]
[0] struct rb_node rb_node;        [0] struct rb_node rb_node;
                                       union {
                                         struct {
                                           unsigned int ofs;
                                           unsigned int len;
                                         };
[16] unsigned long long mtime;     [12] unsigned long long key;
                                       } __packed;

Cc: <stable@vger.kernel.org>
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:54 -05:00
Yonggil Song
6f5de15db5 f2fs: fix uninitialized skipped_gc_rwsem
When f2fs skipped a GC round during victim migration, a bug caused all
upcoming GC rounds to be skipped unconditionally, because skipped_gc_rwsem
was not initialized. This patch fixes the bug by correctly initializing
skipped_gc_rwsem inside the GC loop.

Fixes: 6f8d4455060d ("f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc")
Signed-off-by: Yonggil Song <yonggil.song@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:53 -05:00
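A hypothetical GC loop showing why the skip counter must be reset per round: in the buggy variant, a single skipped round makes every later round look skipped, matching the symptom described in the commit message. The loop bodies are simplified stand-ins, not f2fs_gc().

```c
/* Counter initialized once, outside the loop: the bug. */
static int gc_rounds_buggy(const int *skips_per_round, int rounds)
{
	int rounds_run = 0;
	int skipped = 0;

	for (int i = 0; i < rounds; i++) {
		skipped += skips_per_round[i];
		if (skipped)
			continue;	/* treated as "no progress" forever */
		rounds_run++;
	}
	return rounds_run;
}

/* Counter re-initialized at the top of every round: the fix. */
static int gc_rounds(const int *skips_per_round, int rounds)
{
	int rounds_run = 0;

	for (int i = 0; i < rounds; i++) {
		int skipped = 0;

		skipped += skips_per_round[i];
		if (skipped)
			continue;	/* only this round is skipped */
		rounds_run++;
	}
	return rounds_run;
}
```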
Yangtao Li
14b0f2d5a4 f2fs: handle dqget error in f2fs_transfer_project_quota()
We should set the error code when dqget() fails.

Fixes: 2c1d030569 ("f2fs: support F2FS_IOC_FS{GET,SET}XATTR")
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:53 -05:00
Yangtao Li
79b458b514 f2fs: convert to use bitmap API
Let's use BIT() and GENMASK() instead of open-coding them.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:52 -05:00
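Userspace stand-ins for the kernel's BIT() and GENMASK() helpers, written only to show what they compute; the kernel's own definitions differ in detail and are what kernel code should use.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Single bit nr set, e.g. BIT(3) == 0x8. */
#define BIT(nr) (1UL << (nr))

/* Bits h..l inclusive set, e.g. GENMASK(7, 4) == 0xf0. */
#define GENMASK(h, l) \
	(((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
```

Replacing an open-coded mask such as 0xf0 with GENMASK(7, 4) states the bit range directly instead of leaving the reader to decode the constant.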
Yangtao Li
97f60e64ea f2fs: export compress_percent and compress_watermark entries
This patch exports the sysfs entries below for better control of the
cached compress page count.

/sys/fs/f2fs/<disk>/compress_watermark
/sys/fs/f2fs/<disk>/compress_percent

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:52 -05:00
Li Zetao
fd36358218 f2fs: make f2fs_sync_inode_meta() static
After commit 26b5a079197c ("f2fs: cleanup dirty pages if recover failed"),
f2fs_sync_inode_meta() is only used in checkpoint.c, so it should only
be visible there. Delete the declaration in the header file and make
f2fs_sync_inode_meta() static.

Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:45 -05:00
Jaegeuk Kim
5c3bb78d62 f2fs: fix null pointer panic in tracepoint in __replace_atomic_write_block
The tracepoint causes a kernel panic when old_addr is NULL.

https://bugzilla.kernel.org/show_bug.cgi?id=217266

BUG: kernel NULL pointer dereference, address: 0000000000000000
 Call Trace:
  <TASK>
  f2fs_commit_atomic_write+0x619/0x990 [f2fs a1b985b80f5babd6f3ea778384908880812bfa43]
  __f2fs_ioctl+0xd8e/0x4080 [f2fs a1b985b80f5babd6f3ea778384908880812bfa43]
  ? vfs_write+0x2ae/0x3f0
  ? vfs_write+0x2ae/0x3f0
  __x64_sys_ioctl+0x91/0xd0
  do_syscall_64+0x5c/0x90
  entry_SYSCALL_64_after_hwframe+0x72/0xdc
 RIP: 0033:0x7f69095fe53f

Fixes: 2f3a9ae990a7 ("f2fs: introduce trace_f2fs_replace_atomic_write_block")
Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-09 18:16:40 -05:00
Eric Biggers
7a1e357810 BACKPORT: fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY
The full pagecache drop at the end of FS_IOC_ENABLE_VERITY is causing
performance problems and is hindering adoption of fsverity.  It was
intended to solve a race condition where unverified pages might be left
in the pagecache.  But actually it doesn't solve it fully.

Since the incomplete solution for this race condition has too much
performance impact for it to be worth it, let's remove it for now.

Fixes: 3fda4c617e84 ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl")
Cc: stable@vger.kernel.org
Reviewed-by: Victor Hsieh <victorhsieh@google.com>
Link: https://lore.kernel.org/r/20230314235332.50270-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>

Bug: 273320626
(cherry picked from commit a075bacde257f755bea0e53400c9f1cdd1b8e8e6)
Change-Id: I28dacf122bba5ac816f9b748dcbaa82dc1072fed
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-08-09 18:16:29 -05:00
Nathan Huckleberry
75ce93d229 UPSTREAM: fsverity: Remove WQ_UNBOUND from fsverity read workqueue
WQ_UNBOUND causes significant scheduler latency on ARM64/Android.  This
is problematic for latency sensitive workloads, like I/O
post-processing.

Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
scheduler latency and improves app cold startup times by ~30ms.
WQ_UNBOUND was also removed from the dm-verity workqueue for the same
reason [1].

This code was tested by running Android app startup benchmarks and
measuring how long the fsverity workqueue spent in the runnable state.

Before
Total workqueue scheduler latency: 553800us
After
Total workqueue scheduler latency: 18962us

[1]: https://lore.kernel.org/all/20230202012348.885402-1-nhuck@google.com/

Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Fixes: 8a1d0f9cacc9 ("fs-verity: add data verification hooks for ->readpages()")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230310193325.620493-1-nhuck@google.com
Signed-off-by: Eric Biggers <ebiggers@google.com>

Bug: 258554362
(cherry picked from commit f959325e6ac3f499450088b8d9c626d1177be160)
Change-Id: I13f74e0df913894938969582604947e8a1fc51a3
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-08-09 18:16:25 -05:00
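The 96% figure follows from the two totals: (553800 - 18962) / 553800 ≈ 0.966. A trivial helper that reproduces the arithmetic:

```c
/* Percentage reduction from before_us to after_us (microseconds). */
static double pct_reduction(double before_us, double after_us)
{
	return 100.0 * (before_us - after_us) / before_us;
}
```

pct_reduction(553800, 18962) is about 96.6, consistent with the "96% reduction" quoted in the message.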
Alessio Balsini
657a9cb940 BACKPORT: ANDROID: fs/fuse: Keep FUSE file times consistent with lower file
When FUSE passthrough is used, the lower file system file is manipulated
directly, but none of the mtime, atime, or ctime of the referencing FUSE
file is updated.

Fix by updating the file times when passthrough operations are
performed.

Bug: 200779468
Reported-by: Fengnan Chang <changfengnan@vivo.com>
Reported-by: Ed Tsai <ed.tsai@mediatek.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I35b72196b2cc1d79a9f62ddb32e2cfa934c3b6d3
[cyberknight777: backport to 4.14]
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: onettboots <blackcocopet@gmail.com>
2023-08-09 18:16:10 -05:00
Biao Li
90bd14623b BACKPORT: ANDROID: fuse: Allocate zeroed memory for canonical path
The page used to hold the fuse_dentry_canonical_path result handled in
fuse_dev_do_write() is allocated with __get_free_pages(GFP_KERNEL).
The returned page may contain undefined data that, by chance, may be
interpreted as a valid path name that is not in the cache. In that case,
if the FUSE daemon mistakenly doesn't fill the canonical path buffer,
the FUSE driver may end up in two blocking waits in

  request_wait_answer(fuse_dev_write->kern_path->fuse_lookup_name)

causing a deadlock condition.

The stack is as follows:
find            S    0 20511  20117 0x00000000
Call trace:
[<ffffff8008085e78>] __switch_to+0xb8/0xd4
[<ffffff8008a0cac4>] __schedule+0x458/0x714
[<ffffff8008a0ce0c>] schedule+0x8c/0xa8
[<ffffff800833865c>] request_wait_answer+0x74/0x220
[<ffffff8008339f70>] __fuse_request_send+0x8c/0xa0
[<ffffff8008339fe4>] fuse_request_send+0x60/0x6c
[<ffffff800833c1a8>] fuse_dentry_canonical_path+0xb8/0x104
[<ffffff800820b14c>] do_sys_open+0x1b4/0x260
[<ffffff800820b27c>] SyS_openat+0x3c/0x4c
[<ffffff8008083540>] el0_svc_naked+0x34/0x38
mount.ntfs-3g   S    0  5845      1 0x00000000
Call trace:
[<ffffff8008085e78>] __switch_to+0xb8/0xd4
[<ffffff8008a0cac4>] __schedule+0x458/0x714
[<ffffff8008a0ce0c>] schedule+0x8c/0xa8
[<ffffff800833865c>] request_wait_answer+0x74/0x220
[<ffffff8008339f70>] __fuse_request_send+0x8c/0xa0
[<ffffff8008339fe4>] fuse_request_send+0x60/0x6c
[<ffffff800833bdb0>] fuse_simple_request+0x128/0x16c
[<ffffff800833dddc>] fuse_lookup_name+0x104/0x1b0
[<ffffff800833dee4>] fuse_lookup+0x5c/0x11c
[<ffffff800821861c>] lookup_slow+0xfc/0x174
[<ffffff800821b474>] walk_component+0xf0/0x290
[<ffffff800821bbac>] path_lookupat+0xa0/0x128
[<ffffff800821c7f4>] filename_lookup+0x84/0x124
[<ffffff800821c8d8>] kern_path+0x44/0x54
[<ffffff800833b0c8>] fuse_dev_do_write+0x828/0xa0c
[<ffffff800833b610>] fuse_dev_write+0x90/0xb4
[<ffffff800820b770>] do_iter_readv_writev+0xf4/0x13c
[<ffffff800820cc88>] do_readv_writev+0xec/0x220
[<ffffff800820d05c>] vfs_writev+0x60/0x74
[<ffffff800820d0ec>] do_writev+0x7c/0x100
[<ffffff800820e348>] SyS_writev+0x38/0x48
[<ffffff8008083540>] el0_svc_naked+0x34/0x38

Fix by ensuring that the page allocated for the canonical path is zeroed.

Bug: 194856119
Bug: 196051870
Fixes: 24ab59f6bb42 ("ANDROID: fuse: Add support for d_canonical_path")
Signed-off-by: Biao Li <libiao@allwinnertech.com>
Signed-off-by: Shuosheng Huang <huangshuosheng@allwinnertech.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I400815dc1049d90c308f5cf87ce60de97ff82131
[cyberknight777: backport to 4.14]
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: onettboots <blackcocopet@gmail.com>
2023-08-09 18:16:10 -05:00
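A userspace analogue of the fix: allocating the buffer zeroed (calloc here) means a daemon that never writes the path leaves an empty string instead of stale bytes that might parse as a path. In the kernel the equivalent would be a zeroed page allocation such as get_zeroed_page() or __GFP_ZERO; which one this patch uses is not stated in the message. PATH_BUF and alloc_path_buf() are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define PATH_BUF 4096	/* one page, as in the FUSE case */

/* Zeroed allocation: an unfilled buffer reads as "". */
static char *alloc_path_buf(void)
{
	return calloc(1, PATH_BUF);
}
```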