Squashed commit of the following: commit 37695a77521cfccbf92840cc13dcc4d8cb7dda96 Author: pwnrazr <1644943+pwnrazr@users.noreply.github.com> Date: Thu Feb 16 00:00:20 2023 +0800 raphael_defconfig: enable erofs highpri percpu kthread commit 816e4801de2002f5f53e7cd2f7aea282755d5391 Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Mar 6 15:48:21 2023 -0500 fs/(erofs || f2fs): drop WQ_UNBOUND Due to asym arm64 latency regression on WQ_UNBOUND commit d0e5cb53f102962d0d40ff12f548542d71f6340e Author: John Galt <johngaltfirstrun@gmail.com> Date: Wed Feb 15 10:44:37 2023 -0500 erofs/zdata: modify set sched to use RR at high prio for lower latency Fixes: bdd668d3b54202 commit afc1c08015966909a27c9d3d53d8796e80c3e4ef Author: Sandeep Dhavale <dhavale@google.com> Date: Wed Feb 8 06:53:49 2023 +0000 [WIP] BACKPORT: FROMLIST: erofs: add per-cpu threads for decompression Using per-cpu thread pool we can reduce the scheduling latency compared to workqueue implementation. With this patch scheduling latency and variation is reduced as per-cpu threads are high priority kthread_workers. The results were evaluated on arm64 Android devices running 5.10 kernel. The table below shows resulting improvements of total scheduling latency for the same app launch benchmark runs with 50 iterations. Scheduling latency is the latency between when the task (workqueue kworker vs kthread_worker) became eligible to run to when it actually started running. +-------------------------+-----------+----------------+---------+ | | workqueue | kthread_worker | diff | +-------------------------+-----------+----------------+---------+ | Average (us) | 15253 | 2914 | -80.89% | | Median (us) | 14001 | 2912 | -79.20% | | Minimum (us) | 3117 | 1027 | -67.05% | | Maximum (us) | 30170 | 3805 | -87.39% | | Standard deviation (us) | 7166 | 359 | | +-------------------------+-----------+----------------+---------+ Background: Boot times and cold app launch benchmarks are very important to the android ecosystem as they directly translate to responsiveness from user point of view. While erofs provides a lot of important features like space savings, we saw some performance penalty in cold app launch benchmarks in few scenarios. Analysis showed that the significant variance was coming from the scheduling cost while decompression cost was more or less the same. Having per-cpu thread pool we can see from the above table that this variation is reduced by ~80% on average. This problem was discussed at LPC 2022. Link to LPC 2022 slides and talk at [1] [1] https://lpc.events/event/16/contributions/1338/ Link: https://lore.kernel.org/lkml/Y+DP6V9fZG7XPPGy@debian/ Change-Id: I454da5bc17f285d99047b93dc1fc70444f287156 Signed-off-by: Sandeep Dhavale <dhavale@google.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 354d97368e8ffd832a43f6aa0d7c43f52268ca80 Author: pwnrazr <1644943+pwnrazr@users.noreply.github.com> Date: Sat May 7 13:21:24 2022 +0800 sm8150: dtsi: remove barrier and discard mount options commit 6c0b4a711ecb5b0e30c6115959b48af641e9b5bf Author: pwnrazr <1644943+pwnrazr@users.noreply.github.com> Date: Sat May 7 13:20:47 2022 +0800 Revert "arch: arm64: disable erofs" This reverts commit fe6fe5ef6107fc245ca50cd38f585e580fe2fc59. commit 515b1441ad6ac0f9e1c74013cd80e9b30065edc0 Author: kondors1995 <normandija1945@gmail.com> Date: Wed Feb 8 16:43:29 2023 +0200 Revert "raphael_defconfig: Revert FBEv2 defconfig changes" This reverts commit 97bb4a1d5d103804c72617481fca9b6cf93660a2. commit c010e1a5176d73f3829ce49cfdb0fcc0ee5c777c Author: Yue Hu <huyue2@coolpad.com> Date: Thu Apr 7 13:05:43 2022 +0800 erofs: do not prompt for risk any more when using big pcluster The big pcluster feature has been merged for a year, it has been mostly stable now. Signed-off-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220407050505.12683-1-huyue2@coolpad.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Cyber Knight <cyberknight755@gmail.com> commit b135290ae7af3f5f7b69e24c6ca678c4f6572cf2 Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:23:06 2022 -0400 erofs: Squashed revert of some recent backports: Keep out of release branch until d71eb1da8e8b59a7072c51ce48175e159ecfd79a is fixed, and also readmore decompress strategy is introduced. commit b9494371e2493f1a8ccc18b1c80f67867f6f623a Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:22:49 2022 -0400 Revert "erofs: iomap support for non-tailpacking DIO" This reverts commit 804ddc92b769a9cc9926d0262725e6330d0f0a76. commit 0649a6ed5e759857aabc334abeddacbe4eac7859 Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:22:41 2022 -0400 Revert "erofs: adapt 3f4e33b91a28 to our tree" This reverts commit 016f1ffa36da74ab67ed99abd474a0b2da5133eb. commit a3704a5a79990f75c8336c9001939db6e6d21181 Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:22:33 2022 -0400 Revert "erofs: add support for the full decompressed length" This reverts commit a4a195b954114aeb741cf4f8b14256ed92e7c545. commit 5a506fe78d7624f1a94e60d0e3d7113ae6934ea7 Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:22:27 2022 -0400 Revert "erofs: add fiemap support with iomap" This reverts commit 07577933c3fb397791f113ad36fac7a061385826. commit dd93cf9efb3d1f9608780c44a50a860eb9921cf4 Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:22:16 2022 -0400 Revert "erofs: introduce chunk-based file on-disk format" This reverts commit 690f4dc6d3b27ed6278b8fbae20273883f616e56. commit a1846fe6257df43564f42eb153131796f3fd84ed Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:22:08 2022 -0400 Revert "erofs: support reading chunk-based uncompressed files" This reverts commit 5bd83bfc55b6169af5bbf3c0ba4528577c2fa1ff. commit 3e1c2530db00b6605d8db09e207cb3633e61cdba Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:22:03 2022 -0400 Revert "erofs: fix double free of 'copied'" This reverts commit c608a6f861e0d457d6c9a5905e8b3d928e672075. commit 7a9e0f351f8d41a01a0763316bbd4b6ace94bea0 Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:21:52 2022 -0400 Revert "erofs: fix misbehavior of unsupported chunk format check" This reverts commit 751e7c533e451b3c6a51f7d2a69224cca39e8c20. commit 37b05816e45d519643dd9d162b827311abf3b034 Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:21:44 2022 -0400 Revert "erofs: get compression algorithms directly on mapping" This reverts commit 98b09cde747826f6fe3aae50eb05659f7f2803f7. commit de74ca4af181a35ac037a44f07cf6a7e55e0f127 Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:21:35 2022 -0400 Revert "erofs: introduce the secondary compression head" This reverts commit feea4ee667bf5d5fa2c6d0c5f57697476dce7ca7. commit dda6e8eaddd3203cfafd6c82d2e751f2e6d16766 Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:21:29 2022 -0400 Revert "erofs: clean up z_erofs_extent_lookback" This reverts commit c08dbda40a4f3016ee6c60ae2a19e3ecc518361c. commit 2e5fd527a76eba733464b0ba71fe92abc839b62b Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon Jun 6 13:21:23 2022 -0400 Revert "erofs: clean up erofs_map_blocks tracepoints" This reverts commit d71eb1da8e8b59a7072c51ce48175e159ecfd79a. commit ed6e7f36515d6d80c75c4d0803b636e17f328a6c Author: Gao Xiang <hsiangkao@linux.alibaba.com> Date: Thu Dec 9 09:29:18 2021 +0800 erofs: clean up erofs_map_blocks tracepoints Since the new type of chunk-based files is introduced, there is no need to leave flatmode tracepoints. Rename to erofs_map_blocks instead. Link: https://lore.kernel.org/r/20211209012918.30337-1-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 525147ad9beef7e521c1667509db763e970c06d3 Author: Gao Xiang <hsiangkao@linux.alibaba.com> Date: Fri Mar 11 02:27:42 2022 +0800 erofs: clean up z_erofs_extent_lookback Avoid the unnecessary tail recursion since it can be converted into a loop directly in order to prevent potential stack overflow. It's a pretty straightforward conversion. Link: https://lore.kernel.org/r/20220310182743.102365-1-hsiangkao@linux.alibaba.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit db45bcfb35a2cd8d49e159c0cc70635b713183a4 Author: Gao Xiang <hsiangkao@linux.alibaba.com> Date: Mon Oct 18 00:57:21 2021 +0800 erofs: introduce the secondary compression head Previously, for each HEAD lcluster, it can be either HEAD or PLAIN lcluster to indicate whether the whole pcluster is compressed or not. In this patch, a new HEAD2 head type is introduced to specify another compression algorithm other than the primary algorithm for each compressed file, which can be used for upcoming LZMA compression and LZ4 range dictionary compression for various data patterns. It has been stayed in the EROFS roadmap for years. Complete it now! Link: https://lore.kernel.org/r/20211017165721.2442-1-xiang@kernel.org Reviewed-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit f0fe9e97d03ed484a51f764373ad0c5941949869 Author: Gao Xiang <hsiangkao@linux.alibaba.com> Date: Sat Oct 9 04:08:37 2021 +0800 erofs: get compression algorithms directly on mapping Currently, z_erofs_map_blocks_iter() returns whether extents are compressed or not, and the decompression frontend gets the specific algorithms then. It works but not quite well in many aspests, for example: - The decompression frontend has to deal with whether extents are compressed or not again and lookup the algorithms if compressed. It's duplicated and too detailed about the on-disk mapping. - A new secondary compression head will be introduced later so that each file can have 2 compression algorithms at most for different type of data. It could increase the complexity of the decompression frontend if still handled in this way; - A new readmore decompression strategy will be introduced to get better performance for much bigger pcluster and lzma, which needs the specific algorithm in advance as well. Let's look up compression algorithms in z_erofs_map_blocks_iter() directly instead. Link: https://lore.kernel.org/r/20211008200839.24541-2-xiang@kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 588fc2156404c552d4c2c7bcc5def820966a1ba1 Author: Gao Xiang <hsiangkao@linux.alibaba.com> Date: Wed Sep 22 17:51:41 2021 +0800 erofs: fix misbehavior of unsupported chunk format check Unsupported chunk format should be checked with "if (vi->chunkformat & ~EROFS_CHUNK_FORMAT_ALL)" Found when checking with 4k-byte blockmap (although currently mkfs uses inode chunk indexes format by default.) Link: https://lore.kernel.org/r/20210922095141.233938-1-hsiangkao@linux.alibaba.com Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files") Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 613122535bafaabb0e58a9c347c5b6f1b8e6fa91 Author: Gao Xiang <hsiangkao@linux.alibaba.com> Date: Wed Aug 25 20:07:57 2021 +0800 erofs: fix double free of 'copied' Dan reported a new smatch warning [1] "fs/erofs/inode.c:210 erofs_read_inode() error: double free of 'copied'" Due to new chunk-based format handling logic, the error path can be called after kfree(copied). Set "copied = NULL" after kfree(copied) to fix this. [1] https://lore.kernel.org/r/202108251030.bELQozR7-lkp@intel.com Link: https://lore.kernel.org/r/20210825120757.11034-1-hsiangkao@linux.alibaba.com Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 7b648f684ea7c99deab7278f0c2cbbf74797a56d Author: Gao Xiang <hsiangkao@linux.alibaba.com> Date: Fri Aug 20 18:00:19 2021 +0800 erofs: support reading chunk-based uncompressed files Add runtime support for chunk-based uncompressed files described in the previous patch. Link: https://lore.kernel.org/r/20210820100019.208490-2-hsiangkao@linux.alibaba.com Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit d9737546275a3c460177a3ce9e01096bc3cfc3ad Author: Gao Xiang <hsiangkao@linux.alibaba.com> Date: Fri Aug 20 18:00:18 2021 +0800 erofs: introduce chunk-based file on-disk format Currently, uncompressed data except for tail-packing inline is consecutive on disk. In order to support chunk-based data deduplication, add a new corresponding inode data layout. In the future, the data source of chunks can be either (un)compressed. Link: https://lore.kernel.org/r/20210820100019.208490-1-hsiangkao@linux.alibaba.com Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 47f6bed39a7a83aa59be667657cba886dbd4b79b Author: Gao Xiang <hsiangkao@linux.alibaba.com> Date: Fri Aug 13 13:29:31 2021 +0800 erofs: add fiemap support with iomap This adds fiemap support for both uncompressed files and compressed files by using iomap infrastructure. Link: https://lore.kernel.org/r/20210813052931.203280-3-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 82cc95ee585c9b033a43b0564173d4c444e3a4ac Author: Gao Xiang <hsiangkao@linux.alibaba.com> Date: Wed Aug 18 23:22:31 2021 +0800 erofs: add support for the full decompressed length Previously, there is no need to get the full decompressed length since EROFS supports partial decompression. However for some other cases such as fiemap, the full decompressed length is necessary for iomap to make it work properly. This patch adds a way to get the full decompressed length. Note that it takes more metadata overhead and it'd be avoided if possible in the performance sensitive scenario. Link: https://lore.kernel.org/r/20210818152231.243691-1-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 8ff30ee6aaa1130bc26af4a98a818d91820c0bdb Author: John Galt <johngaltfirstrun@gmail.com> Date: Thu May 12 12:08:04 2022 -0400 erofs: adapt 3f4e33b91a28 to our tree commit 71e2f8865698e382349a16d8f90e5d74f935ff2a Author: Huang Jianan <huangjianan@oppo.com> Date: Thu Aug 5 08:35:59 2021 +0800 erofs: iomap support for non-tailpacking DIO Add iomap support for non-tailpacking uncompressed data in order to support DIO and DAX. Direct I/O is useful in certain scenarios for uncompressed files. For example, double pagecache can be avoid by direct I/O when loop device is used for uncompressed files containing upper layer compressed filesystem. This adds iomap DIO support for non-tailpacking cases first and tail-packing inline files are handled in the follow-up patch. Link: https://lore.kernel.org/r/20210805003601.183063-2-hsiangkao@linux.alibaba.com Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 8bc571a229c3701405ac47f689db283ac99f2b2d Author: Goldwyn Rodrigues <rgoldwyn@suse.com> Date: Fri Aug 30 12:09:24 2019 -0500 fs: export generic_file_buffered_read() Export generic_file_buffered_read() to be used to supplement incomplete direct reads. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> commit 34c8cbbc7b932ac50e90da6e838524fd1f162aca Author: Dan Williams <dan.j.williams@intel.com> Date: Wed Mar 7 15:26:44 2018 -0800 fs, dax: prepare for dax-specific address_space_operations In preparation for the dax implementation to start associating dax pages to inodes via page->mapping, we need to provide a 'struct address_space_operations' instance for dax. Define some generic VFS aops helpers for dax. These noop implementations are there in the dax case to prevent the VFS from falling back to operations with page-cache assumptions, dax_writeback_mapping_range() may not be referenced in the FS_DAX=n case. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Matthew Wilcox <mawilcox@microsoft.com> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> commit b0da008763834f165e8a055e011f223b3981316d Author: Andreas Gruenbacher <agruenba@redhat.com> Date: Sun Oct 1 17:55:54 2017 -0400 iomap: Switch from blkno to disk offset Replace iomap->blkno, the sector number, with iomap->addr, the disk offset in bytes. For invalid disk offsets, use the special value IOMAP_NULL_ADDR instead of IOMAP_NULL_BLOCK. This allows to use iomap for mappings which are not block aligned, such as inline data on ext4. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> # iomap, xfs Reviewed-by: Jan Kara <jack@suse.cz> commit b74997cce993dd0408a0beeb36bd28652e272108 Author: Matthew Wilcox <mawilcox@microsoft.com> Date: Tue Nov 28 15:39:51 2017 -0500 idr: Rename idr_for_each_entry_ext Most places in the kernel that we need to distinguish functions by the type of their arguments, we use '_ul' as a suffix for the unsigned long variant, not '_ext'. Also add kernel-doc. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> commit a562faeba73cfb13de1f278c95be606faa3e4f21 Author: Matthew Wilcox <mawilcox@microsoft.com> Date: Tue Nov 28 10:14:27 2017 -0500 idr: Add idr_alloc_u32 helper All current users of idr_alloc_ext() actually want to allocate a u32 and idr_alloc_u32() fits their needs better. Like idr_get_next(), it uses a 'nextid' argument which serves as both a pointer to the start ID and the assigned ID (instead of a separate minimum and pointer-to-assigned-ID argument). It uses a 'max' argument rather than 'end' because the semantics that idr_alloc has for 'end' don't work well for unsigned types. Since idr_alloc_u32() returns an errno instead of the allocated ID, mark it as __must_check to help callers use it correctly. Include copious kernel-doc. Chris Mi <chrism@mellanox.com> has promised to contribute test-cases for idr_alloc_u32. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> commit 4b24e4564260899c64b9532440a9b5545dbfe7f9 Author: Matthew Wilcox <mawilcox@microsoft.com> Date: Tue Apr 10 16:36:48 2018 -0700 fscache: use appropriate radix tree accessors Don't open-code accesses to data structure internals. Link: http://lkml.kernel.org/r/20180313132639.17387-7-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> commit 7469480be01c3394807cbd0991f06b8d6f2d4403 Author: Matthew Wilcox <mawilcox@microsoft.com> Date: Tue Apr 10 16:36:44 2018 -0700 export __set_page_dirty XFS currently contains a copy-and-paste of __set_page_dirty(). Export it from buffer.c instead. Link: http://lkml.kernel.org/r/20180313132639.17387-6-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Dave Chinner <david@fromorbit.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> commit c53045287025992bc775081dbab63ac926a597e8 Author: Matthew Wilcox <mawilcox@microsoft.com> Date: Tue Apr 10 16:36:28 2018 -0700 radix tree: use GFP_ZONEMASK bits of gfp_t for flags Patch series "XArray", v9. (First part thereof). This patchset is, I believe, appropriate for merging for 4.17. It contains the XArray implementation, to eventually replace the radix tree, and converts the page cache to use it. This conversion keeps the radix tree and XArray data structures in sync at all times. That allows us to convert the page cache one function at a time and should allow for easier bisection. Other than renaming some elements of the structures, the data structures are fundamentally unchanged; a radix tree walk and an XArray walk will touch the same number of cachelines. I have changes planned to the XArray data structure, but those will happen in future patches. Improvements the XArray has over the radix tree: - The radix tree provides operations like other trees do; 'insert' and 'delete'. But what most users really want is an automatically resizing array, and so it makes more sense to give users an API that is like an array -- 'load' and 'store'. We still have an 'insert' operation for users that really want that semantic. - The XArray considers locking as part of its API. This simplifies a lot of users who formerly had to manage their own locking just for the radix tree. It also improves code generation as we can now tell RCU that we're holding a lock and it doesn't need to generate as much fencing code. The other advantage is that tree nodes can be moved (not yet implemented). - GFP flags are now parameters to calls which may need to allocate memory. The radix tree forced users to decide what the allocation flags would be at creation time. It's much clearer to specify them at allocation time. - Memory is not preloaded; we don't tie up dozens of pages on the off chance that the slab allocator fails. Instead, we drop the lock, allocate a new node and retry the operation. We have to convert all the radix tree, IDA and IDR preload users before we can realise this benefit, but I have not yet found a user which cannot be converted. - The XArray provides a cmpxchg operation. The radix tree forces users to roll their own (and at least four have). - Iterators take a 'max' parameter. That simplifies many users and will reduce the amount of iteration done. - Iteration can proceed backwards. We only have one user for this, but since it's called as part of the pagefault readahead algorithm, that seemed worth mentioning. - RCU-protected pointers are not exposed as part of the API. There are some fun bugs where the page cache forgets to use rcu_dereference() in the current codebase. - Value entries gain an extra bit compared to radix tree exceptional entries. That gives us the extra bit we need to put huge page swap entries in the page cache. - Some iterators now take a 'filter' argument instead of having separate iterators for tagged/untagged iterations. The page cache is improved by this: - Shorter, easier to read code - More efficient iterations - Reduction in size of struct address_space - Fewer walks from the top of the data structure; the XArray API encourages staying at the leaf node and conducting operations there. This patch (of 8): None of these bits may be used for slab allocations, so we can use them as radix tree flags as long as we mask them off before passing them to the slab allocator. Move the IDR flag from the high bits to the GFP_ZONEMASK bits. Link: http://lkml.kernel.org/r/20180313132639.17387-3-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> commit c95250f9f545568b87775f1a2a48d412203161f7 Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon May 16 10:45:14 2022 -0400 Revert "erofs: compression fixes" This reverts commit 208dabff2d5e3e616a86df8bdba814d54b1a8a1f. Fixes a deadlock when fix shrinking erofs slab. commit d07627505cd871bb1a539377434dede2f4a18d9c Author: John Galt <johngaltfirstrun@gmail.com> Date: Mon May 16 09:41:14 2022 -0400 Revert "erofs: fixes for compilation" This reverts commit c7bf11979051cda0e7b37857289503fa4831c549. commit 7846d0f267ba3572570917e4880d60c79939bf5c Author: Hongyu Jin <hongyu.jin@unisoc.com> Date: Fri Apr 1 19:55:27 2022 +0800 erofs: fix use-after-free of on-stack io[] The root cause is the race as follows: Thread #1 Thread #2(irq ctx) z_erofs_runqueue() struct z_erofs_decompressqueue io_A[]; submit bio A z_erofs_decompress_kickoff(,,1) z_erofs_decompressqueue_endio(bio A) z_erofs_decompress_kickoff(,,-1) spin_lock_irqsave() atomic_add_return() io_wait_event() -> pending_bios is already 0 [end of function] wake_up_locked(io_A[]) // crash Referenced backtrace in kernel 5.4: [ 10.129422] Unable to handle kernel paging request at virtual address eb0454a4 [ 10.364157] CPU: 0 PID: 709 Comm: getprop Tainted: G WC O 5.4.147-ab09225 #1 [ 11.556325] [<c01b33b8>] (__wake_up_common) from [<c01b3300>] (__wake_up_locked+0x40/0x48) [ 11.565487] [<c01b3300>] (__wake_up_locked) from [<c044c8d0>] (z_erofs_vle_unzip_kickoff+0x6c/0xc0) [ 11.575438] [<c044c8d0>] (z_erofs_vle_unzip_kickoff) from [<c044c854>] (z_erofs_vle_read_endio+0x16c/0x17c) [ 11.586082] [<c044c854>] (z_erofs_vle_read_endio) from [<c06a80e8>] (clone_endio+0xb4/0x1d0) [ 11.595428] [<c06a80e8>] (clone_endio) from [<c04a1280>] (blk_update_request+0x150/0x4dc) [ 11.604516] [<c04a1280>] (blk_update_request) from [<c06dea28>] (mmc_blk_cqe_complete_rq+0x144/0x15c) [ 11.614640] [<c06dea28>] (mmc_blk_cqe_complete_rq) from [<c04a5d90>] (blk_done_softirq+0xb0/0xcc) [ 11.624419] [<c04a5d90>] (blk_done_softirq) from [<c010242c>] (__do_softirq+0x184/0x56c) [ 11.633419] [<c010242c>] (__do_softirq) from [<c01051e8>] (irq_exit+0xd4/0x138) [ 11.641640] [<c01051e8>] (irq_exit) from [<c010c314>] (__handle_domain_irq+0x94/0xd0) [ 11.650381] [<c010c314>] (__handle_domain_irq) from [<c04fde70>] (gic_handle_irq+0x50/0xd4) [ 11.659641] [<c04fde70>] (gic_handle_irq) from [<c0101b70>] (__irq_svc+0x70/0xb0) Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220401115527.4935-1-hongyu.jin.cn@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 9fa705504bf016a360c10edc3c9c5cbf8d870a78 Author: John Galt <johngaltfirstrun@gmail.com> Date: Thu May 5 22:40:43 2022 -0400 erofs: extend 3812dc21ec commit 4cda8c8c3d0ea4b3cb0f660db01697b50f7bfddc Author: Yue Hu <huyue2@yulong.com> Date: Thu Oct 14 14:57:44 2021 +0800 erofs: remove the fast path of per-CPU buffer decompression As Xiang mentioned, such path has no real impact to our current decompression strategy, remove it directly. Also, update the return value of z_erofs_lz4_decompress() to 0 if success to keep consistent with LZMA which will return 0 as well for that case. Link: https://lore.kernel.org/r/20211014065744.1787-1-zbestahu@gmail.com Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 20122adf7721eff6c6ff90db545e0597501d942f Author: Yue Hu <huyue2@yulong.com> Date: Tue Sep 14 11:59:15 2021 +0800 erofs: clear compacted_2b if compacted_4b_initial > totalidx Currently, the whole indexes will only be compacted 4B if compacted_4b_initial > totalidx. So, the calculated compacted_2b is worthless for that case. It may waste CPU resources. No need to update compacted_4b_initial as mkfs since it's used to fulfill the alignment of the 1st compacted_2b pack and would handle the case above. We also need to clarify compacted_4b_end here. It's used for the last lclusters which aren't fitted in the previous compacted_2b packs. Some messages are from Xiang. Link: https://lore.kernel.org/r/20210914035915.1190-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> [ Gao Xiang: it's enough to use "compacted_4b_initial < totalidx". ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 3243783e85d10ccc00b9e8cb37960ed1fc1e9fef Author: Yue Hu <huyue2@yulong.com> Date: Tue Aug 10 15:24:16 2021 +0800 erofs: remove the mapping parameter from erofs_try_to_free_cached_page() The mapping is not used at all, remove it and update related code. Link: https://lore.kernel.org/r/20210810072416.1392-1-zbestahu@gmail.com Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 2936d3798b6c340459813a0eeb2409a4cb34e44f Author: Yue Hu <huyue2@yulong.com> Date: Tue Aug 10 14:54:50 2021 +0800 erofs: directly use wrapper erofs_page_is_managed() when shrinking We already have the wrapper function to identify managed page. Link: https://lore.kernel.org/r/20210810065450.1320-1-zbestahu@gmail.com Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> commit 09b3effb67cdec2ce718d83a363c0a2df5f3d372 Author: Yue Hu <huyue2@yulong.com> Date: Mon Apr 19 18:26:23 2021 +0800 erofs: remove the occupied parameter from z_erofs_pagevec_enqueue() No any behavior to variable occupied in z_erofs_attach_page() which is only caller to z_erofs_pagevec_enqueue(). Link: https://lore.kernel.org/r/20210419102623.2015-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Gao Xiang <xiang@kernel.org> commit b5b28aefcf024c86c3f930293ba36482f96faf34 Author: Gao Xiang <xiang@kernel.org> Date: Mon May 10 14:47:15 2021 +0800 erofs: fix 1 lcluster-sized pcluster for big pcluster If the 1st NONHEAD lcluster of a pcluster isn't CBLKCNT lcluster type rather than a HEAD or PLAIN type instead, which means its pclustersize _must_ be 1 lcluster (since its uncompressed size < 2 lclusters), as illustrated below: HEAD HEAD / PLAIN lcluster type ____________ ____________ |_:__________|_________:__| file data (uncompressed) . . .____________. |____________| pcluster data (compressed) Such on-disk case was explained before [1] but missed to be handled properly in the runtime implementation. It can be observed if manually generating 1 lcluster-sized pcluster with 2 lclusters (thus CBLKCNT doesn't exist.) Let's fix it now. [1] https://lore.kernel.org/r/20210407043927.10623-1-xiang@kernel.org Link: https://lore.kernel.org/r/20210510064715.29123-1-xiang@kernel.org Fixes: cec6e93beadf ("erofs: support parsing big pcluster compress indexes") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <xiang@kernel.org> commit 2cfa0bcf32db1431e18d636e0ff5c592768b9620 Author: Gao Xiang <hsiangkao@redhat.com> Date: Wed Apr 7 12:39:27 2021 +0800 erofs: enable big pcluster feature Enable COMPR_CFGS and BIG_PCLUSTER since the implementations are all settled properly. Link: https://lore.kernel.org/r/20210407043927.10623-11-xiang@kernel.org Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit d75144d8d0395bca0a1a629a3b9ab6a95112a083 Author: Gao Xiang <hsiangkao@redhat.com> Date: Wed Apr 7 12:39:26 2021 +0800 erofs: support decompress big pcluster for lz4 backend Prior to big pcluster, there was only one compressed page so it'd easy to map this. However, when big pcluster is enabled, more work needs to be done to handle multiple compressed pages. In detail, - (maptype 0) if there is only one compressed page + no need to copy inplace I/O, just map it directly what we did before; - (maptype 1) if there are more compressed pages + no need to copy inplace I/O, vmap such compressed pages instead; - (maptype 2) if inplace I/O needs to be copied, use per-CPU buffers for decompression then. Another thing is how to detect inplace decompression is feasable or not (it's still quite easy for non big pclusters), apart from the inplace margin calculation, inplace I/O page reusing order is also needed to be considered for each compressed page. Currently, if the compressed page is the xth page, it shouldn't be reused as [0 ... nrpages_out - nrpages_in + x], otherwise a full copy will be triggered. Although there are some extra optimization ideas for this, I'd like to make big pcluster work correctly first and obviously it can be further optimized later since it has nothing with the on-disk format at all. Link: https://lore.kernel.org/r/20210407043927.10623-10-xiang@kernel.org Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit f344f71c42af2866c748ae22e1b133a02594b367 Author: Gao Xiang <hsiangkao@redhat.com> Date: Wed Apr 7 12:39:25 2021 +0800 erofs: support parsing big pcluster compact indexes Different from non-compact indexes, several lclusters are packed as the compact form at once and an unique base blkaddr is stored for each pack, so each lcluster index would take less space on avarage (e.g. 2 bytes for COMPACT_2B.) btw, that is also why BIG_PCLUSTER switch should be consistent for compact head0/1. Prior to big pcluster, the size of all pclusters was 1 lcluster. Therefore, when a new HEAD lcluster was scanned, blkaddr would be bumped by 1 lcluster. However, that way doesn't work anymore for big pcluster since we actually don't know the compressed size of pclusters in advance (before reading CBLKCNT lcluster). So, instead, let blkaddr of each pack be the first pcluster blkaddr with a valid CBLKCNT, in detail, 1) if CBLKCNT starts at the pack, this first valid pcluster is itself, e.g. _____________________________________________________________ |_CBLKCNT0_|_NONHEAD_| .. |_HEAD_|_CBLKCNT1_| ... |_HEAD_| ... ^ = blkaddr base ^ += CBLKCNT0 ^ += CBLKCNT1 2) if CBLKCNT doesn't start at the pack, the first valid pcluster is the next pcluster, e.g. _________________________________________________________ | NONHEAD_| .. |_HEAD_|_CBLKCNT0_| ... |_HEAD_|_HEAD_| ... ^ = blkaddr base ^ += CBLKCNT0 ^ += 1 When a CBLKCNT is found, blkaddr will be increased by CBLKCNT lclusters, or a new HEAD is found immediately, bump blkaddr by 1 instead (see the picture above.) Also noted if CBLKCNT is the end of the pack, instead of storing delta1 (distance of the next HEAD lcluster) as normal NONHEADs, it still uses the compressed block count (delta0) since delta1 can be calculated indirectly but the block count can't. Adjust decoding logic to fit big pcluster compact indexes as well. Link: https://lore.kernel.org/r/20210407043927.10623-9-xiang@kernel.org Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit 7af2a5cf065073d6f43298b2c96676f9315709d5 Author: Gao Xiang <hsiangkao@redhat.com> Date: Wed Apr 7 12:39:24 2021 +0800 erofs: support parsing big pcluster compress indexes When INCOMPAT_BIG_PCLUSTER sb feature is enabled, legacy compress indexes will also have the same on-disk header compact indexes to keep per-file configurations instead of leaving it zeroed. If ADVISE_BIG_PCLUSTER is set for a file, CBLKCNT will be loaded for each pcluster in this file by parsing 1st non-head lcluster. Link: https://lore.kernel.org/r/20210407043927.10623-8-xiang@kernel.org Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit 81a0c5100c6b09b91b7cfdad429fc66d65335be2 Author: Gao Xiang <hsiangkao@redhat.com> Date: Wed Apr 7 12:39:23 2021 +0800 erofs: adjust per-CPU buffers according to max_pclusterblks Adjust per-CPU buffers on demand since big pcluster definition is available. Also, bail out unsupported pcluster size according to Z_EROFS_PCLUSTER_MAX_SIZE. Link: https://lore.kernel.org/r/20210407043927.10623-7-xiang@kernel.org Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit 56612c78a9aeefc38d6b9bd7a6fef06eebe0c4b6 Author: Gao Xiang <hsiangkao@redhat.com> Date: Wed Apr 7 12:39:22 2021 +0800 erofs: add big physical cluster definition Big pcluster indicates the size of compressed data for each physical pcluster is no longer fixed as block size, but could be more than 1 block (more accurately, 1 logical pcluster) When big pcluster feature is enabled for head0/1, delta0 of the 1st non-head lcluster index will keep block count of this pcluster in lcluster size instead of 1. Or, the compressed size of pcluster should be 1 lcluster if pcluster has no non-head lcluster index. Also note that BIG_PCLUSTER feature reuses COMPR_CFGS feature since it depends on COMPR_CFGS and will be released together. Link: https://lore.kernel.org/r/20210407043927.10623-6-xiang@kernel.org Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit a67309917444753f1cebfee2d2503cf68269e54a Author: Gao Xiang <hsiangkao@redhat.com> Date: Wed Apr 7 12:39:21 2021 +0800 erofs: fix up inplace I/O pointer for big pcluster When picking up inplace I/O pages, it should be traversed in reverse order in aligned with the traversal order of file-backed online pages. Also, index should be updated together when preloading compressed pages. Previously, only page-sized pclustersize was supported so no problem at all. Also rename `compressedpages' to `icpage_ptr' to reflect its functionality. Link: https://lore.kernel.org/r/20210407043927.10623-5-xiang@kernel.org Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit 8fabf77d1a435d68b2bbb89c51f8351ef8efed26 Author: Gao Xiang <hsiangkao@redhat.com> Date: Wed Apr 7 12:39:20 2021 +0800 erofs: introduce physical cluster slab pools Since multiple pcluster sizes could be used at once, the number of compressed pages will become a variable factor. It's necessary to introduce slab pools rather than a single slab cache now. This limits the pclustersize to 1M (Z_EROFS_PCLUSTER_MAX_SIZE), and get rid of the obsolete EROFS_FS_CLUSTER_PAGE_LIMIT, which has no use now. Link: https://lore.kernel.org/r/20210407043927.10623-4-xiang@kernel.org Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit c9b891a3fd81d315815f496f1282c95e98507812 Author: Gao Xiang <hsiangkao@redhat.com> Date: Sat Apr 10 03:06:30 2021 +0800 erofs: introduce multipage per-CPU buffers To deal the with the cases which inplace decompression is infeasible for some inplace I/O. Per-CPU buffers was introduced to get rid of page allocation latency and thrash for low-latency decompression algorithms such as lz4. For the big pcluster feature, introduce multipage per-CPU buffers to keep such inplace I/O pclusters temporarily as well but note that per-CPU pages are just consecutive virtually. When a new big pcluster fs is mounted, its max pclustersize will be read and per-CPU buffers can be growed if needed. Shrinking adjustable per-CPU buffers is more complex (because we don't know if such size is still be used), so currently just release them all when unloading. Link: https://lore.kernel.org/r/20210409190630.19569-1-xiang@kernel.org Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit 6751c7549b38cfe2044fc3d6e03c25c0067e700d Author: Gao Xiang <hsiangkao@redhat.com> Date: Wed Apr 7 12:39:18 2021 +0800 erofs: reserve physical_clusterbits[] Formal big pcluster design is actually more powerful / flexable than the previous thought whose pclustersize was fixed as power-of-2 blocks, which was obviously inefficient and space-wasting. Instead, pclustersize can now be set independently for each pcluster, so various pcluster sizes can also be used together in one file if mkfs wants (for example, according to data type and/or compression ratio). Let's get rid of previous physical_clusterbits[] setting (also notice that corresponding on-disk fields are still 0 for now). Therefore, head1/2 can be used for at most 2 different algorithms in one file and again pclustersize is now independent of these. Link: https://lore.kernel.org/r/20210407043927.10623-2-xiang@kernel.org Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit 7c717bd2fb96a7ee82346bc88ddd28c5812c689d Author: Ruiqi Gong <gongruiqi1@huawei.com> Date: Wed Mar 31 05:39:20 2021 -0400 erofs: Clean up spelling mistakes found in fs/erofs zmap.c: s/correspoinding/corresponding zdata.c: s/endding/ending Link: https://lore.kernel.org/r/20210331093920.31923-1-gongruiqi1@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Ruiqi Gong <gongruiqi1@huawei.com> Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit 44f277dee13de691fe1fc483b55b4bc8ade3da36 Author: Gao Xiang <hsiangkao@redhat.com> Date: Mon Mar 29 18:00:12 2021 +0800 erofs: add on-disk compression configurations Add a bitmap for available compression algorithms and a variable-sized on-disk table for compression options in preparation for upcoming big pcluster and LZMA algorithm, which follows the end of super block. To parse the compression options, the bitmap is scanned one by one. For each available algorithm, there is data followed by 2-byte `length' correspondingly (it's enough for most cases, or entire fs blocks should be used.) With such available algorithm bitmap, kernel itself can also refuse to mount such filesystem if any unsupported compression algorithm exists. Note that COMPR_CFGS feature will be enabled with BIG_PCLUSTER. Link: https://lore.kernel.org/r/20210329100012.12980-1-hsiangkao@aol.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit e43a280cd5ca073e9d8cfa0471cdabf8f8500181 Author: Gao Xiang <hsiangkao@redhat.com> Date: Mon Mar 29 09:23:07 2021 +0800 erofs: introduce on-disk lz4 fs configurations Introduce z_erofs_lz4_cfgs to store all lz4 configurations. Currently it's only max_distance, but will be used for new features later. Link: https://lore.kernel.org/r/20210329012308.28743-4-hsiangkao@aol.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit d4108bf277b411bfdfa0eb12c2172b4035471d8b Author: Huang Jianan <huangjianan@oppo.com> Date: Mon Mar 29 09:23:06 2021 +0800 erofs: support adjust lz4 history window size lz4 uses LZ4_DISTANCE_MAX to record history preservation. When using rolling decompression, a block with a higher compression ratio will cause a larger memory allocation (up to 64k). It may cause a large resource burden in extreme cases on devices with small memory and a large number of concurrent IOs. So appropriately reducing this value can improve performance. Decreasing this value will reduce the compression ratio (except when input_size <LZ4_DISTANCE_MAX). But considering that erofs currently only supports 4k output, reducing this value will not significantly reduce the compression benefits. The maximum value of LZ4_DISTANCE_MAX defined by lz4 is 64k, and we can only reduce this value. For the old kernel, it just can't reduce the memory allocation during rolling decompression without affecting the decompression result. Link: https://lore.kernel.org/r/20210329012308.28743-3-hsiangkao@aol.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Guo Weichao <guoweichao@oppo.com> [ Gao Xiang: introduce struct erofs_sb_lz4_info for configurations. ] Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit 89a30917b8f584f34216b053ff5e4b8e1fa1a81a Author: Gao Xiang <hsiangkao@redhat.com> Date: Mon Mar 29 09:23:05 2021 +0800 erofs: introduce erofs_sb_has_xxx() helpers Introduce erofs_sb_has_xxx() to make long checks short, especially for later big pcluster & LZMA features. Link: https://lore.kernel.org/r/20210329012308.28743-2-hsiangkao@aol.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit 83849318acff8125846f2447ed318f80db4dde38 Author: Yue Hu <huyue2@yulong.com> Date: Thu Mar 25 15:10:08 2021 +0800 erofs: don't use erofs_map_blocks() any more Currently, erofs_map_blocks() will be called only from erofs_{bmap, read_raw_page} which are all for uncompressed files. So, the compression branch in erofs_map_blocks() is pointless. Let's remove it and use erofs_map_blocks_flatmode() directly. Also update related comments. Link: https://lore.kernel.org/r/20210325071008.573-1-zbestahu@gmail.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit dd3b7a71fb79a620a8df1138d74c990df27e04a5 Author: Gao Xiang <hsiangkao@redhat.com> Date: Mon Mar 22 02:32:27 2021 +0800 erofs: complete a missing case for inplace I/O Add a missing case which could cause unnecessary page allocation but not directly use inplace I/O instead, which increases runtime extra memory footprint. The detail is, considering an online file-backed page, the right half of the page is chosen to be cached (e.g. the end page of a readahead request) and some of its data doesn't exist in managed cache, so the pcluster will be definitely kept in the submission chain. (IOWs, it cannot be decompressed without I/O, e.g., due to the bypass queue). Currently, DELAYEDALLOC/TRYALLOC cases can be downgraded as NOINPLACE, and stop online pages from inplace I/O. After this patch, unneeded page allocations won't be observed in pickup_page_for_submission() then. Link: https://lore.kernel.org/r/20210321183227.5182-1-hsiangkao@aol.com Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit 2195652f604a78eff0b808c94f7c31c0648d42e8 Author: Huang Jianan <huangjianan@oppo.com> Date: Wed Mar 17 11:54:47 2021 +0800 erofs: use workqueue decompression for atomic contexts only z_erofs_decompressqueue_endio may not be executed in the atomic context, for example, when dm-verity is turned on. In this scenario, data can be decompressed directly to get rid of additional kworker scheduling overhead. Link: https://lore.kernel.org/r/20210317035448.13921-2-huangjianan@oppo.com Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Guo Weichao <guoweichao@oppo.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit 50a12c462dbc5e3e4d14dc427392fc8e571b1b0b Author: Huang Jianan <huangjianan@oppo.com> Date: Tue Mar 16 11:15:14 2021 +0800 erofs: avoid memory allocation failure during rolling decompression Currently, err would be treated as io error. Therefore, it'd be better to ensure memory allocation during rolling decompression to avoid such io error. In the long term, we might consider adding another !Uptodate case for such case. Link: https://lore.kernel.org/r/20210316031515.90954-1-huangjianan@oppo.com Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Huang Jianan <huangjianan@oppo.com> Signed-off-by: Guo Weichao <guoweichao@oppo.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> commit 5a664357076596a3af1100413bafb00a88dc5ef2 Author: kondors1995 <normandija1945@gmail.com> Date: Mon May 9 16:44:49 2022 +0000 raphael_defconfig: Enable EROFS commit 2409ea765730e7ca72fcc71dc3989eb37306ed81 Author: Tom Levy <tomlevy93@gmail.com> Date: Tue Jul 16 16:30:24 2019 -0700 include/linux/lz4.h: fix spelling and copy-paste errors in documentation Fix a few spelling and grammar errors, and two places where fast/safe in the documentation did not match the function. Link: http://lkml.kernel.org/r/20190321014452.13297-1-tomlevy93@gmail.com Signed-off-by: Tom Levy <tomlevy93@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Kosina <trivial@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live> commit 416572f0ce1a90146cb73dd5ea3667899d0f8241 Author: John Galt <johngaltfirstrun@gmail.com> Date: Tue May 3 16:09:48 2022 -0400 erofs: compression fixes commit 8af69e641af0cd017664fbf2fbd9ce2509b2b8dc Author: Luan Cachoroski Halaiko <luhalaiko@gmail.com> Date: Tue Feb 8 20:20:47 2022 -0300 erofs: fixes for compilation Signed-off-by: Luan Cachoroski Halaiko <luhalaiko@gmail.com> commit ad81e37ce0d0af5bdb0115a7eccd673c03d293f0 Author: Gao Xiang <hsiangkao@redhat.com> Date: Wed Dec 9 20:37:17 2020 +0800 erofs: force inplace I/O under low memory scenario Try to forcely switch to inplace I/O under low memory scenario in order to avoid direct memory reclaim due to cached page allocation. Link: https://lore.kernel.org/r/20201209123717.12430-1-hsiangkao@aol.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Change-Id: I8ea2d3b59c68125271f66853cf5dc6ca39e7aaa9 commit e4018facd91f25eb223b94416d1b64f641618577 Author: Gao Xiang <hsiangkao@redhat.com> Date: Tue Dec 8 17:58:34 2020 +0800 erofs: simplify try_to_claim_pcluster() simplify try_to_claim_pcluster() by directly using cmpxchg() here (the retry loop caused more overhead.) Also, move the chain loop detection in and rename it to z_erofs_try_to_claim_pcluster(). Link: https://lore.kernel.org/r/20201208095834.3133565-3-hsiangkao@redhat.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Change-Id: I8d091ff44123b099ef199eaa4200a00b8854623f commit f28d114732f644b4a6445316095db1f0e818472f Author: Gao Xiang <hsiangkao@redhat.com> Date: Tue Dec 8 17:58:33 2020 +0800 erofs: insert to managed cache after adding to pcl Previously, it could be some concern to call add_to_page_cache_lru() with page->mapping == Z_EROFS_MAPPING_STAGING (!= NULL). In contrast, page->private is used instead now, so partially revert commit 5ddcee1f3a1c ("erofs: get rid of __stagingpage_alloc helper") with some adaption for simplicity. Link: https://lore.kernel.org/r/20201208095834.3133565-2-hsiangkao@redhat.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Change-Id: If250d62b47083649e96d0937eb1990b6c84d768f commit 1a79fe1a476ae08ed0609618951fe863df0ac03a Author: Gao Xiang <hsiangkao@redhat.com> Date: Tue Dec 8 17:58:32 2020 +0800 erofs: get rid of magical Z_EROFS_MAPPING_STAGING Previously, we played around with magical page->mapping for short-lived temporary pages since we need to identify different types of pages in the same pcluster but both invalidated and short-lived temporary pages can have page->mapping == NULL. It was considered as safe because that temporary pages are all non-LRU / non-movable pages. This patch tends to use specific page->private to identify short-lived pages instead so it won't rely on page->mapping anymore. Details are described in "compress.h" as well. Link: https://lore.kernel.org/r/20201208095834.3133565-1-hsiangkao@redhat.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Change-Id: I2c8650e80cb6016ed828d04f89f8bd3512ca3fb2 commit a50789da7af81e73a8cb0081e788cea5543eff5c Author: Vladimir Zapolskiy <vladimir@tuxera.com> Date: Fri Oct 30 14:28:39 2020 +0200 erofs: remove a void EROFS_VERSION macro set in Makefile Since commit 4f761fa253b4 ("erofs: rename errln/infoln/debugln to erofs_{err, info, dbg}") the defined macro EROFS_VERSION has no affect, therefore removing it from the Makefile is a non-functional change. Link: https://lore.kernel.org/r/20201030122839.25431-1-vladimir@tuxera.com Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Vladimir Zapolskiy <vladimir@tuxera.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Change-Id: Id63ad279985db2a156d62be814bf381c9bea8342 commit d929ef94d4aab35ae96fb6d6efd1a0a23f7d1b48 Author: Gao Xiang <hsiangkao@linux.alibaba.com> Date: Mon Aug 30 11:44:53 2021 +0800 erofs: move from drivers/staging/ to fs/ Since 5.4, erofs has been moved into fs/. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Change-Id: I95dd967a0097629a9d8eaed1dc11e2cd04f47701 commit 2758a8239cc772c63d5463073b44626ee4e7695a Author: Gao Xiang <hsiangkao@linux.alibaba.com> Date: Wed Aug 25 11:42:03 2021 +0800 erofs: sync up with kernel 5.10 Backport 5.10 LTS erofs to 4.19. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Change-Id: Ibf9c0c47e46090b72e75f09a347100f4ff64f28d commit 1ee3b56216b0d92e2134d6134d2027c842f495b6 Author: Gao Xiang <hsiangkao@redhat.com> Date: Mon Mar 29 08:36:14 2021 +0800 erofs: add unsupported inode i_format check commit 24a806d849c0b0c1d0cd6a6b93ba4ae4c0ec9f08 upstream. If any unknown i_format fields are set (may be of some new incompat inode features), mark such inode as unsupported. Just in case of any new incompat i_format fields added in the future. Link: https://lore.kernel.org/r/20210329003614.6583-1-hsiangkao@aol.com Fixes: 431339ba9042 ("staging: erofs: add inode operations") Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 316472dda45a6a8142fc80800fa92f2846911008 Author: Gao Xiang <hsiangkao@redhat.com> Date: Thu Jul 30 01:58:01 2020 +0800 erofs: fix extended inode could cross boundary commit 0dcd3c94e02438f4a571690e26f4ee997524102a upstream. Each ondisk inode should be aligned with inode slot boundary (32-byte alignment) because of nid calculation formula, so all compact inodes (32 byte) cannot across page boundary. However, extended inode is now 64-byte form, which can across page boundary in principle if the location is specified on purpose, although it's hard to be generated by mkfs due to the allocation policy and rarely used by Android use case now mainly for > 4GiB files. For now, only two fields `i_ctime_nsec` and `i_nlink' couldn't be read from disk properly and cause out-of-bound memory read with random value. Let's fix now. Fixes: 431339ba9042 ("staging: erofs: add inode operations") Cc: <stable@vger.kernel.org> # 4.19+ Link: https://lore.kernel.org/r/20200729175801.GA23973@xiangao.remote.csb Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> [ Gao Xiang: resolve non-trivial conflicts for latest 4.19.y. ] Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit ee000f1badb6ca558527d2e99e6130e56fe6acfb Author: Gao Xiang <hsiangkao@redhat.com> Date: Sun Nov 1 03:51:02 2020 +0800 erofs: derive atime instead of leaving it empty commit d3938ee23e97bfcac2e0eb6b356875da73d700df upstream. EROFS has _only one_ ondisk timestamp (ctime is currently documented and recorded, we might also record mtime instead with a new compat feature if needed) for each extended inode since EROFS isn't mainly for archival purposes so no need to keep all timestamps on disk especially for Android scenarios due to security concerns. Also, romfs/cramfs don't have their own on-disk timestamp, and squashfs only records mtime instead. Let's also derive access time from ondisk timestamp rather than leaving it empty, and if mtime/atime for each file are really needed for specific scenarios as well, we can also use xattrs to record them then. Link: https://lore.kernel.org/r/20201031195102.21221-1-hsiangkao@aol.com [ Gao Xiang: It'd be better to backport for user-friendly concern. ] Fixes: 431339ba9042 ("staging: erofs: add inode operations") Cc: stable <stable@vger.kernel.org> # 4.19+ Reported-by: nl6720 <nl6720@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> [ Gao Xiang: Manually backport to 4.19.y due to trivial conflicts. ] Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 0601575a0ca46c49aaf765aaba9df8c1ce63cc9a Author: Gao Xiang <hsiangkao@redhat.com> Date: Fri Jun 19 07:43:49 2020 +0800 erofs: fix partially uninitialized misuse in z_erofs_onlinepage_fixup commit 3c597282887fd55181578996dca52ce697d985a5 upstream. Hongyu reported "id != index" in z_erofs_onlinepage_fixup() with specific aarch64 environment easily, which wasn't shown before. After digging into that, I found that high 32 bits of page->private was set to 0xaaaaaaaa rather than 0 (due to z_erofs_onlinepage_init behavior with specific compiler options). Actually we only use low 32 bits to keep the page information since page->private is only 4 bytes on most 32-bit platforms. However z_erofs_onlinepage_fixup() uses the upper 32 bits by mistake. Let's fix it now. Reported-and-tested-by: Hongyu Jin <hongyu.jin@unisoc.com> Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20200618234349.22553-1-hsiangkao@aol.com Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 02cee974cb788dd6b23837c04e347dbadccb7e67 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Wed Feb 26 16:10:06 2020 +0800 erofs: correct the remaining shrink objects commit 9d5a09c6f3b5fb85af20e3a34827b5d27d152b34 upstream. The remaining count should not include successful shrink attempts. Fixes: e7e9a307be9d ("staging: erofs: introduce workstation for decompression") Cc: <stable@vger.kernel.org> # 4.19+ Link: https://lore.kernel.org/r/20200226081008.86348-1-gaoxiang25@huawei.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit afe022d9f5721497e63d11d3fdb06c95c6256c23 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Sun Dec 1 16:01:09 2019 +0800 erofs: zero out when listxattr is called with no xattr commit 926d1650176448d7684b991fbe1a5b1a8289e97c upstream. As David reported [1], ENODATA returns when attempting to modify files by using EROFS as an overlayfs lower layer. The root cause is that listxattr could return unexpected -ENODATA by mistake for inodes without xattr. That breaks listxattr return value convention and it can cause copy up failure when used with overlayfs. Resolve by zeroing out if no xattr is found for listxattr. [1] https://lore.kernel.org/r/CAEvUa7nxnby+rxK-KRMA46=exeOMApkDMAV08AjMkkPnTPV4CQ@mail.gmail.com Link: https://lore.kernel.org/r/20191201084040.29275-1-hsiangkao@aol.com Fixes: cadf1ccf1b00 ("staging: erofs: add error handling for xattr submodule") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit fceffbd856369cedfa23b313844d3906de8fd36e Author: Gao Xiang <gaoxiang25@huawei.com> Date: Wed Oct 9 18:12:39 2019 +0800 staging: erofs: detect potential multiref due to corrupted images commit e12a0ce2fa69798194f3a8628baf6edfbd5c548f upstream. As reported by erofs-utils fuzzer, currently, multiref (ondisk deduplication) hasn't been supported for now, we should forbid it properly. Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20190821140152.229648-1-gaoxiang25@huawei.com [ Gao Xiang: Since earlier kernels don't define EFSCORRUPTED, let's use EIO instead. ] Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 9b3495631f1dba2feac41c880e564df6e242c8ab Author: Gao Xiang <gaoxiang25@huawei.com> Date: Wed Oct 9 18:12:38 2019 +0800 staging: erofs: add two missing erofs_workgroup_put for corrupted images commit 138e1a0990e80db486ab9f6c06bd5c01f9a97999 upstream. As reported by erofs-utils fuzzer, these error handling path will be entered to handle corrupted images. Lack of erofs_workgroup_puts will cause unmounting unsuccessfully. Fix these return values to EFSCORRUPTED as well. Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20190819103426.87579-4-gaoxiang25@huawei.com [ Gao Xiang: Older kernel versions don't have length validity check and EFSCORRUPTED, thus backport pageofs check for now. ] Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 20b9eea304f612a2cff8690eebc57d228e45b95e Author: Gao Xiang <gaoxiang25@huawei.com> Date: Wed Oct 9 18:12:37 2019 +0800 staging: erofs: some compressed cluster should be submitted for corrupted images commit ee45197c807895e156b2be0abcaebdfc116487c8 upstream. As reported by erofs_utils fuzzer, a logical page can belong to at most 2 compressed clusters, if one compressed cluster is corrupted, but the other has been ready in submitting chain. The chain needs to submit anyway in order to keep the page working properly (page unlocked with PG_error set, PG_uptodate not set). Let's fix it now. Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Link: https://lore.kernel.org/r/20190819103426.87579-2-gaoxiang25@huawei.com [ Gao Xiang: Manually backport to v4.19.y stable. ] Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit c61556faf792f95db0edbee6646fa2f52c8515d1 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Wed Oct 9 18:12:36 2019 +0800 staging: erofs: fix an error handling in erofs_readdir() commit acb383f1dcb4f1e79b66d4be3a0b6f519a957b0d upstream. Richard observed a forever loop of erofs_read_raw_page() [1] which can be generated by forcely setting ->u.i_blkaddr to 0xdeadbeef (as my understanding block layer can handle access beyond end of device correctly). After digging into that, it seems the problem is highly related with directories and then I found the root cause is an improper error handling in erofs_readdir(). Let's fix it now. [1] https://lore.kernel.org/r/1163995781.68824.1566084358245.JavaMail.zimbra@nod.at/ Reported-by: Richard Weinberger <richard@nod.at> Fixes: 3aa8ec716e52 ("staging: erofs: add directory operations") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20190818125457.25906-1-hsiangkao@aol.com [ Gao Xiang: Since earlier kernels don't define EFSCORRUPTED, let's use original error code instead. ] Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 44e25b73c4772f5f08d483bbdcfe81c95758e955 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jun 13 16:35:41 2019 +0800 staging: erofs: add requirements field in superblock commit 5efe5137f05bbb4688890620934538c005e7d1d6 upstream. There are some backward incompatible features pending for months, mainly due to on-disk format expensions. However, we should ensure that it cannot be mounted with old kernels. Otherwise, it will causes unexpected behaviors. Fixes: ba2b77a82022 ("staging: erofs: add super block operations") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit c458b3206aa217c67af63679c67cda21d1bb63fd Author: Gao Xiang <gaoxiang25@huawei.com> Date: Fri Mar 29 04:14:58 2019 +0800 staging: erofs: keep corrupted fs from crashing kernel in erofs_readdir() commit 33bac912840fe64dbc15556302537dc6a17cac63 upstream. After commit 419d6efc50e9, kernel cannot be crashed in the namei path. However, corrupted nameoff can do harm in the process of readdir for scenerios without dm-verity as well. Fix it now. Fixes: 3aa8ec716e52 ("staging: erofs: add directory operations") Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 77a2c8cadafb7972b2812c097f518ac3099e8a3b Author: Gao Xiang <gaoxiang25@huawei.com> Date: Mon Mar 25 11:40:07 2019 +0800 staging: erofs: fix error handling when failed to read compresssed data commit b6391ac73400eff38377a4a7364bd3df5efb5178 upstream. Complete read error handling paths for all three kinds of compressed pages: 1) For cache-managed pages, PG_uptodate will be checked since read_endio will unlock and SetPageUptodate for these pages; 2) For inplaced pages, read_endio cannot SetPageUptodate directly since it should be used to mark the final decompressed data, PG_error will be set with page locked for IO error instead; 3) For staging pages, PG_error is used, which is similar to what we do for inplaced pages. Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 74528ff6c38df709674cc676f67e79eac815e23f Author: Chao Yu <yuchao0@huawei.com> Date: Mon Mar 11 23:10:10 2019 +0800 staging: erofs: fix to handle error path of erofs_vmap() commit 8bce6dcede65139a087ff240127e3f3c01363eed upstream. erofs_vmap() wrapped vmap() and vm_map_ram() to return virtual continuous memory, but both of them can failed due to a lot of reason, previously, erofs_vmap()'s callers didn't handle them, which can potentially cause NULL pointer access, fix it. Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Fixes: 0d40d6e399c1 ("staging: erofs: add a generic z_erofs VLE decompressor") Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 910cd92ee289977f064971f7659cda0228ec1615 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Fri Nov 23 01:16:00 2018 +0800 staging: erofs: fix race when the managed cache is enabled commit 51232df5e4b268936beccde5248f312a316800be upstream. When the managed cache is enabled, the last reference count of a workgroup must be used for its workstation. Otherwise, it could lead to incorrect (un)freezes in the reclaim path, and it would be harmful. A typical race as follows: Thread 1 (In the reclaim path) Thread 2 workgroup_freeze(grp, 1) refcnt = 1 ... workgroup_unfreeze(grp, 1) refcnt = 1 workgroup_get(grp) refcnt = 2 (x) workgroup_put(grp) refcnt = 1 (x) ...unexpected behaviors * grp is detached but still used, which violates cache-managed freeze constraint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit a906ead6ff3295233d3643d662309cddb7efd896 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Mon Mar 11 14:08:58 2019 +0800 staging: erofs: keep corrupted fs from crashing kernel in erofs_namei() commit 419d6efc50e94bcf5d6b35cd8c71f79edadec564 upstream. As Al pointed out, " ... and while we are at it, what happens to unsigned int nameoff = le16_to_cpu(de[mid].nameoff); unsigned int matched = min(startprfx, endprfx); struct qstr dname = QSTR_INIT(data + nameoff, unlikely(mid >= ndirents - 1) ? maxsize - nameoff : le16_to_cpu(de[mid + 1].nameoff) - nameoff); /* string comparison without already matched prefix */ int ret = dirnamecmp(name, &dname, &matched); if le16_to_cpu(de[...].nameoff) is not monotonically increasing? I.e. what's to prevent e.g. (unsigned)-1 ending up in dname.len? Corrupted fs image shouldn't oops the kernel.. " Revisit the related lookup flow to address the issue. Fixes: d72d1ce60174 ("staging: erofs: add namei functions") Cc: <stable@vger.kernel.org> # 4.19+ Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 6dbf1a15dcd2f0097d819daa4ee1926b2345d02f Author: Gao Xiang <gaoxiang25@huawei.com> Date: Mon Mar 11 14:08:57 2019 +0800 staging: erofs: fix race of initializing xattrs of a inode at the same time commit 62dc45979f3f8cb0ea67302a93bff686f0c46c5a upstream. In real scenario, there could be several threads accessing xattrs of the same xattr-uninitialized inode, and init_inode_xattrs() almost at the same time. That's actually an unexpected behavior, this patch closes the race. Fixes: b17500a0fdba ("staging: erofs: introduce xattr & acl support") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 044ba07158562ecf1b2e9079fa97c9980b523eb0 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Mon Mar 11 14:08:56 2019 +0800 staging: erofs: fix memleak of inode's shared xattr array From: Sheng Yong <shengyong1@huawei.com> commit 3b1b5291f79d040d549d7c746669fc30e8045b9b upstream. If it fails to read a shared xattr page, the inode's shared xattr array is not freed. The next time the inode's xattr is accessed, the previously allocated array is leaked. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Fixes: b17500a0fdba ("staging: erofs: introduce xattr & acl support") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 240517d98c12632095f2848bd94c30debdcaf600 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Mon Mar 11 14:08:55 2019 +0800 staging: erofs: fix fast symlink w/o xattr when fs xattr is on commit 7077fffcb0b0b65dc75e341306aeef4d0e7f2ec6 upstream. Currently, this will hit a BUG_ON for these symlinks as follows: - kernel message ------------[ cut here ]------------ kernel BUG at drivers/staging/erofs/xattr.c:59! SMP PTI CPU: 1 PID: 1170 Comm: getllxattr Not tainted 4.20.0-rc6+ #92 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 RIP: 0010:init_inode_xattrs+0x22b/0x270 Code: 48 0f 45 ea f0 ff 4d 34 74 0d 41 83 4c 24 e0 01 31 c0 e9 00 fe ff ff 48 89 ef e8 e0 31 9e ff eb e9 89 e8 e9 ef fd ff ff 0f 0$ <0f> 0b 48 89 ef e8 fb f6 9c ff 48 8b 45 08 a8 01 75 24 f0 ff 4d 34 RSP: 0018:ffffa03ac026bdf8 EFLAGS: 00010246 ------------[ cut here ]------------ ... Call Trace: erofs_listxattr+0x30/0x2c0 ? selinux_inode_listxattr+0x5a/0x80 ? kmem_cache_alloc+0x33/0x170 ? security_inode_listxattr+0x27/0x40 listxattr+0xaf/0xc0 path_listxattr+0x5a/0xa0 do_syscall_64+0x43/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ... ---[ end trace 3c24b49408dc0c72 ]--- Fix it by checking ->xattr_isize in init_inode_xattrs(), and it also fixes improper return value -ENOTSUPP (it should be -ENODATA if xattr is enabled) for those inodes. Fixes: b17500a0fdba ("staging: erofs: introduce xattr & acl support") Cc: <stable@vger.kernel.org> # 4.19+ Reported-by: Li Guifu <bluce.liguifu@huawei.com> Tested-by: Li Guifu <bluce.liguifu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 78544513d768a1559d7e61d5e29270844db027d2 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Mon Mar 11 14:08:54 2019 +0800 staging: erofs: add error handling for xattr submodule commit cadf1ccf1b0021d0b7a9347e102ac5258f9f98c8 upstream. This patch enhances the missing error handling code for xattr submodule, which improves the stability for the rare cases. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit f1f405af62a3f3b37bf965ddbc3ef5aa2fab2f57 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Wed Feb 27 13:33:30 2019 +0800 staging: erofs: compressed_pages should not be accessed again after freed commit af692e117cb8cd9d3d844d413095775abc1217f9 upstream. This patch resolves the following page use-after-free issue, z_erofs_vle_unzip: ... for (i = 0; i < nr_pages; ++i) { ... z_erofs_onlinepage_endio(page); (1) } for (i = 0; i < clusterpages; ++i) { page = compressed_pages[i]; if (page->mapping == mngda) (2) continue; /* recycle all individual staging pages */ (void)z_erofs_gather_if_stagingpage(page_pool, page); (3) WRITE_ONCE(compressed_pages[i], NULL); } ... After (1) is executed, page is freed and could be then reused, if compressed_pages is scanned after that, it could fall info (2) or (3) by mistake and that could finally be in a mess. This patch aims to solve the above issue only with little changes as much as possible in order to make the fix backport easier. Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit b3a98208a957c0e05850b82ebf7f474ab295ff00 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Wed Feb 27 13:33:31 2019 +0800 staging: erofs: fix illegal address access under memory pressure commit 1e5ceeab6929585512c63d05911d6657064abf7b upstream. Considering a read request with two decompressed file pages, If a decompression work cannot be started on the previous page due to memory pressure but in-memory LTP map lookup is done, builder->work should be still NULL. Moreover, if the current page also belongs to the same map, it won't try to start the decompression work again and then run into trouble. This patch aims to solve the above issue only with little changes as much as possible in order to make the fix backport easier. kernel message is: <4>[1051408.015930s]SLUB: Unable to allocate memory on node -1, gfp=0x2408040(GFP_NOFS|__GFP_ZERO) <4>[1051408.015930s] cache: erofs_compress, object size: 144, buffer size: 144, default order: 0, min order: 0 <4>[1051408.015930s] node 0: slabs: 98, objs: 2744, free: 0 * Cannot allocate the decompression work <3>[1051408.015960s]erofs: z_erofs_vle_normalaccess_readpages, readahead error at page 1008 of nid 5391488 * Note that the previous page was failed to read <0>[1051408.015960s]Internal error: Accessing user space memory outside uaccess.h routines: 96000005 [#1] PREEMPT SMP ... <4>[1051408.015991s]Hardware name: kirin710 (DT) ... <4>[1051408.016021s]PC is at z_erofs_vle_work_add_page+0xa0/0x17c <4>[1051408.016021s]LR is at z_erofs_do_read_page+0x12c/0xcf0 ... <4>[1051408.018096s][<ffffff80c6fb0fd4>] z_erofs_vle_work_add_page+0xa0/0x17c <4>[1051408.018096s][<ffffff80c6fb3814>] z_erofs_vle_normalaccess_readpages+0x1a0/0x37c <4>[1051408.018096s][<ffffff80c6d670b8>] read_pages+0x70/0x190 <4>[1051408.018127s][<ffffff80c6d6736c>] __do_page_cache_readahead+0x194/0x1a8 <4>[1051408.018127s][<ffffff80c6d59318>] filemap_fault+0x398/0x684 <4>[1051408.018127s][<ffffff80c6d8a9e0>] __do_fault+0x8c/0x138 <4>[1051408.018127s][<ffffff80c6d8f90c>] handle_pte_fault+0x730/0xb7c <4>[1051408.018127s][<ffffff80c6d8fe04>] __handle_mm_fault+0xac/0xf4 <4>[1051408.018157s][<ffffff80c6d8fec8>] handle_mm_fault+0x7c/0x118 <4>[1051408.018157s][<ffffff80c8c52998>] do_page_fault+0x354/0x474 <4>[1051408.018157s][<ffffff80c8c52af8>] do_translation_fault+0x40/0x48 <4>[1051408.018157s][<ffffff80c6c002f4>] do_mem_abort+0x80/0x100 <4>[1051408.018310s]---[ end trace 9f4009a3283bd78b ]--- Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 14b20a49fc73c4818efa3327451904ef6f9c07ab Author: Gao Xiang <gaoxiang25@huawei.com> Date: Wed Feb 27 13:33:32 2019 +0800 staging: erofs: fix mis-acted TAIL merging behavior commit a112152f6f3a2a88caa6f414d540bd49e406af60 upstream. EROFS has an optimized path called TAIL merging, which is designed to merge multiple reads and the corresponding decompressions into one if these requests read continuous pages almost at the same time. In general, it behaves as follows: ________________________________________________________________ ... | TAIL . HEAD | PAGE | PAGE | TAIL . HEAD | ... _____|_combined page A_|________|________|_combined page B_|____ 1 ] -> [ 2 ] -> [ 3 If the above three reads are requested in the order 1-2-3, it will generate a large work chain rather than 3 individual work chains to reduce scheduling overhead and boost up sequential read. However, if Read 2 is processed slightly earlier than Read 1, currently it still generates 2 individual work chains (chain 1, 2) but it does in-place decompression for combined page A, moreover, if chain 2 decompresses ahead of chain 1, it will be a race and lead to corrupted decompressed page. This patch fixes it. Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 1931a6c5fe28edd9c62d54dd67806c3806e9cdb7 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Tue Dec 11 15:17:50 2018 +0800 staging: erofs: unzip_vle_lz4.c,utils.c: rectify BUG_ONs commit b8e076a6ef253e763bfdb81e5c72bcc828b0fbeb upstream. remove all redundant BUG_ONs, and turn the rest useful usages to DBG_BUGONs. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 0773d1966061cba2de6b226947470baf88feda72 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Tue Dec 11 15:17:49 2018 +0800 staging: erofs: unzip_{pagevec.h,vle.c}: rectify BUG_ONs commit 70b17991d89554cdd16f3e4fb0179bcc03c808d9 upstream. remove all redundant BUG_ONs, and turn the rest useful usages to DBG_BUGONs. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 6a00c9d7066562e418e30b1b211c77aed5c40551 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Wed Dec 5 21:23:13 2018 +0800 staging: erofs: {dir,inode,super}.c: rectify BUG_ONs commit 8b987bca2d09649683cbe496419a011df8c08493 upstream. remove all redundant BUG_ONs, and turn the rest useful usages to DBG_BUGONs. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit ef609890e1f8f27546f25d058bcaeb3c5a7a982f Author: Gao Xiang <gaoxiang25@huawei.com> Date: Fri Nov 23 01:16:03 2018 +0800 staging: erofs: add a full barrier in erofs_workgroup_unfreeze commit 948bbdb1818b7ad6e539dad4fbd2dd4650793ea9 upstream. Just like other generic locks, insert a full barrier in case of memory reorder. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit e88d7d9adb52d0f9ba8028c6b4a13e7e83d743a5 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Fri Nov 23 01:16:02 2018 +0800 staging: erofs: fix `erofs_workgroup_{try_to_freeze, unfreeze}' commit 73f5c66df3e26ab750cefcb9a3e08c71c9f79cad upstream. There are two minor issues in the current freeze interface: 1) Freeze interfaces have not related with CONFIG_DEBUG_SPINLOCK, therefore fix the incorrect conditions; 2) For SMP platforms, it should also disable preemption before doing atomic_cmpxchg in case that some high priority tasks preempt between atomic_cmpxchg and disable_preempt, then spin on the locked refcount later. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 26b9413853f64a44d858c24bc2b4c834a2e6a1fc Author: Gao Xiang <gaoxiang25@huawei.com> Date: Fri Nov 23 01:16:01 2018 +0800 staging: erofs: atomic_cond_read_relaxed on ref-locked workgroup commit df134b8d17b90c1e7720e318d36416b57424ff7a upstream. It's better to use atomic_cond_read_relaxed, which is implemented in hardware instructions to monitor a variable changes currently for ARM64, instead of open-coded busy waiting. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 28e3fa73e294002f8e7c48b6e9ea92784bf9e21a Author: Gao Xiang <gaoxiang25@huawei.com> Date: Sat Nov 3 17:23:56 2018 +0800 staging: erofs: remove the redundant d_rehash() for the root dentry commit e9c892465583c8f42d61fafe30970d36580925df upstream. There is actually no need at all to d_rehash() for the root dentry as Al pointed out, fix it. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit e3e7bbe526acfac4307a2a6d7e2aaf5222ea88de Author: Gao Xiang <gaoxiang25@huawei.com> Date: Wed Sep 19 13:49:07 2018 +0800 staging: erofs: drop multiref support temporarily commit e5e3abbadf0dbd1068f64f8abe70401c5a178180 upstream. Multiref support means that a compressed page could have more than one reference, which is designed for on-disk data deduplication. However, mkfs doesn't support this mode at this moment, and the kernel implementation is also broken. Let's drop multiref support. If it is fully implemented in the future, it can be reverted later. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 2dd8bd1abced431fa4be477299fa9ddce4677642 Author: Chen Gong <gongchen4@huawei.com> Date: Tue Sep 18 22:27:28 2018 +0800 staging: erofs: replace BUG_ON with DBG_BUGON in data.c commit 9141b60cf6a53c99f8a9309bf8e1c6650a6785c1 upstream. This patch replace BUG_ON with DBG_BUGON in data.c, and add necessary error handler. Signed-off-by: Chen Gong <gongchen4@huawei.com> Reviewed-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit a14a5cf712938fadd39fb99a8f8a46d72b19cd4d Author: Gao Xiang <gaoxiang25@huawei.com> Date: Tue Sep 18 22:27:25 2018 +0800 staging: erofs: complete error handing of z_erofs_do_read_page commit 1e05ff36e6921ca61bdbf779f81a602863569ee3 upstream. This patch completes error handing code of z_erofs_do_read_page. PG_error will be set when some read error happens, therefore z_erofs_onlinepage_endio will unlock this page without setting PG_uptodate. Reviewed-by: Chao Yu <yucxhao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 381d39d1c2d471e4c318320bae60806c5d0b04bd Author: Gao Xiang <gaoxiang25@huawei.com> Date: Tue Sep 18 22:25:36 2018 +0800 staging: erofs: fix a bug when appling cache strategy commit 0734ffbf574ee813b20899caef2fe0ed502bb783 upstream. As described in Kconfig, the last compressed pack should be cached for further reading for either `EROFS_FS_ZIP_CACHE_UNIPOLAR' or `EROFS_FS_ZIP_CACHE_BIPOLAR' by design. However, there is a bug in z_erofs_do_read_page, it will switch `initial' to `false' at the very beginning before it decides to cache the last compressed pack. caching strategy should work properly after appling this patch. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 3dc0616d60bcc3888f5dcf4585bcc5e2131a64df Author: Gao Xiang <gaoxiang25@huawei.com> Date: Fri Nov 23 01:15:59 2018 +0800 staging: erofs: fix the definition of DBG_BUGON [ Upstream commit eef168789866514e5d4316f030131c9fe65b643f ] It's better not to positively BUG_ON the kernel, however developers need a way to locate issues as soon as possible. DBG_BUGON is introduced and it could only crash when EROFS_FS_DEBUG (EROFS developping feature) is on. It is helpful for developers to find and solve bugs quickly by eng builds. Previously, DBG_BUGON is defined as ((void)0) if EROFS_FS_DEBUG is off, but some unused variable warnings as follows could occur: drivers/staging/erofs/unzip_vle.c: In function `init_alway:': drivers/staging/erofs/unzip_vle.c:61:33: warning: unused variable `work' [-Wunused-variable] struct z_erofs_vle_work *const work = ^~~~ Fix it to #define DBG_BUGON(x) ((void)(x)). Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> commit 92c97ef11b111b764dc92c5edaf9385f74c72e7d Author: Gao Xiang <gaoxiang25@huawei.com> Date: Sat Dec 8 00:19:12 2018 +0800 staging: erofs: fix use-after-free of on-stack `z_erofs_vle_unzip_io' [ Upstream commit 848bd9acdcd00c164b42b14aacec242949ecd471 ] The root cause is the race as follows: Thread #0 Thread #1 z_erofs_vle_unzip_kickoff z_erofs_submit_and_unzip struct z_erofs_vle_unzip_io io[] atomic_add_return() wait_event() [end of function] wake_up() Fix it by taking the waitqueue lock between atomic_add_return and wake_up to close such the race. kernel message: Unable to handle kernel paging request at virtual address 97f7052caa1303dc ... Workqueue: kverityd verity_work task: ffffffe32bcb8000 task.stack: ffffffe3298a0000 PC is at __wake_up_common+0x48/0xa8 LR is at __wake_up+0x3c/0x58 ... Call trace: ... [<ffffff94a08ff648>] __wake_up_common+0x48/0xa8 [<ffffff94a08ff8b8>] __wake_up+0x3c/0x58 [<ffffff94a0c11b60>] z_erofs_vle_unzip_kickoff+0x40/0x64 [<ffffff94a0c118e4>] z_erofs_vle_read_endio+0x94/0x134 [<ffffff94a0c83c9c>] bio_endio+0xe4/0xf8 [<ffffff94a1076540>] dec_pending+0x134/0x32c [<ffffff94a1076f28>] clone_endio+0x90/0xf4 [<ffffff94a0c83c9c>] bio_endio+0xe4/0xf8 [<ffffff94a1095024>] verity_work+0x210/0x368 [<ffffff94a08c4150>] process_one_work+0x188/0x4b4 [<ffffff94a08c45bc>] worker_thread+0x140/0x458 [<ffffff94a08cad48>] kthread+0xec/0x108 [<ffffff94a0883ab4>] ret_from_fork+0x10/0x1c Code: d1006273 54000260 f9400804 b9400019 (b85fc081) ---[ end trace be9dde154f677cd1 ]--- Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> commit 323056dc5fbe4768311194a3a2adf14806f25074 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Tue Sep 18 22:25:33 2018 +0800 staging: erofs: fix a missing endian conversion [ Upstream commit 37ec35a6cc2b99eb7fd6b85b7d7b75dff46bc353 ] This patch fixes a missing endian conversion in vle_get_logical_extent_head. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 69f2b4eaba237770f5c696942595d064ae3340f8 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Sep 6 17:01:47 2018 +0800 staging: erofs: rename superblock flags (MS_xyz -> SB_xyz) This patch follows commit 1751e8a6cb93 ("Rename superblock flags (MS_xyz -> SB_xyz)") and after commit ("vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled"), there is no MS_RDONLY and MS_NOATIME at all. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 1621b077d53285bd5127532ce160cec69adbe660 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Tue Aug 28 11:39:48 2018 +0800 Revert "staging: erofs: disable compiling temporarile" This reverts commit 156c3df8d4db4e693c062978186f44079413d74d. Since XArray and the new mount apis aren't merged in 4.19-rc1 merge window, the BROKEN mark can be reverted directly without any problems. Fixes: 156c3df8d4db ("staging: erofs: disable compiling temporarile") Cc: Matthew Wilcox <willy@infradead.org> Cc: David Howells <dhowells@redhat.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 3bbdccddb4ee53c0b81545226439f231ae698f65 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Mon Aug 6 11:27:53 2018 +0800 staging: erofs: remove an extra semicolon in z_erofs_vle_unzip_all There is an extra semicolon in z_erofs_vle_unzip_all, remove it. Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit ee25ad8cd5b803ae4cda0116304ff15383cb6881 Author: Kristaps Čivkulis <kristaps.civkulis@gmail.com> Date: Sun Aug 5 18:21:01 2018 +0300 staging: erofs: fix if assignment style issue Fix coding style issue "do not use assignment in if condition" detected by checkpatch.pl. Signed-off-by: Kristaps Čivkulis <kristaps.civkulis@gmail.com> Reviewed-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 81d71d6e9a330f4471f3edec412d6124031eac46 Author: Chao Yu <yuchao0@huawei.com> Date: Thu Aug 2 17:39:17 2018 +0800 staging: erofs: disable compiling temporarile As Stephen Rothwell reported: "After merging the staging tree, today's linux-next build (x86_64 allmodconfig) failed like this: drivers/staging/erofs/super.c: In function 'erofs_read_super': drivers/staging/erofs/super.c:343:17: error: 'MS_RDONLY' undeclared (first use in this function); did you mean 'IS_RDONLY'? sb->s_flags |= MS_RDONLY | MS_NOATIME; ^~~~~~~~~ IS_RDONLY drivers/staging/erofs/super.c:343:17: note: each undeclared identifier is reported only once for each function it appears in drivers/staging/erofs/super.c:343:29: error: 'MS_NOATIME' undeclared (first use in this function); did you mean 'S_NOATIME'? sb->s_flags |= MS_RDONLY | MS_NOATIME; ^~~~~~~~~~ S_NOATIME drivers/staging/erofs/super.c: In function 'erofs_mount': drivers/staging/erofs/super.c:501:10: warning: passing argument 5 of 'mount_bdev' makes integer from pointer without a cast [-Wint-conversion] &priv, erofs_fill_super); ^~~~~~~~~~~~~~~~ In file included from include/linux/buffer_head.h:12:0, from drivers/staging/erofs/super.c:14: include/linux/fs.h:2151:23: note: expected 'size_t {aka long unsigned int}' but argument is of type 'int (*)(struct super_block *, void *, int)' extern struct dentry *mount_bdev(struct file_system_type *fs_type, ^~~~~~~~~~ drivers/staging/erofs/super.c:500:9: error: too few arguments to function 'mount_bdev' return mount_bdev(fs_type, flags, dev_name, ^~~~~~~~~~ In file included from include/linux/buffer_head.h:12:0, from drivers/staging/erofs/super.c:14: include/linux/fs.h:2151:23: note: declared here extern struct dentry *mount_bdev(struct file_system_type *fs_type, ^~~~~~~~~~ drivers/staging/erofs/super.c: At top level: drivers/staging/erofs/super.c:518:20: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types] .mount = erofs_mount, ^~~~~~~~~~~ drivers/staging/erofs/super.c:518:20: note: (near initialization for 'erofs_fs_type.mount') drivers/staging/erofs/super.c: In function 'erofs_remount': drivers/staging/erofs/super.c:630:12: error: 'MS_RDONLY' undeclared (first use in this function); did you mean 'IS_RDONLY'? *flags |= MS_RDONLY; ^~~~~~~~~ IS_RDONLY drivers/staging/erofs/super.c: At top level: drivers/staging/erofs/super.c:640:16: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types] .remount_fs = erofs_remount, ^~~~~~~~~~~~~ Caused by various commits creating erofs in the staging tree interacting with various commits redoing the mount infrastructure in the vfs tree. I have disabed CONFIG_EROFS_FS for now:" The reason of compiling error is: Since -next collects and merges developing patches including common vfs stuff from multi-trees, but those patches didn't cover erofs, such as: ('vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled") https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/commit/?h=for-next&id=109b45090d7d3ce2797bb1ef7f70eead5bfe0ff3 ("vfs: Require specification of size of mount data for internal mounts") https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/commit/?h=for-next&id=0a191e4505a4f255e6513b49426213da69bf0e80 Above vfs related patches has not been merged in staging tree, if we submit those erofs patches to staging mailing list and after including them in staging-{test,nexts} tree, it can easily cause compiling error. We worked out some patches to adjust those vfs change, but now we just submit them to -next tree temporarily to avoid compiling error. For potentail conflict in between erofs and vfs changes in incoming merge window, Stephen suggested that we can disable CONFIG_EROFS_FS temporarily to pass merge window, and after that we can do restore by reenabling CONFIG_EROFS_FS and applying those fixing patches. Also Greg confirmed this solution. So, let's disable compiling erofs for a while. Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 2d4499c8b8b78dd00788a7513373d0900013e850 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Wed Aug 1 14:38:31 2018 +0800 staging: erofs: remove a redundant marco in xattr There is no need to '#if CONFIG_EROFS_FS_XATTR' in xattr.c, let's remove it. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 8eaefd9be86fa3d85305f05192568ba1507dab75 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Wed Aug 1 17:36:54 2018 +0800 staging: erofs: add the missing break in z_erofs_map_blocks_iter This patch adds a missing break after adding the default case. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit b79d82f61f25e532b98f7d3a3d49b250f1728e0d Author: Gao Xiang <gaoxiang25@huawei.com> Date: Mon Jul 30 09:51:01 2018 +0800 staging: erofs: use the wrapped PTR_ERR_OR_ZERO instead of open code Just clean up and logic doesn't change. Link: https://lists.01.org/pipermail/kbuild-all/2018-July/050766.html Fixes: d72d1ce60174 ("staging: erofs: add namei functions") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit ef38dd74d8389a7474b0f947185f347e24419686 Author: Gao Xiang <hsiangkao@aol.com> Date: Sun Jul 29 13:37:57 2018 +0800 staging: erofs: fix conditional uninitialized `pcn' in z_erofs_map_blocks_iter This patch adds error handling code for z_erofs_map_blocks_iter to fix the compiler blame. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 294e8e93272dcbdabdd6e033e4d14bbfe6d91bb7 Author: Gao Xiang <hsiangkao@aol.com> Date: Sun Jul 29 13:34:58 2018 +0800 staging: erofs: fix compile error without built-in decompression support This patch fixes incorrect code snippets due to spilt code into small patches by mistake. Link: https://lists.01.org/pipermail/kbuild-all/2018-July/050747.html Link: https://lists.01.org/pipermail/kbuild-all/2018-July/050750.html Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit db6fedf04cecf5fa3d78a941fd068d581813dfa0 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Sat Jul 28 15:10:32 2018 +0800 staging: erofs: fix a compile warning of Z_EROFS_VLE_VMAP_ONSTACK_PAGES There is a type mismatch in the definition of Z_EROFS_VLE_VMAP_ONSTACK_PAGES, let's fix it. Link: https://lists.01.org/pipermail/kbuild-all/2018-July/050707.html Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 73c620c52e51ff2bf93cf02509cdbd9da3d50220 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:22:08 2018 +0800 staging: erofs: add a TODO and update MAINTAINERS for staging This patch adds a TODO to list the things to be done, and the relevant info to MAINTAINERS so we can take all the blame :) Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit fd66e0b7e7510165f9c2214a0e68d7025f8b8d83 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:22:07 2018 +0800 staging: erofs: introduce cached decompression This patch adds an optional choice which can be enabled by users in order to cache both incomplete ends of compressed clusters as a complement to the in-place decompression in order to boost random read, but it costs more memory than the in-place decompression only. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit e84127077ff509f7204888244cf848bf9cddd794 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:22:06 2018 +0800 staging: erofs: introduce VLE decompression support This patch introduces the basic in-place VLE decompression implementation for the erofs file system. Compared with fixed-sized input compression, it implements what we call 'the variable-length extent compression' which specifies the same output size for each compression block to make the full use of IO bandwidth (which means almost all data from block device can be directly used for decomp- ression), improve the real (rather than just via data caching, which costs more memory) random read and keep the relatively lower compression ratios (it saves more storage space than fixed-sized input compression which is also configured with the same input block size), as illustrated below: |--- variable-length extent ---|------ VLE ------|--- VLE ---| /> clusterofs /> clusterofs /> clusterofs /> clusterofs ++---|-------++-----------++---------|-++-----------++-|---------++-| ...|| | || || | || || | || | ... original data ++---|-------++-----------++---------|-++-----------++-|---------++-| ++->cluster<-++->cluster<-++->cluster<-++->cluster<-++->cluster<-++ size size size size size \ / / / \ / / / \ / / / ++-----------++-----------++-----------++ ... || || || || ... compressed clusters ++-----------++-----------++-----------++ ++->cluster<-++->cluster<-++->cluster<-++ size size size The main point of 'in-place' refers to the decompression mode: Instead of allocating independent compressed pages and data structures, it reuses the allocated file cache pages at most to store its compressed data and the corresponding pagevec in a time-sharing approach by default, which will be useful for low memory scenario. In the end, unlike the other filesystems with (de)compression support using a relatively large compression block size, which reads and decompresses >= 128KB at once, and gains a more good-looking random read (In fact it collects small random reads into large sequential reads and caches all decompressed data in memory, but it is unacceptable especially for embedded devices with limited memory, and it is not the real random read), we select a universal small-sized 4KB compressed cluster, which is the smallest page size for most architectures, and all compressed clusters can be read and decompressed independently, which ensures random read number for all use cases. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit ab43173ff3316c0120f9b2c3abc325a18773f30f Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:22:05 2018 +0800 staging: erofs: introduce workstation for decompression This patch introduces another concept used by the unzip subsystem called 'workstation'. It can be seen as a sparse array that stores pointers pointed to data structures related to the corresponding physical blocks. All lookup cases are protected by RCU read lock. Besides, reference count and spin_lock are also introduced to manage its lifetime and serialize all update operations. 'workstation' is currently implemented on the in-kernel radix tree approach for backward compatibility. With the evolution of linux kernel, it could be migrated into XArray implementation in the future. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 84c882ba349e57fa654b0a52d6529bff5c18c0e0 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:22:04 2018 +0800 staging: erofs: introduce erofs shrinker This patch adds a dedicated shrinker targeting to free unneeded memory consumed by a number of erofs in-memory data structures. Like F2FS and UBIFS, it also adds: - sbi->umount_mutex to avoid races on shrinker and put_super - sbi->shrinker_run_no to not revisit recently scaned objects Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit dc98494e64df2c56c3d6658f60a86b257e9735a3 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:22:03 2018 +0800 staging: erofs: introduce superblock registration In order to introducing shrinker solution for erofs, let's manage all mounted erofs instances at first. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 8ded5dd185d595bc3664cabc5de54b84021d3314 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:22:02 2018 +0800 staging: erofs: add a generic z_erofs VLE decompressor Currently, this patch only simply implements LZ4 decompressor due to its development priority. In the future, erofs will support more compression algorithm and format other than LZ4, thus a generic decompressor interface will be needed. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 6406d5e0a4a3a6e88c6898268c36e421d1c5006b Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:22:01 2018 +0800 staging: erofs: introduce a customized LZ4 decompression We have to reduce the memory cost as much as possible, so we don't want to decompress more data beyond the output buffer size, however "LZ4_decompress_safe_partial" doesn't guarantee to stop at the arbitary end position, but stop just after its current LZ4 "sequence" is completed. Link: https://groups.google.com/forum/#!topic/lz4c/_3kkz5N6n00 Therefore, I hacked the LZ4 decompression logic by hand, probably NOT the fastest approach, and hope for better implementation. Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit c21aeb7e5feca41005feac999d4cf446dc65a701 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:22:00 2018 +0800 staging: erofs: globalize prepare_bio and __submit_bio The unzip subsystem also uses these functions, let's export them to internal.h. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit a5908581d539ef37d1d390e3ad647216440c0ace Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:59 2018 +0800 staging: erofs: add erofs_allocpage This patch introduces an temporary _on-stack_ page pool to reuse the freed page directly as much as it can for better performance and release all pages at a time, it also slightly reduces the possibility of the potential memory allocation failure. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit bbd3e12ab2521a7c982ac0707bf8da7a0d22653b Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:58 2018 +0800 staging: erofs: add erofs_map_blocks_iter This patch introduces an iterable L2P mapping operation 'erofs_map_blocks_iter'. Compared with 'erofs_map_blocks', it avoids a number of redundant 'release and regrab' processes if they request the same meta page. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 70622cae335b9140e4358d7084c10cfb3da3301c Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:57 2018 +0800 staging: erofs: introduce pagevec for unzip subsystem For each compressed cluster, there is a straight-forward way of allocating a fixed or variable-sized (for VLE) array to record the corresponding file pages for its decompression if we decide to decompress these pages asynchronously (eg. read-ahead case), however it could take much extra on-heap memory compared with traditional uncompressed filesystems. This patch introduces a pagevec solution to reuse some allocated file page in the time-sharing approach storing parts of the array itself in order to minimize the extra memory overhead, thus only a constant and small-sized array used for booting the whole array itself up will be needed. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit ff29dac3b4b402729a0b75b8724793158701b1f5 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:56 2018 +0800 staging: erofs: <linux/tagptr.h>: introduce tagged pointer Currently kernel has scattered tagged pointer usages hacked by hand in plain code, without a unique and portable functionset to highlight the tagged pointer itself and wrap these hacked code in order to clean up all over meaningless magic masks. Therefore, this patch introduces simple generic methods to fold tags into a pointer integer. It currently supports the last n bits of the pointer for tags, which can be selected by users. In addition, it will also be used for the upcoming EROFS filesystem, which heavily uses tagged pointer approach for high performance and reducing extra memory allocation. Link: https://en.wikipedia.org/wiki/Tagged_pointer Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 153f5ad87b67c45b453790ce206ced7c6cc62609 Author: Chao Yu <yuchao0@huawei.com> Date: Thu Jul 26 20:21:55 2018 +0800 staging: erofs: support tracepoint Add basic tracepoints for ->readpage{,s}, ->lookup, ->destroy_inode, fill_inode and map_blocks. Reviewed-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 98dd1e3a3f42df26003ae86fd1767b03bef433a6 Author: Chao Yu <yuchao0@huawei.com> Date: Thu Jul 26 20:21:54 2018 +0800 staging: erofs: introduce error injection infrastructure This patch introduces error injection infrastructure, with it, we can inject error in any kernel exported common functions which erofs used, so that it can force erofs running into error paths, it turns out that tests can cover real rare paths more easily to find bugs. Reviewed-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 220c7448cdc4c38e5177777de23793e653969904 Author: Chao Yu <yuchao0@huawei.com> Date: Thu Jul 26 20:21:53 2018 +0800 staging: erofs: support special inode This patch adds to support special inode, such as block dev, char, socket, pipe inode. Reviewed-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 7ed68385c49ac127e5baa07220d37cbf937e89d9 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:52 2018 +0800 staging: erofs: introduce xattr & acl support This implements xattr and acl functionalities. Inline and shared xattrs are introduced for flexibility. Specifically, if the same xattr occurs for many times in a large number of inodes or the value of a xattr is so large that it isn't suitable to be inlined, a shared xattr kept in the xattr meta will be used instead. Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit db9bea5cf638b0683376b4118754dad0d444dd7c Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:51 2018 +0800 staging: erofs: update Kconfig and Makefile This commit adds Makefile and Kconfig for erofs, and updates Makefile and Kconfig files in the fs directory. Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit afad040452afed7d552fb853c8613c6002e17ccb Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:50 2018 +0800 staging: erofs: add namei functions This commit adds functions that transfer names to inodes. Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 4e7097e1a4a0170e8e51866a1242cc9556dcca5d Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:49 2018 +0800 staging: erofs: add directory operations This adds functions for directory, mainly readdir. Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 421bfd9b50b8051aa451be073f6387bc678cccab Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:48 2018 +0800 staging: erofs: add inode operations This adds core functions to get, read an inode. Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 944a5ab5bd4fc480e4099c8e5e97a6dca490a239 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:47 2018 +0800 staging: erofs: add raw address_space operations This commit adds functions for meta and raw data, and also provides address_space_operations for raw data access. Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit 8305bea76c9178ce211e5759061d10effbba958e Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:46 2018 +0800 staging: erofs: add super block operations This commit adds erofs super block operations, including (u)mount, remount_fs, show_options, statfs, in addition to some private icache management functions. Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit ae2a66470bd70480e4953ff12bc96902e0b59617 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:45 2018 +0800 staging: erofs: add erofs in-memory stuffs - erofs_sb_info: contains erofs-specific in-memory information. - erofs_vnode: contains vfs_inode and other fs-specific information. same as super block, the only one in-memory definition exists. - erofs_map_blocks plays a role in the file L2P mapping Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> commit dd48cb6b27dc6c5977e59e05c59209ddb68c2f51 Author: Gao Xiang <gaoxiang25@huawei.com> Date: Thu Jul 26 20:21:44 2018 +0800 staging: erofs: add on-disk layout This commit adds the on-disk layout header file of erofs. Note that the on-disk layout is still WIP, and some fields are reserved for the future use by design. Any comments are welcome. Thanks-to: Li Guifu <liguifu2@huawei.com> Thanks-to: Sun Qiuyang <sunqiuyang@huawei.com> Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
133 lines
3.7 KiB
C
133 lines
3.7 KiB
C
/* SPDX-License-Identifier: GPL-2.0 */
|
|
#ifndef _LINUX_DAX_H
|
|
#define _LINUX_DAX_H
|
|
|
|
#include <linux/fs.h>
|
|
#include <linux/mm.h>
|
|
#include <linux/radix-tree.h>
|
|
#include <asm/pgtable.h>
|
|
|
|
struct iomap_ops;
|
|
struct dax_device;
|
|
struct dax_operations {
|
|
/*
|
|
* direct_access: translate a device-relative
|
|
* logical-page-offset into an absolute physical pfn. Return the
|
|
* number of pages available for DAX at that pfn.
|
|
*/
|
|
long (*direct_access)(struct dax_device *, pgoff_t, long,
|
|
void **, pfn_t *);
|
|
/* copy_from_iter: required operation for fs-dax direct-i/o */
|
|
size_t (*copy_from_iter)(struct dax_device *, pgoff_t, void *, size_t,
|
|
struct iov_iter *);
|
|
};
|
|
|
|
extern struct attribute_group dax_attribute_group;
|
|
|
|
#if IS_ENABLED(CONFIG_DAX)
|
|
struct dax_device *dax_get_by_host(const char *host);
|
|
void put_dax(struct dax_device *dax_dev);
|
|
#else
|
|
static inline struct dax_device *dax_get_by_host(const char *host)
|
|
{
|
|
return NULL;
|
|
}
|
|
|
|
static inline void put_dax(struct dax_device *dax_dev)
|
|
{
|
|
}
|
|
#endif
|
|
|
|
struct writeback_control;
|
|
int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff);
|
|
#if IS_ENABLED(CONFIG_FS_DAX)
|
|
bool __bdev_dax_supported(struct block_device *bdev, int blocksize);
|
|
static inline bool bdev_dax_supported(struct block_device *bdev, int blocksize)
|
|
{
|
|
return __bdev_dax_supported(bdev, blocksize);
|
|
}
|
|
|
|
static inline struct dax_device *fs_dax_get_by_host(const char *host)
|
|
{
|
|
return dax_get_by_host(host);
|
|
}
|
|
|
|
static inline void fs_put_dax(struct dax_device *dax_dev)
|
|
{
|
|
put_dax(dax_dev);
|
|
}
|
|
|
|
struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev);
|
|
int dax_writeback_mapping_range(struct address_space *mapping,
|
|
struct block_device *bdev, struct writeback_control *wbc);
|
|
#else
|
|
static inline bool bdev_dax_supported(struct block_device *bdev,
|
|
int blocksize)
|
|
{
|
|
return false;
|
|
}
|
|
|
|
static inline struct dax_device *fs_dax_get_by_host(const char *host)
|
|
{
|
|
return NULL;
|
|
}
|
|
|
|
static inline void fs_put_dax(struct dax_device *dax_dev)
|
|
{
|
|
}
|
|
|
|
static inline struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
|
|
{
|
|
return NULL;
|
|
}
|
|
|
|
static inline int dax_writeback_mapping_range(struct address_space *mapping,
|
|
struct block_device *bdev, struct writeback_control *wbc)
|
|
{
|
|
return -EOPNOTSUPP;
|
|
}
|
|
#endif
|
|
|
|
int dax_read_lock(void);
|
|
void dax_read_unlock(int id);
|
|
struct dax_device *alloc_dax(void *private, const char *host,
|
|
const struct dax_operations *ops);
|
|
bool dax_alive(struct dax_device *dax_dev);
|
|
void kill_dax(struct dax_device *dax_dev);
|
|
void *dax_get_private(struct dax_device *dax_dev);
|
|
long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
|
|
void **kaddr, pfn_t *pfn);
|
|
size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
|
|
size_t bytes, struct iov_iter *i);
|
|
void dax_flush(struct dax_device *dax_dev, void *addr, size_t size);
|
|
void dax_write_cache(struct dax_device *dax_dev, bool wc);
|
|
bool dax_write_cache_enabled(struct dax_device *dax_dev);
|
|
|
|
ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
|
|
const struct iomap_ops *ops);
|
|
int dax_iomap_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
|
|
const struct iomap_ops *ops);
|
|
int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
|
|
int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
|
|
pgoff_t index);
|
|
|
|
#ifdef CONFIG_FS_DAX
|
|
int __dax_zero_page_range(struct block_device *bdev,
|
|
struct dax_device *dax_dev, sector_t sector,
|
|
unsigned int offset, unsigned int length);
|
|
#else
|
|
static inline int __dax_zero_page_range(struct block_device *bdev,
|
|
struct dax_device *dax_dev, sector_t sector,
|
|
unsigned int offset, unsigned int length)
|
|
{
|
|
return -ENXIO;
|
|
}
|
|
#endif
|
|
|
|
static inline bool dax_mapping(struct address_space *mapping)
|
|
{
|
|
return mapping->host && IS_DAX(mapping->host);
|
|
}
|
|
|
|
#endif
|