Squashed commit of the following:
commit 37695a77521cfccbf92840cc13dcc4d8cb7dda96
Author: pwnrazr <1644943+pwnrazr@users.noreply.github.com>
Date: Thu Feb 16 00:00:20 2023 +0800
raphael_defconfig: enable erofs highpri percpu kthread
commit 816e4801de2002f5f53e7cd2f7aea282755d5391
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Mar 6 15:48:21 2023 -0500
fs/(erofs || f2fs): drop WQ_UNBOUND
Due to a latency regression with WQ_UNBOUND on asymmetric arm64 CPUs.
commit d0e5cb53f102962d0d40ff12f548542d71f6340e
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Wed Feb 15 10:44:37 2023 -0500
erofs/zdata: modify set sched to use RR at high prio for lower latency
Fixes: bdd668d3b54202
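The subject above describes switching the decompression threads to round-robin (SCHED_RR) real-time scheduling at a high priority. The patch body is not part of this log, so the following is only a hedged userspace analogue using POSIX calls; the kernel change itself uses in-kernel scheduler APIs, and the helper names here are hypothetical.

```c
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: scheduling parameters for round-robin real-time
 * scheduling at the highest priority SCHED_RR allows.
 */
static struct sched_param rr_high_prio_param(void)
{
	struct sched_param sp = { 0 };

	sp.sched_priority = sched_get_priority_max(SCHED_RR);
	return sp;
}

/*
 * Try to switch the calling thread to SCHED_RR at that priority.
 * Returns 0 on success or an errno value (commonly EPERM when the
 * process lacks CAP_SYS_NICE); on failure the policy is unchanged.
 */
static int make_current_thread_rr(void)
{
	struct sched_param sp = rr_high_prio_param();

	return pthread_setschedparam(pthread_self(), SCHED_RR, &sp);
}
```

Round-robin differs from SCHED_FIFO only in that equal-priority threads are time-sliced, which keeps several decompression threads at the same priority from starving each other.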
commit afc1c08015966909a27c9d3d53d8796e80c3e4ef
Author: Sandeep Dhavale <dhavale@google.com>
Date: Wed Feb 8 06:53:49 2023 +0000
[WIP] BACKPORT: FROMLIST: erofs: add per-cpu threads for decompression
Using a per-cpu thread pool, we can reduce the scheduling latency compared
to the workqueue implementation. With this patch, scheduling latency and
its variation are reduced because the per-cpu threads are high-priority
kthread_workers.
The results were evaluated on arm64 Android devices running the 5.10
kernel. The table below shows the resulting improvement in total
scheduling latency for the same app launch benchmark, run for 50
iterations. Scheduling latency is the time between when the task
(workqueue kworker vs. kthread_worker) became eligible to run and when
it actually started running.
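The dispatch idea can be modelled in a few lines: completed I/O hands its decompression work to the worker bound to the current CPU. This is a hedged sketch, not the patch: `pcpu_workers`, `struct pcpu_worker` and `dispatch_decompression()` are hypothetical names, and the fallback to a shared workqueue when a CPU has no worker is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 8 /* hypothetical CPU count for the model */

/* Hypothetical stand-in for a per-CPU high-priority kthread_worker. */
struct pcpu_worker {
	int cpu;
	int queued; /* work items handed to this worker */
};

enum dispatch_path { PATH_PCPU_WORKER, PATH_WORKQUEUE };

/* One slot per CPU; NULL means no worker exists for that CPU. */
static struct pcpu_worker *pcpu_workers[NR_CPUS_MODEL];

/*
 * Queue decompression on the worker of the CPU that completed the I/O,
 * so the work stays CPU-local and runs at the worker's priority.
 * Without a worker, count the item against the shared workqueue instead.
 */
static enum dispatch_path dispatch_decompression(int cpu, int *wq_queued)
{
	struct pcpu_worker *w;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	w = pcpu_workers[cpu];
	if (!w) {
		(*wq_queued)++;
		return PATH_WORKQUEUE;
	}
	w->queued++;
	return PATH_PCPU_WORKER;
}
```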
+-------------------------+-----------+----------------+---------+
|                         | workqueue | kthread_worker |  diff   |
+-------------------------+-----------+----------------+---------+
| Average (us)            |     15253 |           2914 | -80.89% |
| Median (us)             |     14001 |           2912 | -79.20% |
| Minimum (us)            |      3117 |           1027 | -67.05% |
| Maximum (us)            |     30170 |           3805 | -87.39% |
| Standard deviation (us) |      7166 |            359 |         |
+-------------------------+-----------+----------------+---------+
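The diff column is the relative change from the workqueue baseline, (kthread_worker - workqueue) / workqueue * 100. A small helper (hypothetical name) reproduces the four reported figures to within about 0.01 percentage points of rounding:

```c
#include <assert.h>
#include <math.h>

/* Relative change from a baseline, in percent: (after - before) / before * 100. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

For example, pct_change(14001, 2912) gives about -79.20, matching the Median row.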
Background: Boot times and cold app launch benchmarks are very
important to the Android ecosystem, as they directly translate to
responsiveness from the user's point of view. While erofs provides
a lot of important features like space savings, we saw some
performance penalty in cold app launch benchmarks in a few scenarios.
Analysis showed that the significant variance came from the
scheduling cost, while the decompression cost was more or less the same.
With a per-cpu thread pool, the table above shows that scheduling
latency drops by ~80% on average. This problem was discussed
at LPC 2022; the slides and talk are linked at [1].
[1] https://lpc.events/event/16/contributions/1338/
Link: https://lore.kernel.org/lkml/Y+DP6V9fZG7XPPGy@debian/
Change-Id: I454da5bc17f285d99047b93dc1fc70444f287156
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 354d97368e8ffd832a43f6aa0d7c43f52268ca80
Author: pwnrazr <1644943+pwnrazr@users.noreply.github.com>
Date: Sat May 7 13:21:24 2022 +0800
sm8150: dtsi: remove barrier and discard mount options
commit 6c0b4a711ecb5b0e30c6115959b48af641e9b5bf
Author: pwnrazr <1644943+pwnrazr@users.noreply.github.com>
Date: Sat May 7 13:20:47 2022 +0800
Revert "arch: arm64: disable erofs"
This reverts commit fe6fe5ef6107fc245ca50cd38f585e580fe2fc59.
commit 515b1441ad6ac0f9e1c74013cd80e9b30065edc0
Author: kondors1995 <normandija1945@gmail.com>
Date: Wed Feb 8 16:43:29 2023 +0200
Revert "raphael_defconfig: Revert FBEv2 defconfig changes"
This reverts commit 97bb4a1d5d103804c72617481fca9b6cf93660a2.
commit c010e1a5176d73f3829ce49cfdb0fcc0ee5c777c
Author: Yue Hu <huyue2@coolpad.com>
Date: Thu Apr 7 13:05:43 2022 +0800
erofs: do not prompt for risk any more when using big pcluster
The big pcluster feature has been merged for a year and is now mostly
stable.
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220407050505.12683-1-huyue2@coolpad.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
commit b135290ae7af3f5f7b69e24c6ca678c4f6572cf2
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:23:06 2022 -0400
erofs: Squashed revert of some recent backports:
Keep these out of the release branch until
d71eb1da8e8b59a7072c51ce48175e159ecfd79a is fixed and the readmore
decompression strategy has been introduced.
commit b9494371e2493f1a8ccc18b1c80f67867f6f623a
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:22:49 2022 -0400
Revert "erofs: iomap support for non-tailpacking DIO"
This reverts commit 804ddc92b769a9cc9926d0262725e6330d0f0a76.
commit 0649a6ed5e759857aabc334abeddacbe4eac7859
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:22:41 2022 -0400
Revert "erofs: adapt 3f4e33b91a28 to our tree"
This reverts commit 016f1ffa36da74ab67ed99abd474a0b2da5133eb.
commit a3704a5a79990f75c8336c9001939db6e6d21181
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:22:33 2022 -0400
Revert "erofs: add support for the full decompressed length"
This reverts commit a4a195b954114aeb741cf4f8b14256ed92e7c545.
commit 5a506fe78d7624f1a94e60d0e3d7113ae6934ea7
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:22:27 2022 -0400
Revert "erofs: add fiemap support with iomap"
This reverts commit 07577933c3fb397791f113ad36fac7a061385826.
commit dd93cf9efb3d1f9608780c44a50a860eb9921cf4
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:22:16 2022 -0400
Revert "erofs: introduce chunk-based file on-disk format"
This reverts commit 690f4dc6d3b27ed6278b8fbae20273883f616e56.
commit a1846fe6257df43564f42eb153131796f3fd84ed
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:22:08 2022 -0400
Revert "erofs: support reading chunk-based uncompressed files"
This reverts commit 5bd83bfc55b6169af5bbf3c0ba4528577c2fa1ff.
commit 3e1c2530db00b6605d8db09e207cb3633e61cdba
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:22:03 2022 -0400
Revert "erofs: fix double free of 'copied'"
This reverts commit c608a6f861e0d457d6c9a5905e8b3d928e672075.
commit 7a9e0f351f8d41a01a0763316bbd4b6ace94bea0
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:21:52 2022 -0400
Revert "erofs: fix misbehavior of unsupported chunk format check"
This reverts commit 751e7c533e451b3c6a51f7d2a69224cca39e8c20.
commit 37b05816e45d519643dd9d162b827311abf3b034
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:21:44 2022 -0400
Revert "erofs: get compression algorithms directly on mapping"
This reverts commit 98b09cde747826f6fe3aae50eb05659f7f2803f7.
commit de74ca4af181a35ac037a44f07cf6a7e55e0f127
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:21:35 2022 -0400
Revert "erofs: introduce the secondary compression head"
This reverts commit feea4ee667bf5d5fa2c6d0c5f57697476dce7ca7.
commit dda6e8eaddd3203cfafd6c82d2e751f2e6d16766
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:21:29 2022 -0400
Revert "erofs: clean up z_erofs_extent_lookback"
This reverts commit c08dbda40a4f3016ee6c60ae2a19e3ecc518361c.
commit 2e5fd527a76eba733464b0ba71fe92abc839b62b
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon Jun 6 13:21:23 2022 -0400
Revert "erofs: clean up erofs_map_blocks tracepoints"
This reverts commit d71eb1da8e8b59a7072c51ce48175e159ecfd79a.
commit ed6e7f36515d6d80c75c4d0803b636e17f328a6c
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: Thu Dec 9 09:29:18 2021 +0800
erofs: clean up erofs_map_blocks tracepoints
Since the new chunk-based file type has been introduced, there is no
need to keep the flatmode tracepoints.
Rename them to erofs_map_blocks instead.
Link: https://lore.kernel.org/r/20211209012918.30337-1-hsiangkao@linux.alibaba.com
Reviewed-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 525147ad9beef7e521c1667509db763e970c06d3
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: Fri Mar 11 02:27:42 2022 +0800
erofs: clean up z_erofs_extent_lookback
Avoid the unnecessary tail recursion since it can be converted into
a loop directly in order to prevent potential stack overflow.
It's a pretty straightforward conversion.
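The conversion can be illustrated on a simplified model: each NONHEAD lcluster records how far back to step toward its HEAD, and instead of recursing once per hop, the walk loops with constant stack usage. The types and field names below are hypothetical simplifications, not the on-disk encoding, and corruption is modelled as -EIO.

```c
#include <errno.h>

/* Hypothetical simplified lcluster types (not the on-disk encoding). */
enum lc_type { LC_PLAIN, LC_HEAD, LC_NONHEAD };

struct lcluster {
	enum lc_type type;
	unsigned int delta0; /* NONHEAD only: distance back toward the HEAD */
};

/*
 * Starting at lcluster `lcn`, step back `lookback` lclusters at a time
 * until a HEAD or PLAIN lcluster is reached.  A recursive version would
 * call itself once per NONHEAD hop; this loop uses constant stack space.
 * Returns the head's index, or -EIO for a zero or out-of-range distance.
 */
static long extent_lookback(const struct lcluster *lc, unsigned long lcn,
			    unsigned int lookback)
{
	for (;;) {
		if (lookback == 0 || lookback > lcn)
			return -EIO;
		lcn -= lookback;
		if (lc[lcn].type != LC_NONHEAD)
			return (long)lcn;
		lookback = lc[lcn].delta0;
	}
}
```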
Link: https://lore.kernel.org/r/20220310182743.102365-1-hsiangkao@linux.alibaba.com
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit db45bcfb35a2cd8d49e159c0cc70635b713183a4
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: Mon Oct 18 00:57:21 2021 +0800
erofs: introduce the secondary compression head
Previously, each HEAD lcluster could be either a HEAD or a PLAIN
lcluster, indicating whether the whole pcluster is compressed or not.
In this patch, a new HEAD2 head type is introduced to specify a
compression algorithm other than the primary one for each
compressed file, which can be used for upcoming LZMA compression and
LZ4 range dictionary compression for various data patterns.
It has been on the EROFS roadmap for years. Complete it now!
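A hedged sketch of how a head type can select one of a file's two algorithms at mapping time. The enum ordering is meant to resemble the kernel's lcluster types but should be treated as an assumption, and the algorithm IDs and struct are hypothetical.

```c
#include <assert.h>

/* Lcluster types; the numeric values here are an assumption of the sketch. */
enum lc_head_type { LCT_PLAIN = 0, LCT_HEAD1 = 1, LCT_NONHEAD = 2, LCT_HEAD2 = 3 };

/* Hypothetical algorithm IDs. */
enum algo { ALGO_NONE = -1, ALGO_LZ4 = 0, ALGO_LZMA = 1 };

struct inode_algos {
	enum algo primary;   /* used by HEAD1 lclusters */
	enum algo secondary; /* used by HEAD2 lclusters */
};

/*
 * Resolve the algorithm for a pcluster from its head lcluster type, so the
 * decompression frontend does not have to derive it again.
 * PLAIN pclusters are stored uncompressed and yield ALGO_NONE.
 */
static int head_algorithm(const struct inode_algos *ia, enum lc_head_type t)
{
	switch (t) {
	case LCT_HEAD1:
		return ia->primary;
	case LCT_HEAD2:
		return ia->secondary;
	case LCT_PLAIN:
		return ALGO_NONE;
	default:
		assert(!"NONHEAD lclusters never start a pcluster");
		return ALGO_NONE;
	}
}
```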
Link: https://lore.kernel.org/r/20211017165721.2442-1-xiang@kernel.org
Reviewed-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit f0fe9e97d03ed484a51f764373ad0c5941949869
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: Sat Oct 9 04:08:37 2021 +0800
erofs: get compression algorithms directly on mapping
Currently, z_erofs_map_blocks_iter() returns whether extents are
compressed or not, and the decompression frontend gets the specific
algorithms then.
It works, but not quite well in many aspects, for example:
- The decompression frontend has to check again whether extents are
compressed and look up the algorithms if so.
It's duplicated and too detailed about the on-disk mapping.
- A new secondary compression head will be introduced later so that
each file can have 2 compression algorithms at most for different
types of data. It could increase the complexity of the decompression
frontend if still handled in this way;
- A new readmore decompression strategy will be introduced to get
better performance for much bigger pclusters and LZMA, which needs
the specific algorithm in advance as well.
Let's look up compression algorithms in z_erofs_map_blocks_iter()
directly instead.
Link: https://lore.kernel.org/r/20211008200839.24541-2-xiang@kernel.org
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 588fc2156404c552d4c2c7bcc5def820966a1ba1
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: Wed Sep 22 17:51:41 2021 +0800
erofs: fix misbehavior of unsupported chunk format check
Unsupported chunk format should be checked with
"if (vi->chunkformat & ~EROFS_CHUNK_FORMAT_ALL)"
Found when testing with a 4k-byte blockmap (although mkfs currently
uses the inode chunk indexes format by default).
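The corrected test rejects a format only if it sets bits outside the known ones. The mask values below are assumed to mirror the on-disk definitions, and the "buggy" variant is a plausible reconstruction of a faulty check, not quoted from the patch: it wrongly rejects a format with no known bit set, which is our reading of how a block-map inode whose chunk size equals the block size could trip it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for the sketch. */
#define CHUNK_FORMAT_BLKBITS_MASK 0x001F
#define CHUNK_FORMAT_INDEXES      0x0020
#define CHUNK_FORMAT_ALL (CHUNK_FORMAT_BLKBITS_MASK | CHUNK_FORMAT_INDEXES)

/* Hypothetical faulty variant: also rejects the valid format 0. */
static bool chunkformat_unsupported_buggy(uint16_t fmt)
{
	return !(fmt & CHUNK_FORMAT_ALL);
}

/* Corrected check: only unknown bits make a format unsupported. */
static bool chunkformat_unsupported(uint16_t fmt)
{
	return fmt & ~CHUNK_FORMAT_ALL;
}
```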
Link: https://lore.kernel.org/r/20210922095141.233938-1-hsiangkao@linux.alibaba.com
Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files")
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 613122535bafaabb0e58a9c347c5b6f1b8e6fa91
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: Wed Aug 25 20:07:57 2021 +0800
erofs: fix double free of 'copied'
Dan reported a new smatch warning [1]
"fs/erofs/inode.c:210 erofs_read_inode() error: double free of 'copied'"
Due to new chunk-based format handling logic, the error path can be
called after kfree(copied).
Set "copied = NULL" after kfree(copied) to fix this.
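The pattern behind this fix, reduced to a hypothetical userspace function: a temporary copy is freed on the normal path, and a later failure jumps to a shared error label that frees it again. Clearing the pointer after the first free makes the second call a no-op, since free(NULL), like kfree(NULL), does nothing.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reduction of the erofs_read_inode() error path. */
static int read_inode_model(const char *src, size_t len, int fail_late)
{
	char *copied = malloc(len);
	int err = 0;

	if (!copied)
		return -1;
	memcpy(copied, src, len);
	/* ... parse the inode from `copied` ... */
	free(copied);
	copied = NULL; /* the fix: the error path below may free it again */

	if (fail_late) { /* e.g. a chunk-format check failing afterwards */
		err = -1;
		goto err_out;
	}
	return 0;

err_out:
	free(copied); /* harmless now: copied is NULL */
	return err;
}
```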
[1] https://lore.kernel.org/r/202108251030.bELQozR7-lkp@intel.com
Link: https://lore.kernel.org/r/20210825120757.11034-1-hsiangkao@linux.alibaba.com
Fixes: c5aa903a59db ("erofs: support reading chunk-based uncompressed files")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 7b648f684ea7c99deab7278f0c2cbbf74797a56d
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: Fri Aug 20 18:00:19 2021 +0800
erofs: support reading chunk-based uncompressed files
Add runtime support for chunk-based uncompressed files
described in the previous patch.
Link: https://lore.kernel.org/r/20210820100019.208490-2-hsiangkao@linux.alibaba.com
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit d9737546275a3c460177a3ce9e01096bc3cfc3ad
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: Fri Aug 20 18:00:18 2021 +0800
erofs: introduce chunk-based file on-disk format
Currently, uncompressed data, except for tail-packing inline data, is
stored contiguously on disk.
In order to support chunk-based data deduplication, add a new
corresponding inode data layout.
In the future, chunk data may be either compressed or uncompressed.
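A minimal sketch of chunk-based addressing under assumed parameters: the file is split into chunks of (1 << chunkbits) bytes and an index array records each chunk's starting block, so deduplicated chunks can simply share a block. The hole marker and function name are hypothetical, not the on-disk definitions.

```c
#include <stdint.h>

#define NULL_BLKADDR UINT32_MAX /* hypothetical marker for an unmapped chunk */

/*
 * Translate a file offset to a byte address on disk.
 * Returns UINT64_MAX for a hole.
 */
static uint64_t chunk_map(const uint32_t *index, unsigned int chunkbits,
			  unsigned int blkbits, uint64_t pos)
{
	uint64_t chunknr = pos >> chunkbits;
	uint64_t within = pos & ((1ULL << chunkbits) - 1);
	uint32_t blk = index[chunknr];

	if (blk == NULL_BLKADDR)
		return UINT64_MAX;
	return ((uint64_t)blk << blkbits) + within;
}
```

With 8 KiB chunks over 4 KiB blocks and an index of { 10, 10, NULL_BLKADDR }, the first two chunks both resolve to block 10, which is how deduplication would appear in such a layout.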
Link: https://lore.kernel.org/r/20210820100019.208490-1-hsiangkao@linux.alibaba.com
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 47f6bed39a7a83aa59be667657cba886dbd4b79b
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: Fri Aug 13 13:29:31 2021 +0800
erofs: add fiemap support with iomap
This adds fiemap support for both uncompressed files and compressed
files by using iomap infrastructure.
Link: https://lore.kernel.org/r/20210813052931.203280-3-hsiangkao@linux.alibaba.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 82cc95ee585c9b033a43b0564173d4c444e3a4ac
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: Wed Aug 18 23:22:31 2021 +0800
erofs: add support for the full decompressed length
Previously, there was no need to get the full decompressed length since
EROFS supports partial decompression. However, for some other cases
such as fiemap, the full decompressed length is necessary for iomap to
work properly.
This patch adds a way to get the full decompressed length. Note that
it takes more metadata overhead, so it should be avoided where possible
in performance-sensitive scenarios.
Link: https://lore.kernel.org/r/20210818152231.243691-1-hsiangkao@linux.alibaba.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 8ff30ee6aaa1130bc26af4a98a818d91820c0bdb
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Thu May 12 12:08:04 2022 -0400
erofs: adapt 3f4e33b91a28 to our tree
commit 71e2f8865698e382349a16d8f90e5d74f935ff2a
Author: Huang Jianan <huangjianan@oppo.com>
Date: Thu Aug 5 08:35:59 2021 +0800
erofs: iomap support for non-tailpacking DIO
Add iomap support for non-tailpacking uncompressed data in order to
support DIO and DAX.
Direct I/O is useful in certain scenarios for uncompressed files.
For example, double page caching can be avoided by direct I/O when a
loop device is used for uncompressed files containing an upper-layer
compressed filesystem.
This adds iomap DIO support for non-tailpacking cases first and
tail-packing inline files are handled in the follow-up patch.
Link: https://lore.kernel.org/r/20210805003601.183063-2-hsiangkao@linux.alibaba.com
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 8bc571a229c3701405ac47f689db283ac99f2b2d
Author: Goldwyn Rodrigues <rgoldwyn@suse.com>
Date: Fri Aug 30 12:09:24 2019 -0500
fs: export generic_file_buffered_read()
Export generic_file_buffered_read() to be used to supplement incomplete
direct reads.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
commit 34c8cbbc7b932ac50e90da6e838524fd1f162aca
Author: Dan Williams <dan.j.williams@intel.com>
Date: Wed Mar 7 15:26:44 2018 -0800
fs, dax: prepare for dax-specific address_space_operations
In preparation for the dax implementation to start associating dax pages
to inodes via page->mapping, we need to provide a 'struct
address_space_operations' instance for dax. Define some generic VFS aops
helpers for dax. These noop implementations are there to prevent the
VFS from falling back to operations with page-cache assumptions in the
dax case; dax_writeback_mapping_range() may not be referenced in the
FS_DAX=n case.
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Matthew Wilcox <mawilcox@microsoft.com>
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
commit b0da008763834f165e8a055e011f223b3981316d
Author: Andreas Gruenbacher <agruenba@redhat.com>
Date: Sun Oct 1 17:55:54 2017 -0400
iomap: Switch from blkno to disk offset
Replace iomap->blkno, the sector number, with iomap->addr, the disk
offset in bytes. For invalid disk offsets, use the special value
IOMAP_NULL_ADDR instead of IOMAP_NULL_BLOCK.
This allows iomap to be used for mappings that are not block aligned,
such as inline data on ext4.
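The unit change amounts to a shift by the 512-byte sector size, with the null marker translated separately. The constants below are hypothetical stand-ins for IOMAP_NULL_BLOCK and IOMAP_NULL_ADDR, not their kernel definitions.

```c
#include <stdint.h>

#define SECTOR_SHIFT_MODEL 9             /* 512-byte sectors */
#define NULL_BLOCK_MODEL ((uint64_t)-1)  /* stand-in for IOMAP_NULL_BLOCK */
#define NULL_ADDR_MODEL  ((uint64_t)-1)  /* stand-in for IOMAP_NULL_ADDR */

/*
 * Convert an old-style sector number to a byte offset.  A byte offset can
 * also describe locations that no sector number can, such as inline data
 * starting partway through a block.
 */
static uint64_t sector_to_addr(uint64_t blkno)
{
	if (blkno == NULL_BLOCK_MODEL)
		return NULL_ADDR_MODEL;
	return blkno << SECTOR_SHIFT_MODEL;
}
```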
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> # iomap, xfs
Reviewed-by: Jan Kara <jack@suse.cz>
commit b74997cce993dd0408a0beeb36bd28652e272108
Author: Matthew Wilcox <mawilcox@microsoft.com>
Date: Tue Nov 28 15:39:51 2017 -0500
idr: Rename idr_for_each_entry_ext
In most places in the kernel where we need to distinguish functions by
the type of their arguments, we use '_ul' as the suffix for the unsigned
long variant, not '_ext'. Also add kernel-doc.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
commit a562faeba73cfb13de1f278c95be606faa3e4f21
Author: Matthew Wilcox <mawilcox@microsoft.com>
Date: Tue Nov 28 10:14:27 2017 -0500
idr: Add idr_alloc_u32 helper
All current users of idr_alloc_ext() actually want to allocate a u32
and idr_alloc_u32() fits their needs better.
Like idr_get_next(), it uses a 'nextid' argument which serves as both
a pointer to the start ID and the assigned ID (instead of a separate
minimum and pointer-to-assigned-ID argument). It uses a 'max' argument
rather than 'end' because the semantics that idr_alloc has for 'end'
don't work well for unsigned types.
Since idr_alloc_u32() returns an errno instead of the allocated ID, mark
it as __must_check to help callers use it correctly. Include copious
kernel-doc. Chris Mi <chrism@mellanox.com> has promised to contribute
test-cases for idr_alloc_u32.
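The calling convention described above (in/out `nextid`, inclusive `max`, errno-only return) can be shown with a toy bitmap allocator. This is a hypothetical model of the semantics only; the real IDR is backed by a radix tree and its internals differ.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_IDS 64 /* toy ID space */

struct toy_idr {
	bool used[MODEL_IDS];
};

/*
 * On entry *nextid is the lowest acceptable ID; on success it receives the
 * assigned ID.  max is inclusive.  Returns 0 or -ENOSPC, never the ID
 * itself, which is why ignoring the return value would hide failures.
 */
static int toy_alloc_u32(struct toy_idr *idr, uint32_t *nextid, uint32_t max)
{
	uint32_t id;

	if (max >= MODEL_IDS)
		max = MODEL_IDS - 1;
	for (id = *nextid; id <= max; id++) {
		if (!idr->used[id]) {
			idr->used[id] = true;
			*nextid = id;
			return 0;
		}
	}
	return -ENOSPC; /* *nextid is left unchanged */
}
```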
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
commit 4b24e4564260899c64b9532440a9b5545dbfe7f9
Author: Matthew Wilcox <mawilcox@microsoft.com>
Date: Tue Apr 10 16:36:48 2018 -0700
fscache: use appropriate radix tree accessors
Don't open-code accesses to data structure internals.
Link: http://lkml.kernel.org/r/20180313132639.17387-7-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 7469480be01c3394807cbd0991f06b8d6f2d4403
Author: Matthew Wilcox <mawilcox@microsoft.com>
Date: Tue Apr 10 16:36:44 2018 -0700
export __set_page_dirty
XFS currently contains a copy-and-paste of __set_page_dirty(). Export
it from buffer.c instead.
Link: http://lkml.kernel.org/r/20180313132639.17387-6-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit c53045287025992bc775081dbab63ac926a597e8
Author: Matthew Wilcox <mawilcox@microsoft.com>
Date: Tue Apr 10 16:36:28 2018 -0700
radix tree: use GFP_ZONEMASK bits of gfp_t for flags
Patch series "XArray", v9. (First part thereof).
This patchset is, I believe, appropriate for merging for 4.17. It
contains the XArray implementation, to eventually replace the radix
tree, and converts the page cache to use it.
This conversion keeps the radix tree and XArray data structures in sync
at all times. That allows us to convert the page cache one function at
a time and should allow for easier bisection. Other than renaming some
elements of the structures, the data structures are fundamentally
unchanged; a radix tree walk and an XArray walk will touch the same
number of cachelines. I have changes planned to the XArray data
structure, but those will happen in future patches.
Improvements the XArray has over the radix tree:
- The radix tree provides operations like other trees do; 'insert' and
'delete'. But what most users really want is an automatically
resizing array, and so it makes more sense to give users an API that
is like an array -- 'load' and 'store'. We still have an 'insert'
operation for users that really want that semantic.
- The XArray considers locking as part of its API. This simplifies a
lot of users who formerly had to manage their own locking just for
the radix tree. It also improves code generation as we can now tell
RCU that we're holding a lock and it doesn't need to generate as much
fencing code. The other advantage is that tree nodes can be moved
(not yet implemented).
- GFP flags are now parameters to calls which may need to allocate
memory. The radix tree forced users to decide what the allocation
flags would be at creation time. It's much clearer to specify them at
allocation time.
- Memory is not preloaded; we don't tie up dozens of pages on the off
chance that the slab allocator fails. Instead, we drop the lock,
allocate a new node and retry the operation. We have to convert all
the radix tree, IDA and IDR preload users before we can realise this
benefit, but I have not yet found a user which cannot be converted.
- The XArray provides a cmpxchg operation. The radix tree forces users
to roll their own (and at least four have).
- Iterators take a 'max' parameter. That simplifies many users and will
reduce the amount of iteration done.
- Iteration can proceed backwards. We only have one user for this, but
since it's called as part of the pagefault readahead algorithm, that
seemed worth mentioning.
- RCU-protected pointers are not exposed as part of the API. There are
some fun bugs where the page cache forgets to use rcu_dereference()
in the current codebase.
- Value entries gain an extra bit compared to radix tree exceptional
entries. That gives us the extra bit we need to put huge page swap
entries in the page cache.
- Some iterators now take a 'filter' argument instead of having
separate iterators for tagged/untagged iterations.
The page cache is improved by this:
- Shorter, easier to read code
- More efficient iterations
- Reduction in size of struct address_space
- Fewer walks from the top of the data structure; the XArray API
encourages staying at the leaf node and conducting operations there.
This patch (of 8):
None of these bits may be used for slab allocations, so we can use them
as radix tree flags as long as we mask them off before passing them to
the slab allocator. Move the IDR flag from the high bits to the
GFP_ZONEMASK bits.
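The trick is ordinary bit-field sharing: tree-private flags live in bits that slab allocations never use, and are stripped before the mask reaches the allocator. The bit values below are arbitrary assumptions for the sketch; the real gfp.h layout differs.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t gfp_model_t;

/* Assumed layout for illustration only. */
#define ZONEMASK_MODEL   0x0Fu  /* bits never used for slab allocations */
#define TREE_FLAG_IDR    0x01u  /* a tree flag parked inside those bits */
#define GFP_KERNEL_MODEL 0x6C0u /* an arbitrary "real" allocation mask */

/* Combine the allocation mask with tree-private flags in the zone bits. */
static gfp_model_t tree_init_flags(gfp_model_t gfp, gfp_model_t tree_flags)
{
	assert(!(gfp & ZONEMASK_MODEL));        /* no zone bits from the caller */
	assert(!(tree_flags & ~ZONEMASK_MODEL)); /* flags must fit the zone bits */
	return gfp | tree_flags;
}

/* Mask the tree flags off before handing the value to the slab allocator. */
static gfp_model_t gfp_for_slab(gfp_model_t stored)
{
	return stored & ~ZONEMASK_MODEL;
}
```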
Link: http://lkml.kernel.org/r/20180313132639.17387-3-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit c95250f9f545568b87775f1a2a48d412203161f7
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon May 16 10:45:14 2022 -0400
Revert "erofs: compression fixes"
This reverts commit 208dabff2d5e3e616a86df8bdba814d54b1a8a1f.
Fixes a deadlock when shrinking the erofs slab.
commit d07627505cd871bb1a539377434dede2f4a18d9c
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Mon May 16 09:41:14 2022 -0400
Revert "erofs: fixes for compilation"
This reverts commit c7bf11979051cda0e7b37857289503fa4831c549.
commit 7846d0f267ba3572570917e4880d60c79939bf5c
Author: Hongyu Jin <hongyu.jin@unisoc.com>
Date: Fri Apr 1 19:55:27 2022 +0800
erofs: fix use-after-free of on-stack io[]
The root cause is the race as follows:
Thread #1                              Thread #2(irq ctx)
z_erofs_runqueue()
  struct z_erofs_decompressqueue io_A[];
  submit bio A
  z_erofs_decompress_kickoff(,,1)
                                       z_erofs_decompressqueue_endio(bio A)
                                       z_erofs_decompress_kickoff(,,-1)
                                       spin_lock_irqsave()
                                       atomic_add_return()
  io_wait_event() -> pending_bios is already 0
  [end of function]
                                       wake_up_locked(io_A[]) // crash
Referenced backtrace in kernel 5.4:
[ 10.129422] Unable to handle kernel paging request at virtual address eb0454a4
[ 10.364157] CPU: 0 PID: 709 Comm: getprop Tainted: G WC O 5.4.147-ab09225 #1
[ 11.556325] [<c01b33b8>] (__wake_up_common) from [<c01b3300>] (__wake_up_locked+0x40/0x48)
[ 11.565487] [<c01b3300>] (__wake_up_locked) from [<c044c8d0>] (z_erofs_vle_unzip_kickoff+0x6c/0xc0)
[ 11.575438] [<c044c8d0>] (z_erofs_vle_unzip_kickoff) from [<c044c854>] (z_erofs_vle_read_endio+0x16c/0x17c)
[ 11.586082] [<c044c854>] (z_erofs_vle_read_endio) from [<c06a80e8>] (clone_endio+0xb4/0x1d0)
[ 11.595428] [<c06a80e8>] (clone_endio) from [<c04a1280>] (blk_update_request+0x150/0x4dc)
[ 11.604516] [<c04a1280>] (blk_update_request) from [<c06dea28>] (mmc_blk_cqe_complete_rq+0x144/0x15c)
[ 11.614640] [<c06dea28>] (mmc_blk_cqe_complete_rq) from [<c04a5d90>] (blk_done_softirq+0xb0/0xcc)
[ 11.624419] [<c04a5d90>] (blk_done_softirq) from [<c010242c>] (__do_softirq+0x184/0x56c)
[ 11.633419] [<c010242c>] (__do_softirq) from [<c01051e8>] (irq_exit+0xd4/0x138)
[ 11.641640] [<c01051e8>] (irq_exit) from [<c010c314>] (__handle_domain_irq+0x94/0xd0)
[ 11.650381] [<c010c314>] (__handle_domain_irq) from [<c04fde70>] (gic_handle_irq+0x50/0xd4)
[ 11.659641] [<c04fde70>] (gic_handle_irq) from [<c0101b70>] (__irq_svc+0x70/0xb0)
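The race above is a lifetime problem: the waiter sees the condition become true without taking the lock the waker still holds, returns, and its stack frame (holding io_A[]) disappears while the waker is still using it. One general remedy, shown here as a userspace pthread sketch with hypothetical names, is a completion whose waiter only observes `done` under the same mutex the signaller holds; the kernel's struct completion follows this pattern. Whether the upstream fix uses exactly this is not stated in the message.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void completion_init(struct completion_model *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Signaller: set and signal under the lock; unlocking is its last access. */
static void complete_model(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Waiter: `done` is only read while holding the lock, so the waiter cannot
 * observe it and return until the signaller has dropped the mutex.
 */
static void wait_for_completion_model(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Plays the role of the bio end_io handler. */
static void *endio_thread(void *arg)
{
	complete_model(arg);
	return NULL;
}
```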
Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220401115527.4935-1-hongyu.jin.cn@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 9fa705504bf016a360c10edc3c9c5cbf8d870a78
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Thu May 5 22:40:43 2022 -0400
erofs: extend 3812dc21ec
commit 4cda8c8c3d0ea4b3cb0f660db01697b50f7bfddc
Author: Yue Hu <huyue2@yulong.com>
Date: Thu Oct 14 14:57:44 2021 +0800
erofs: remove the fast path of per-CPU buffer decompression
As Xiang mentioned, such a path has no real impact on our current
decompression strategy, so remove it directly. Also, change
z_erofs_lz4_decompress() to return 0 on success to be consistent
with LZMA, which also returns 0 in that case.
Link: https://lore.kernel.org/r/20211014065744.1787-1-zbestahu@gmail.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 20122adf7721eff6c6ff90db545e0597501d942f
Author: Yue Hu <huyue2@yulong.com>
Date: Tue Sep 14 11:59:15 2021 +0800
erofs: clear compacted_2b if compacted_4b_initial > totalidx
Currently, all indexes will be compacted 4B only if
compacted_4b_initial > totalidx, so the calculated compacted_2b
is worthless in that case and may waste CPU resources.
There is no need to update compacted_4b_initial like mkfs, since it's
used to fulfill the alignment of the 1st compacted_2b pack and would
handle the case above.
We also need to clarify compacted_4b_end here: it's used for the
last lclusters, which don't fit in the previous compacted_2b
packs.
Some messages are from Xiang.
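A hedged sketch of the count described above, using the "compacted_4b_initial < totalidx" condition from the note below. It assumes 2-byte indexes come in packs of 16 lclusters; the function name and flag parameter are hypothetical.

```c
#include <stdbool.h>

/*
 * Number of lclusters covered by 2-byte compacted indexes.  If the leading
 * 4-byte run already covers every lcluster, skip the 2B computation.
 * Otherwise round the remainder down to whole packs of 16 (assumed pack
 * size); what is left after that is the compacted_4b_end tail.
 */
static unsigned int compacted_2b_count(unsigned int totalidx,
				       unsigned int initial_4b,
				       bool advise_2b)
{
	if (!advise_2b || initial_4b >= totalidx)
		return 0;
	return (totalidx - initial_4b) / 16 * 16;
}
```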
Link: https://lore.kernel.org/r/20210914035915.1190-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[ Gao Xiang: it's enough to use "compacted_4b_initial < totalidx". ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 3243783e85d10ccc00b9e8cb37960ed1fc1e9fef
Author: Yue Hu <huyue2@yulong.com>
Date: Tue Aug 10 15:24:16 2021 +0800
erofs: remove the mapping parameter from erofs_try_to_free_cached_page()
The mapping is not used at all, remove it and update related code.
Link: https://lore.kernel.org/r/20210810072416.1392-1-zbestahu@gmail.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 2936d3798b6c340459813a0eeb2409a4cb34e44f
Author: Yue Hu <huyue2@yulong.com>
Date: Tue Aug 10 14:54:50 2021 +0800
erofs: directly use wrapper erofs_page_is_managed() when shrinking
We already have the wrapper function to identify managed page.
Link: https://lore.kernel.org/r/20210810065450.1320-1-zbestahu@gmail.com
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
commit 09b3effb67cdec2ce718d83a363c0a2df5f3d372
Author: Yue Hu <huyue2@yulong.com>
Date: Mon Apr 19 18:26:23 2021 +0800
erofs: remove the occupied parameter from z_erofs_pagevec_enqueue()
The variable occupied is never used in z_erofs_attach_page(), which
is the only caller of z_erofs_pagevec_enqueue().
Link: https://lore.kernel.org/r/20210419102623.2015-1-zbestahu@gmail.com
Signed-off-by: Yue Hu <huyue2@yulong.com>
Reviewed-by: Gao Xiang <xiang@kernel.org>
Signed-off-by: Gao Xiang <xiang@kernel.org>
commit b5b28aefcf024c86c3f930293ba36482f96faf34
Author: Gao Xiang <xiang@kernel.org>
Date: Mon May 10 14:47:15 2021 +0800
erofs: fix 1 lcluster-sized pcluster for big pcluster
If the 1st NONHEAD lcluster of a pcluster is actually a HEAD or PLAIN
lcluster rather than a CBLKCNT lcluster, its pclustersize _must_ be
1 lcluster (since its uncompressed size < 2 lclusters),
as illustrated below:
     HEAD     HEAD / PLAIN   lcluster type
 ____________ ____________
|_:__________|_________:__|  file data (uncompressed)
   .                  .
    .____________.
    |____________|           pcluster data (compressed)
This on-disk case was explained before [1] but was not handled
properly in the runtime implementation.
It can be observed by manually generating a 1 lcluster-sized pcluster
with 2 lclusters (so no CBLKCNT exists). Let's fix it now.
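The rule can be sketched on a simplified lcluster descriptor (hypothetical, not the on-disk layout): look at the lcluster after the HEAD; if it is itself HEAD or PLAIN, the pcluster is exactly one lcluster, and otherwise the size comes from its CBLKCNT field.

```c
#include <assert.h>
#include <stdbool.h>

enum lct { T_PLAIN, T_HEAD, T_NONHEAD };

struct lcl {
	enum lct type;
	bool cblkcnt;                /* NONHEAD only: carries a block count */
	unsigned int compressedblks; /* valid when cblkcnt is set */
};

/*
 * Compressed size, in blocks, of the pcluster headed by lc[head].
 * lcluster_blks is the number of blocks per lcluster.
 * Precondition: lc[head + 1] exists.  Returns 0 on inconsistent metadata.
 */
static unsigned int pcluster_blocks(const struct lcl *lc, unsigned int nr,
				    unsigned int head, unsigned int lcluster_blks)
{
	const struct lcl *next;

	assert(head + 1 < nr);
	next = &lc[head + 1];
	if (next->type != T_NONHEAD)
		return lcluster_blks; /* the 1-lcluster case fixed here */
	if (!next->cblkcnt)
		return 0; /* 1st NONHEAD should carry CBLKCNT */
	return next->compressedblks;
}
```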
[1] https://lore.kernel.org/r/20210407043927.10623-1-xiang@kernel.org
Link: https://lore.kernel.org/r/20210510064715.29123-1-xiang@kernel.org
Fixes: cec6e93beadf ("erofs: support parsing big pcluster compress indexes")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <xiang@kernel.org>
commit 2cfa0bcf32db1431e18d636e0ff5c592768b9620
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Wed Apr 7 12:39:27 2021 +0800
erofs: enable big pcluster feature
Enable COMPR_CFGS and BIG_PCLUSTER since the implementations are
all settled properly.
Link: https://lore.kernel.org/r/20210407043927.10623-11-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit d75144d8d0395bca0a1a629a3b9ab6a95112a083
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Wed Apr 7 12:39:26 2021 +0800
erofs: support decompress big pcluster for lz4 backend
Prior to big pcluster, there was only one compressed page, so it was
easy to map. However, when big pcluster is enabled, more work needs
to be done to handle multiple compressed pages. In detail,
- (maptype 0) if there is only one compressed page + no need
to copy inplace I/O, just map it directly as we did before;
- (maptype 1) if there are more compressed pages + no need to
copy inplace I/O, vmap such compressed pages instead;
- (maptype 2) if inplace I/O needs to be copied, use per-CPU
buffers for decompression then.
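A minimal sketch of the three maptypes above, under simplified
assumptions: pick_maptype() and the MAPTYPE_* names are hypothetical,
and the real decompressor also has to compute the inplace margin
before it knows whether a copy is needed at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; they only mirror the maptype numbering above. */
enum maptype_sketch {
	MAPTYPE_DIRECT = 0,	/* one compressed page: map it as-is */
	MAPTYPE_VMAP = 1,	/* several compressed pages: vmap them */
	MAPTYPE_PERCPU = 2,	/* inplace I/O must be copied: per-CPU buffers */
};

/* Decide how the compressed input of a pcluster is mapped. */
static enum maptype_sketch pick_maptype(unsigned int nrpages_in,
					bool need_inplace_copy)
{
	assert(nrpages_in > 0);
	if (need_inplace_copy)
		return MAPTYPE_PERCPU;
	return nrpages_in == 1 ? MAPTYPE_DIRECT : MAPTYPE_VMAP;
}
```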
Another thing is how to detect whether inplace decompression is
feasible or not (it's still quite easy for non-big pclusters): apart
from the inplace margin calculation, the reuse order of inplace I/O
pages also needs to be considered for each compressed page. Currently,
if the compressed page is the xth page, it shouldn't be reused as
[0 ... nrpages_out - nrpages_in + x]; otherwise a full copy will be
triggered.
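The reuse-order rule can be sketched as a single predicate. With
inplace decompression the compressed data sits at the tail of the
output, so the xth compressed page occupies output slot
nrpages_out - nrpages_in + x; reusing it as an earlier output page
would let decompressed data overwrite input that hasn't been consumed
yet. This sketch assumes the range above is half-open (the page may
still stay at its own tail slot); inplace_reuse_ok() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Compressed page @x may be reused as output page @out_idx only if
 * @out_idx is not below its own tail slot; otherwise a full copy of
 * the compressed data is needed before decompressing.
 */
static bool inplace_reuse_ok(int x, int out_idx, int nrpages_in,
			     int nrpages_out)
{
	assert(x >= 0 && x < nrpages_in);
	return out_idx >= nrpages_out - nrpages_in + x;
}
```

Signed ints are used so that nrpages_out - nrpages_in cannot wrap
around for unusual inputs.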
Although there are some extra optimization ideas for this, I'd like
to make big pcluster work correctly first; it can obviously be
further optimized later since it has nothing to do with the on-disk
format at all.
Link: https://lore.kernel.org/r/20210407043927.10623-10-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit f344f71c42af2866c748ae22e1b133a02594b367
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Wed Apr 7 12:39:25 2021 +0800
erofs: support parsing big pcluster compact indexes
Unlike non-compact indexes, compact indexes pack several lclusters
together and store a unique base blkaddr for each pack, so each
lcluster index takes less space on average (e.g. 2 bytes for
COMPACT_2B). BTW, that is also why the BIG_PCLUSTER switch should be
consistent for compact head0/1.
Prior to big pcluster, the size of all pclusters was 1 lcluster.
Therefore, when a new HEAD lcluster was scanned, blkaddr would be
bumped by 1 lcluster. However, that no longer works for big pcluster
since we don't know the compressed size of pclusters in advance
(before reading the CBLKCNT lcluster).
So instead, let the blkaddr of each pack be the blkaddr of the first
pcluster with a valid CBLKCNT. In detail,
1) if CBLKCNT starts at the pack, this first valid pcluster is
itself, e.g.
_____________________________________________________________
|_CBLKCNT0_|_NONHEAD_| .. |_HEAD_|_CBLKCNT1_| ... |_HEAD_| ...
^ = blkaddr base ^ += CBLKCNT0 ^ += CBLKCNT1
2) if CBLKCNT doesn't start at the pack, the first valid pcluster
is the next pcluster, e.g.
_________________________________________________________
| NONHEAD_| .. |_HEAD_|_CBLKCNT0_| ... |_HEAD_|_HEAD_| ...
^ = blkaddr base ^ += CBLKCNT0
^ += 1
When a CBLKCNT is found, blkaddr is increased by CBLKCNT lclusters;
if a new HEAD is found immediately instead, blkaddr is bumped by 1
(see the pictures above).
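The blkaddr accumulation described above can be sketched over a
simplified in-memory model of one pack. Assumptions: struct lc, enum
lc_type and head_blkaddr() are hypothetical, PLAIN lclusters are
folded into LC_HEAD, and decoding delta0/delta1 from the packed
on-disk bits is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified lcluster model for one compact pack. */
enum lc_type { LC_HEAD, LC_NONHEAD, LC_CBLKCNT };

struct lc {
	enum lc_type type;
	unsigned int cblkcnt;	/* only meaningful for LC_CBLKCNT */
};

/*
 * Return the blkaddr of the pcluster whose HEAD is at index @target,
 * starting from the per-pack @base. A CBLKCNT advances blkaddr by its
 * block count; a HEAD seen while the previous HEAD still lacks a
 * CBLKCNT (i.e. a 1-lcluster pcluster) advances it by 1.
 */
static unsigned int head_blkaddr(const struct lc *pack, unsigned int n,
				 unsigned int base, unsigned int target)
{
	unsigned int blkaddr = base;
	bool pending_head = false;	/* HEAD seen, no CBLKCNT yet */
	unsigned int i;

	assert(target < n && pack[target].type == LC_HEAD);
	for (i = 0; i <= target; ++i) {
		switch (pack[i].type) {
		case LC_HEAD:
			if (pending_head)
				blkaddr += 1;
			pending_head = true;
			break;
		case LC_CBLKCNT:
			blkaddr += pack[i].cblkcnt;
			pending_head = false;
			break;
		case LC_NONHEAD:
			break;
		}
	}
	return blkaddr;
}
```

For a pack laid out as CBLKCNT(3), HEAD, CBLKCNT(2), NONHEAD, HEAD,
HEAD with base 100, the three HEADs resolve to 103, 105 and 106,
matching case 1 followed by the "+= 1" step of case 2.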
Also note that if CBLKCNT is at the end of the pack, instead of
storing delta1 (distance to the next HEAD lcluster) as normal NONHEADs
do, it still stores the compressed block count (delta0), since delta1
can be calculated indirectly but the block count can't.
Adjust decoding logic to fit big pcluster compact indexes as well.
Link: https://lore.kernel.org/r/20210407043927.10623-9-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit 7af2a5cf065073d6f43298b2c96676f9315709d5
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Wed Apr 7 12:39:24 2021 +0800
erofs: support parsing big pcluster compress indexes
When the INCOMPAT_BIG_PCLUSTER sb feature is enabled, legacy compress
indexes will also have the same on-disk header as compact indexes to
keep per-file configurations, instead of leaving it zeroed.
If ADVISE_BIG_PCLUSTER is set for a file, CBLKCNT will be loaded for
each pcluster in this file by parsing its 1st non-head lcluster.
Link: https://lore.kernel.org/r/20210407043927.10623-8-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit 81a0c5100c6b09b91b7cfdad429fc66d65335be2
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Wed Apr 7 12:39:23 2021 +0800
erofs: adjust per-CPU buffers according to max_pclusterblks
Adjust per-CPU buffers on demand now that the big pcluster definition
is available. Also, bail out on unsupported pcluster sizes according
to Z_EROFS_PCLUSTER_MAX_SIZE.
Link: https://lore.kernel.org/r/20210407043927.10623-7-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit 56612c78a9aeefc38d6b9bd7a6fef06eebe0c4b6
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Wed Apr 7 12:39:22 2021 +0800
erofs: add big physical cluster definition
Big pcluster means that the size of compressed data for each physical
pcluster is no longer fixed as the block size, but could be more than
1 block (more accurately, more than 1 lcluster).
When the big pcluster feature is enabled for head0/1, delta0 of the
1st non-head lcluster index will keep the block count of this pcluster
in lcluster size instead of 1. Otherwise, the compressed size of the
pcluster should be 1 lcluster if it has no non-head lcluster index.
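In other words, the compressed size of a pcluster (in lclusters)
could be derived roughly as below. This is a hedged sketch: enum
next_lc and pcluster_lclusters() are made up for illustration and only
model the type of the lcluster right after the HEAD.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the lcluster that follows a HEAD lcluster. */
enum next_lc { NEXT_NONHEAD, NEXT_HEAD_OR_PLAIN, NEXT_NONE };

/*
 * With big pcluster, delta0 of the 1st non-head lcluster holds the
 * block count; without any non-head lcluster (next is HEAD/PLAIN or
 * the end of file), the pcluster is exactly 1 lcluster.
 */
static unsigned int pcluster_lclusters(bool big_pcluster, enum next_lc next,
				       unsigned int delta0)
{
	if (!big_pcluster || next != NEXT_NONHEAD)
		return 1;
	assert(delta0 >= 1);
	return delta0;
}
```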
Also note that the BIG_PCLUSTER feature reuses the COMPR_CFGS feature
since it depends on COMPR_CFGS and they will be released together.
Link: https://lore.kernel.org/r/20210407043927.10623-6-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit a67309917444753f1cebfee2d2503cf68269e54a
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Wed Apr 7 12:39:21 2021 +0800
erofs: fix up inplace I/O pointer for big pcluster
When picking up inplace I/O pages, they should be traversed in reverse
order, aligned with the traversal order of file-backed online pages.
Also, the index should be updated together when preloading compressed
pages. Previously, only page-sized pclusters were supported, so this
caused no problem at all. Also rename `compressedpages' to `icpage_ptr'
to reflect its functionality.
Link: https://lore.kernel.org/r/20210407043927.10623-5-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit 8fabf77d1a435d68b2bbb89c51f8351ef8efed26
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Wed Apr 7 12:39:20 2021 +0800
erofs: introduce physical cluster slab pools
Since multiple pcluster sizes could be used at once, the number of
compressed pages becomes a variable factor. It's now necessary to
introduce slab pools rather than a single slab cache.
This limits the pclustersize to 1M (Z_EROFS_PCLUSTER_MAX_SIZE) and
gets rid of the obsolete EROFS_FS_CLUSTER_PAGE_LIMIT, which is no
longer used.
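Choosing a pool then amounts to picking the smallest size class that
can hold the pcluster's compressed pages. The size classes below are
illustrative assumptions (4K pages, so 1M is 256 pages), not
necessarily the exact pools the kernel uses; pick_pcluster_pool() is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE		4096u
#define SKETCH_PCLUSTER_MAX_SIZE	(1024u * 1024u)	/* 1M, as above */
#define SKETCH_PCLUSTER_MAX_PAGES	(SKETCH_PCLUSTER_MAX_SIZE / SKETCH_PAGE_SIZE)

/* Illustrative size classes: max compressed pages per slab pool. */
static const unsigned int pool_maxpages[] = {
	1, 4, 16, 64, 128, SKETCH_PCLUSTER_MAX_PAGES,
};

/* Return the index of the smallest pool that fits @nrpages, or -1. */
static int pick_pcluster_pool(unsigned int nrpages)
{
	size_t i;

	assert(nrpages > 0);
	for (i = 0; i < sizeof(pool_maxpages) / sizeof(pool_maxpages[0]); ++i)
		if (nrpages <= pool_maxpages[i])
			return (int)i;
	return -1;	/* above the 1M limit: unsupported pclustersize */
}
```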
Link: https://lore.kernel.org/r/20210407043927.10623-4-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit c9b891a3fd81d315815f496f1282c95e98507812
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Sat Apr 10 03:06:30 2021 +0800
erofs: introduce multipage per-CPU buffers
Per-CPU buffers were introduced to deal with cases where inplace
decompression is infeasible for some inplace I/O, in order to get rid
of page allocation latency and thrashing for low-latency decompression
algorithms such as lz4.
For the big pcluster feature, introduce multipage per-CPU buffers to
keep such inplace I/O pclusters temporarily as well, but note that
per-CPU pages are only virtually contiguous.
When a new big pcluster fs is mounted, its max pclustersize will be
read and the per-CPU buffers can be grown if needed. Shrinking
adjustable per-CPU buffers is more complex (because we don't know
whether such a size is still in use), so currently just release them
all when unloading.
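A userspace sketch of this grow-only policy, assuming a fixed number
of CPUs and using realloc() in place of the kernel's virtually
contiguous page mapping; pcpubuf_growsize() and pcpubuf_exit() are
hypothetical names. It also bails out on sizes above a 256-page (1M)
limit, in the spirit of the Z_EROFS_PCLUSTER_MAX_SIZE check.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS			4
#define SKETCH_PAGE_SIZE		4096u
#define SKETCH_PCLUSTER_MAX_PAGES	256u	/* 1M with 4K pages */

/* Hypothetical per-CPU buffer, sized in pages. */
struct pcpubuf {
	void *ptr;
	unsigned int nrpages;
};

static struct pcpubuf pcb[SKETCH_NR_CPUS];

/*
 * Called at mount time with the fs's max pclustersize in pages.
 * Buffers only ever grow: another mounted fs may still need the
 * larger size, so shrinking is not attempted here.
 */
static int pcpubuf_growsize(unsigned int nrpages)
{
	int cpu;

	if (nrpages > SKETCH_PCLUSTER_MAX_PAGES)
		return -EINVAL;	/* bail out: unsupported pclustersize */
	for (cpu = 0; cpu < SKETCH_NR_CPUS; ++cpu) {
		void *p;

		if (pcb[cpu].nrpages >= nrpages)
			continue;
		p = realloc(pcb[cpu].ptr, (size_t)nrpages * SKETCH_PAGE_SIZE);
		if (!p)
			return -ENOMEM;
		pcb[cpu].ptr = p;
		pcb[cpu].nrpages = nrpages;
	}
	return 0;
}

/* Release all buffers at once, e.g. when the module is unloaded. */
static void pcpubuf_exit(void)
{
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; ++cpu) {
		free(pcb[cpu].ptr);
		pcb[cpu].ptr = NULL;
		pcb[cpu].nrpages = 0;
	}
}
```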
Link: https://lore.kernel.org/r/20210409190630.19569-1-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit 6751c7549b38cfe2044fc3d6e03c25c0067e700d
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Wed Apr 7 12:39:18 2021 +0800
erofs: reserve physical_clusterbits[]
The formal big pcluster design is actually more powerful and flexible
than the previous idea, in which pclustersize was fixed as power-of-2
blocks, which was obviously inefficient and space-wasting. Instead,
pclustersize can now be set independently for each pcluster, so
various pcluster sizes can also be used together in one file if mkfs
wants (for example, according to data type and/or compression ratio).
Let's get rid of the previous physical_clusterbits[] setting (also
note that the corresponding on-disk fields are still 0 for now).
Therefore, head1/2 can be used for at most 2 different algorithms in
one file, and again pclustersize is now independent of these.
Link: https://lore.kernel.org/r/20210407043927.10623-2-xiang@kernel.org
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit 7c717bd2fb96a7ee82346bc88ddd28c5812c689d
Author: Ruiqi Gong <gongruiqi1@huawei.com>
Date: Wed Mar 31 05:39:20 2021 -0400
erofs: Clean up spelling mistakes found in fs/erofs
zmap.c: s/correspoinding/corresponding
zdata.c: s/endding/ending
Link: https://lore.kernel.org/r/20210331093920.31923-1-gongruiqi1@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ruiqi Gong <gongruiqi1@huawei.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit 44f277dee13de691fe1fc483b55b4bc8ade3da36
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Mon Mar 29 18:00:12 2021 +0800
erofs: add on-disk compression configurations
Add a bitmap of available compression algorithms and a variable-sized
on-disk table of compression options, which follows the end of the
super block, in preparation for upcoming big pcluster and the LZMA
algorithm.
To parse the compression options, the bitmap is scanned one bit at a
time. For each available algorithm, there is its data along with a
corresponding 2-byte `length' (which is enough for most cases;
otherwise entire fs blocks should be used).
With such an available algorithm bitmap, the kernel itself can also
refuse to mount a filesystem if any unsupported compression algorithm
exists.
Note that the COMPR_CFGS feature will be enabled with BIG_PCLUSTER.
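A hedged sketch of scanning the bitmap and the options table. It
assumes each record is a little-endian 16-bit length followed by that
many bytes of per-algorithm data, and that only algorithm 0 is
supported; parse_compr_cfgs() and SKETCH_SUPPORTED_ALGS are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: bit i set means algorithm i is understood. */
#define SKETCH_SUPPORTED_ALGS	0x1u

/*
 * Walk the options table that follows the superblock. Returns the
 * number of records parsed, or -1 if an unsupported algorithm is
 * present (refuse to mount) or the table is truncated.
 */
static int parse_compr_cfgs(uint16_t bitmap, const uint8_t *buf, size_t size)
{
	size_t off = 0;
	int alg, nr = 0;

	if (bitmap & ~SKETCH_SUPPORTED_ALGS)
		return -1;

	for (alg = 0; alg < 16; ++alg) {
		uint16_t len;

		if (!(bitmap & (1u << alg)))
			continue;
		if (size - off < 2)
			return -1;
		len = (uint16_t)(buf[off] | (buf[off + 1] << 8));
		off += 2;
		if (size - off < len)
			return -1;
		/* buf[off .. off + len) would be handed to algorithm @alg. */
		off += len;
		++nr;
	}
	return nr;
}
```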
Link: https://lore.kernel.org/r/20210329100012.12980-1-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit e43a280cd5ca073e9d8cfa0471cdabf8f8500181
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Mon Mar 29 09:23:07 2021 +0800
erofs: introduce on-disk lz4 fs configurations
Introduce z_erofs_lz4_cfgs to store all lz4 configurations.
Currently it only contains max_distance, but it will be used for
new features later.
Link: https://lore.kernel.org/r/20210329012308.28743-4-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit d4108bf277b411bfdfa0eb12c2172b4035471d8b
Author: Huang Jianan <huangjianan@oppo.com>
Date: Mon Mar 29 09:23:06 2021 +0800
erofs: support adjust lz4 history window size
lz4 uses LZ4_DISTANCE_MAX to bound how much history is preserved.
When using rolling decompression, a block with a higher compression
ratio will cause a larger memory allocation (up to 64k). In extreme
cases this may place a large resource burden on devices with small
memory and a large number of concurrent IOs, so appropriately
reducing this value can improve performance.
Decreasing this value will reduce the compression ratio (except
when input_size < LZ4_DISTANCE_MAX). However, considering that erofs
currently only supports 4k output, reducing this value will not
significantly reduce the compression benefits.
The maximum value of LZ4_DISTANCE_MAX defined by lz4 is 64k, and
we can only reduce this value. Old kernels simply can't reduce the
memory allocation during rolling decompression, but the
decompression result is not affected.
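For a feel of the numbers, the history a rolling decompression needs
can be expressed in pages as below. This assumes 4K pages and
LZ4_DISTANCE_MAX == 65535 (lz4's value), with 0 meaning "use the
default"; lz4_distance_pages() is a hypothetical helper, and the extra
page is assumed to cover a window that doesn't start on a page
boundary.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_LZ4_DISTANCE_MAX	65535u	/* lz4's maximum match distance */

/* Pages of history to keep for a given on-disk max_distance. */
static unsigned int lz4_distance_pages(unsigned int max_distance)
{
	if (max_distance > SKETCH_LZ4_DISTANCE_MAX)
		return 0;	/* invalid: the window can only be shrunk */
	if (!max_distance)
		max_distance = SKETCH_LZ4_DISTANCE_MAX;
	return (max_distance + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE + 1;
}
```

Under these assumptions, shrinking max_distance from the default to
4096 cuts the history from 17 pages to 2.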
Link: https://lore.kernel.org/r/20210329012308.28743-3-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
[ Gao Xiang: introduce struct erofs_sb_lz4_info for configurations. ]
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit 89a30917b8f584f34216b053ff5e4b8e1fa1a81a
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Mon Mar 29 09:23:05 2021 +0800
erofs: introduce erofs_sb_has_xxx() helpers
Introduce erofs_sb_has_xxx() to make long checks short, especially
for later big pcluster & LZMA features.
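Such helpers are typically generated from one macro per feature bit.
The sketch below assumes a trimmed-down sb info struct and
illustrative flag values; only the erofs_sb_has_<name>() naming comes
from the text, while SB_FEATURE_FUNCS and struct sb_info_sketch are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down sb info with an incompat feature mask. */
struct sb_info_sketch {
	uint32_t feature_incompat;
};

#define INCOMPAT_LZ4_0PADDING	0x00000001u
#define INCOMPAT_BIG_PCLUSTER	0x00000002u

/* Generate one short erofs_sb_has_<name>() helper per feature bit. */
#define SB_FEATURE_FUNCS(name, field, flag)				\
static inline bool erofs_sb_has_##name(const struct sb_info_sketch *sbi) \
{									\
	return sbi->feature_##field & (flag);				\
}

SB_FEATURE_FUNCS(lz4_0padding, incompat, INCOMPAT_LZ4_0PADDING)
SB_FEATURE_FUNCS(big_pcluster, incompat, INCOMPAT_BIG_PCLUSTER)
```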
Link: https://lore.kernel.org/r/20210329012308.28743-2-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit 83849318acff8125846f2447ed318f80db4dde38
Author: Yue Hu <huyue2@yulong.com>
Date: Thu Mar 25 15:10:08 2021 +0800
erofs: don't use erofs_map_blocks() any more
Currently, erofs_map_blocks() is only called from
erofs_{bmap, read_raw_page}, which are all for uncompressed files.
So, the compression branch in erofs_map_blocks() is pointless. Let's
remove it and use erofs_map_blocks_flatmode() directly. Also update
related comments.
Link: https://lore.kernel.org/r/20210325071008.573-1-zbestahu@gmail.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit dd3b7a71fb79a620a8df1138d74c990df27e04a5
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Mon Mar 22 02:32:27 2021 +0800
erofs: complete a missing case for inplace I/O
Add a missing case which could cause unnecessary page allocation
instead of directly using inplace I/O, which increases the extra
runtime memory footprint.
In detail, consider an online file-backed page whose right half is
chosen to be cached (e.g. the end page of a readahead request) while
some of its data doesn't exist in the managed cache, so the pcluster
will definitely be kept in the submission chain (IOWs, it cannot be
decompressed without I/O, e.g., due to the bypass queue).
Currently, DELAYEDALLOC/TRYALLOC cases can be downgraded to NOINPLACE,
which stops online pages from being used for inplace I/O. After this
patch, unneeded page allocations won't be observed in
pickup_page_for_submission() anymore.
Link: https://lore.kernel.org/r/20210321183227.5182-1-hsiangkao@aol.com
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit 2195652f604a78eff0b808c94f7c31c0648d42e8
Author: Huang Jianan <huangjianan@oppo.com>
Date: Wed Mar 17 11:54:47 2021 +0800
erofs: use workqueue decompression for atomic contexts only
z_erofs_decompressqueue_endio may not be executed in atomic
context, for example, when dm-verity is turned on. In this scenario,
data can be decompressed directly to get rid of the additional
kworker scheduling overhead.
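The decision can be sketched as below, where atomic_ctx stands in for
the kernel's in_atomic() || irqs_disabled() test (an assumption about
how atomic context is detected) and the two counters are hypothetical
stand-ins for queueing a kworker versus decompressing inline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the two completion paths. */
static int queued, decompressed_inline;

static void queue_decompress_work(void) { ++queued; }
static void decompress_now(void) { ++decompressed_inline; }

/* Called when the last bio of a decompression queue completes. */
static void decompressqueue_endio(bool atomic_ctx)
{
	if (atomic_ctx)
		queue_decompress_work();	/* cannot sleep: defer to a kworker */
	else
		decompress_now();		/* e.g. dm-verity's completion context */
}
```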
Link: https://lore.kernel.org/r/20210317035448.13921-2-huangjianan@oppo.com
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit 50a12c462dbc5e3e4d14dc427392fc8e571b1b0b
Author: Huang Jianan <huangjianan@oppo.com>
Date: Tue Mar 16 11:15:14 2021 +0800
erofs: avoid memory allocation failure during rolling decompression
Currently, such an allocation error would be treated as an I/O error.
Therefore, it'd be better to ensure memory allocation during rolling
decompression to avoid such I/O errors.
In the long term, we might consider adding another !Uptodate case
for this.
Link: https://lore.kernel.org/r/20210316031515.90954-1-huangjianan@oppo.com
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Guo Weichao <guoweichao@oppo.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
commit 5a664357076596a3af1100413bafb00a88dc5ef2
Author: kondors1995 <normandija1945@gmail.com>
Date: Mon May 9 16:44:49 2022 +0000
raphael_defconfig: Enable EROFS
commit 2409ea765730e7ca72fcc71dc3989eb37306ed81
Author: Tom Levy <tomlevy93@gmail.com>
Date: Tue Jul 16 16:30:24 2019 -0700
include/linux/lz4.h: fix spelling and copy-paste errors in documentation
Fix a few spelling and grammar errors, and two places where fast/safe in
the documentation did not match the function.
Link: http://lkml.kernel.org/r/20190321014452.13297-1-tomlevy93@gmail.com
Signed-off-by: Tom Levy <tomlevy93@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Kosina <trivial@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live>
commit 416572f0ce1a90146cb73dd5ea3667899d0f8241
Author: John Galt <johngaltfirstrun@gmail.com>
Date: Tue May 3 16:09:48 2022 -0400
erofs: compression fixes
commit 8af69e641af0cd017664fbf2fbd9ce2509b2b8dc
Author: Luan Cachoroski Halaiko <luhalaiko@gmail.com>
Date: Tue Feb 8 20:20:47 2022 -0300
erofs: fixes for compilation
Signed-off-by: Luan Cachoroski Halaiko <luhalaiko@gmail.com>
commit ad81e37ce0d0af5bdb0115a7eccd673c03d293f0
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Wed Dec 9 20:37:17 2020 +0800
erofs: force inplace I/O under low memory scenario
Try to forcibly switch to inplace I/O in low memory scenarios in
order to avoid direct memory reclaim due to cached page allocation.
Link: https://lore.kernel.org/r/20201209123717.12430-1-hsiangkao@aol.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: I8ea2d3b59c68125271f66853cf5dc6ca39e7aaa9
commit e4018facd91f25eb223b94416d1b64f641618577
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Tue Dec 8 17:58:34 2020 +0800
erofs: simplify try_to_claim_pcluster()
Simplify try_to_claim_pcluster() by directly using cmpxchg() here
(the retry loop caused more overhead). Also, move the chain loop
detection into it and rename it to z_erofs_try_to_claim_pcluster().
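A C11 sketch of the single-cmpxchg claim, assuming hypothetical
sentinels PCL_NIL (not in any chain) and PCL_TAIL (end of an open
chain) stored in the pcluster's next field; try_to_claim() only
illustrates the idea and leaves out the chain loop detection.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sentinels for the pcluster next field. */
#define PCL_NIL		((uintptr_t)0)	/* not in any chain */
#define PCL_TAIL	((uintptr_t)1)	/* end of an open chain */

struct pcluster_sketch {
	_Atomic uintptr_t next;
};

enum claim_mode { CLAIM_FOLLOWED, CLAIM_HOOKED, CLAIM_NONE };

/* One compare-and-swap per case instead of a retry loop. */
static enum claim_mode try_to_claim(struct pcluster_sketch *pcl,
				    uintptr_t *owned_head)
{
	uintptr_t expected = PCL_NIL;

	/* Not in any chain yet: link it into ours. */
	if (atomic_compare_exchange_strong(&pcl->next, &expected, *owned_head)) {
		*owned_head = (uintptr_t)&pcl->next;
		return CLAIM_FOLLOWED;
	}
	/* Tail of another open chain: hook our chain after it. */
	expected = PCL_TAIL;
	if (atomic_compare_exchange_strong(&pcl->next, &expected, *owned_head)) {
		*owned_head = PCL_TAIL;
		return CLAIM_HOOKED;
	}
	/* Already in the middle of some chain. */
	return CLAIM_NONE;
}
```

Note that a failed compare-and-swap overwrites `expected' with the
current value, which is why it is reset before the second attempt.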
Link: https://lore.kernel.org/r/20201208095834.3133565-3-hsiangkao@redhat.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: I8d091ff44123b099ef199eaa4200a00b8854623f
commit f28d114732f644b4a6445316095db1f0e818472f
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Tue Dec 8 17:58:33 2020 +0800
erofs: insert to managed cache after adding to pcl
Previously, there was some concern about calling
add_to_page_cache_lru() with page->mapping == Z_EROFS_MAPPING_STAGING
(!= NULL).
Now that page->private is used instead, partially revert
commit 5ddcee1f3a1c ("erofs: get rid of __stagingpage_alloc helper")
with some adaptations for simplicity.
Link: https://lore.kernel.org/r/20201208095834.3133565-2-hsiangkao@redhat.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: If250d62b47083649e96d0937eb1990b6c84d768f
commit 1a79fe1a476ae08ed0609618951fe863df0ac03a
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Tue Dec 8 17:58:32 2020 +0800
erofs: get rid of magical Z_EROFS_MAPPING_STAGING
Previously, we played around with a magical page->mapping for
short-lived temporary pages since we need to identify different types
of pages in the same pcluster, but both invalidated and short-lived
temporary pages can have page->mapping == NULL. It was considered safe
because temporary pages are all non-LRU / non-movable pages.
This patch uses a specific page->private to identify short-lived pages
instead, so it won't rely on page->mapping anymore. Details are also
described in "compress.h".
Link: https://lore.kernel.org/r/20201208095834.3133565-1-hsiangkao@redhat.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: I2c8650e80cb6016ed828d04f89f8bd3512ca3fb2
commit a50789da7af81e73a8cb0081e788cea5543eff5c
Author: Vladimir Zapolskiy <vladimir@tuxera.com>
Date: Fri Oct 30 14:28:39 2020 +0200
erofs: remove a void EROFS_VERSION macro set in Makefile
Since commit 4f761fa253b4 ("erofs: rename errln/infoln/debugln to
erofs_{err, info, dbg}"), the defined macro EROFS_VERSION has no effect,
therefore removing it from the Makefile is a non-functional change.
Link: https://lore.kernel.org/r/20201030122839.25431-1-vladimir@tuxera.com
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Vladimir Zapolskiy <vladimir@tuxera.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Change-Id: Id63ad279985db2a156d62be814bf381c9bea8342
commit d929ef94d4aab35ae96fb6d6efd1a0a23f7d1b48
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: Mon Aug 30 11:44:53 2021 +0800
erofs: move from drivers/staging/ to fs/
Since 5.4, erofs has been moved into fs/.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Change-Id: I95dd967a0097629a9d8eaed1dc11e2cd04f47701
commit 2758a8239cc772c63d5463073b44626ee4e7695a
Author: Gao Xiang <hsiangkao@linux.alibaba.com>
Date: Wed Aug 25 11:42:03 2021 +0800
erofs: sync up with kernel 5.10
Backport 5.10 LTS erofs to 4.19.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Change-Id: Ibf9c0c47e46090b72e75f09a347100f4ff64f28d
commit 1ee3b56216b0d92e2134d6134d2027c842f495b6
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Mon Mar 29 08:36:14 2021 +0800
erofs: add unsupported inode i_format check
commit 24a806d849c0b0c1d0cd6a6b93ba4ae4c0ec9f08 upstream.
If any unknown i_format fields are set (possibly from some new
incompat inode features), mark such an inode as unsupported, just in
case any new incompat i_format fields are added in the future.
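The check boils down to masking off every known i_format bit. The bit
layout below (1 version bit, 3 data layout bits) is an assumption for
illustration, as is check_i_format().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed layout: bit 0 = version, bits 1..3 = data layout. */
#define I_DATALAYOUT_BIT	1
#define I_DATALAYOUT_BITS	3
/* Every i_format bit this (hypothetical) kernel understands. */
#define I_ALL	((1u << (I_DATALAYOUT_BIT + I_DATALAYOUT_BITS)) - 1)

/* Reject inodes carrying unknown (future incompat) i_format bits. */
static int check_i_format(uint16_t ifmt)
{
	if (ifmt & ~I_ALL)
		return -EOPNOTSUPP;
	return 0;
}
```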
Link: https://lore.kernel.org/r/20210329003614.6583-1-hsiangkao@aol.com
Fixes: 431339ba9042 ("staging: erofs: add inode operations")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 316472dda45a6a8142fc80800fa92f2846911008
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Thu Jul 30 01:58:01 2020 +0800
erofs: fix extended inode could cross boundary
commit 0dcd3c94e02438f4a571690e26f4ee997524102a upstream.
Each on-disk inode should be aligned with the inode slot boundary
(32-byte alignment) because of the nid calculation formula, so no
compact inode (32 bytes) can cross a page boundary. However, an
extended inode is now 64 bytes, which can cross a page boundary in
principle if its location is specified on purpose, although that's
hard to generate with mkfs due to its allocation policy, and extended
inodes are currently rarely used by Android, mainly for files > 4GiB.
For now, only two fields, `i_ctime_nsec' and `i_nlink', couldn't
be read from disk properly, causing out-of-bounds memory reads
with random values.
Let's fix it now.
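Why only extended inodes are affected, and one way to read them
safely, can be sketched as below. This assumes 4K blocks equal to
pages; inode_crosses_block() and copy_inode() are hypothetical helpers
rather than the actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_BLKSIZ	4096u
#define COMPACT_ISIZE	32u
#define EXTENDED_ISIZE	64u

/*
 * Inodes are 32-byte aligned (nid = offset / 32), so a 32-byte compact
 * inode always fits, but a 64-byte extended inode starting in the last
 * slot of a block spills into the next one.
 */
static bool inode_crosses_block(unsigned int ofs, unsigned int isize)
{
	assert(ofs % 32 == 0 && ofs < SKETCH_BLKSIZ);
	return ofs + isize > SKETCH_BLKSIZ;
}

/* Gather an inode that may straddle two blocks into @dst. */
static void copy_inode(void *dst, const unsigned char *blk0,
		       const unsigned char *blk1, unsigned int ofs,
		       unsigned int isize)
{
	unsigned int head = SKETCH_BLKSIZ - ofs;

	if (head >= isize) {
		memcpy(dst, blk0 + ofs, isize);
		return;
	}
	memcpy(dst, blk0 + ofs, head);
	memcpy((unsigned char *)dst + head, blk1, isize - head);
}
```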
Fixes: 431339ba9042 ("staging: erofs: add inode operations")
Cc: <stable@vger.kernel.org> # 4.19+
Link: https://lore.kernel.org/r/20200729175801.GA23973@xiangao.remote.csb
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
[ Gao Xiang: resolve non-trivial conflicts for latest 4.19.y. ]
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee000f1badb6ca558527d2e99e6130e56fe6acfb
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Sun Nov 1 03:51:02 2020 +0800
erofs: derive atime instead of leaving it empty
commit d3938ee23e97bfcac2e0eb6b356875da73d700df upstream.
EROFS has _only one_ on-disk timestamp (ctime is currently
documented and recorded; we might also record mtime instead
with a new compat feature if needed) for each extended inode,
since EROFS isn't mainly for archival purposes, so there is no
need to keep all timestamps on disk, especially for Android
scenarios due to security concerns. Also, romfs/cramfs don't have
their own on-disk timestamps, and squashfs only records mtime.
Let's also derive access time from the on-disk timestamp rather
than leaving it empty. If mtime/atime for each file are really
needed for specific scenarios as well, we can also use xattrs
to record them.
Link: https://lore.kernel.org/r/20201031195102.21221-1-hsiangkao@aol.com
[ Gao Xiang: It'd be better to backport for user-friendly concern. ]
Fixes: 431339ba9042 ("staging: erofs: add inode operations")
Cc: stable <stable@vger.kernel.org> # 4.19+
Reported-by: nl6720 <nl6720@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[ Gao Xiang: Manually backport to 4.19.y due to trivial conflicts. ]
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0601575a0ca46c49aaf765aaba9df8c1ce63cc9a
Author: Gao Xiang <hsiangkao@redhat.com>
Date: Fri Jun 19 07:43:49 2020 +0800
erofs: fix partially uninitialized misuse in z_erofs_onlinepage_fixup
commit 3c597282887fd55181578996dca52ce697d985a5 upstream.
Hongyu reported that "id != index" in z_erofs_onlinepage_fixup() was
easily triggered in a specific aarch64 environment, which hadn't been
seen before.
After digging into it, I found that the high 32 bits of page->private
were set to 0xaaaaaaaa rather than 0 (due to z_erofs_onlinepage_init
behavior with specific compiler options). Actually, we only use the
low 32 bits to keep the page information since page->private is only
4 bytes on most 32-bit platforms. However, z_erofs_onlinepage_fixup()
uses the upper 32 bits by mistake.
Let's fix it now.
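A minimal illustration of decoding from the low 32 bits only.
ONLINEPAGE_INDEX_SHIFT and onlinepage_index() are hypothetical; the
point is that truncating to 32 bits first makes a garbage high half
such as 0xaaaaaaaa irrelevant.

```c
#include <assert.h>
#include <stdint.h>

#define ONLINEPAGE_INDEX_SHIFT	2	/* hypothetical: low bits = count */

/*
 * Only the low 32 bits of private carry state (enough for 32-bit
 * platforms); the high half may be garbage on 64-bit ones, so decode
 * from a truncated copy instead of the full unsigned long.
 */
static uint32_t onlinepage_index(uint64_t private_val)
{
	uint32_t v = (uint32_t)private_val;	/* drop the untouched high half */

	return v >> ONLINEPAGE_INDEX_SHIFT;
}
```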
Reported-and-tested-by: Hongyu Jin <hongyu.jin@unisoc.com>
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20200618234349.22553-1-hsiangkao@aol.com
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02cee974cb788dd6b23837c04e347dbadccb7e67
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed Feb 26 16:10:06 2020 +0800
erofs: correct the remaining shrink objects
commit 9d5a09c6f3b5fb85af20e3a34827b5d27d152b34 upstream.
The remaining count should not include successful
shrink attempts.
Fixes: e7e9a307be9d ("staging: erofs: introduce workstation for decompression")
Cc: <stable@vger.kernel.org> # 4.19+
Link: https://lore.kernel.org/r/20200226081008.86348-1-gaoxiang25@huawei.com
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit afe022d9f5721497e63d11d3fdb06c95c6256c23
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Sun Dec 1 16:01:09 2019 +0800
erofs: zero out when listxattr is called with no xattr
commit 926d1650176448d7684b991fbe1a5b1a8289e97c upstream.
As David reported [1], ENODATA is returned when attempting
to modify files using EROFS as an overlayfs lower layer.
The root cause is that listxattr could mistakenly return
-ENODATA for inodes without xattrs. That breaks the
listxattr return value convention and can cause copy-up
failures when used with overlayfs.
Resolve this by returning 0 if no xattr is found for listxattr.
[1] https://lore.kernel.org/r/CAEvUa7nxnby+rxK-KRMA46=exeOMApkDMAV08AjMkkPnTPV4CQ@mail.gmail.com
Link: https://lore.kernel.org/r/20191201084040.29275-1-hsiangkao@aol.com
Fixes: cadf1ccf1b00 ("staging: erofs: add error handling for xattr submodule")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fceffbd856369cedfa23b313844d3906de8fd36e
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed Oct 9 18:12:39 2019 +0800
staging: erofs: detect potential multiref due to corrupted images
commit e12a0ce2fa69798194f3a8628baf6edfbd5c548f upstream.
As reported by the erofs-utils fuzzer, multiref
(on-disk deduplication) isn't supported for now,
so we should forbid it properly.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20190821140152.229648-1-gaoxiang25@huawei.com
[ Gao Xiang: Since earlier kernels don't define EFSCORRUPTED,
let's use EIO instead. ]
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9b3495631f1dba2feac41c880e564df6e242c8ab
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed Oct 9 18:12:38 2019 +0800
staging: erofs: add two missing erofs_workgroup_put for corrupted images
commit 138e1a0990e80db486ab9f6c06bd5c01f9a97999 upstream.
As reported by the erofs-utils fuzzer, these error handling
paths will be entered to handle corrupted images.
The missing erofs_workgroup_put calls will cause unmounting
to fail.
Fix these return values to EFSCORRUPTED as well.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20190819103426.87579-4-gaoxiang25@huawei.com
[ Gao Xiang: Older kernel versions don't have length validity check
and EFSCORRUPTED, thus backport pageofs check for now. ]
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 20b9eea304f612a2cff8690eebc57d228e45b95e
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed Oct 9 18:12:37 2019 +0800
staging: erofs: some compressed cluster should be submitted for corrupted images
commit ee45197c807895e156b2be0abcaebdfc116487c8 upstream.
As reported by the erofs-utils fuzzer, a logical page can belong
to at most 2 compressed clusters. If one compressed cluster
is corrupted but the other is already in the submitting chain,
the chain still needs to be submitted in order to keep the page
working properly (page unlocked with PG_error set, PG_uptodate
not set).
Let's fix it now.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Link: https://lore.kernel.org/r/20190819103426.87579-2-gaoxiang25@huawei.com
[ Gao Xiang: Manually backport to v4.19.y stable. ]
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c61556faf792f95db0edbee6646fa2f52c8515d1
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed Oct 9 18:12:36 2019 +0800
staging: erofs: fix an error handling in erofs_readdir()
commit acb383f1dcb4f1e79b66d4be3a0b6f519a957b0d upstream.
Richard observed an infinite loop of erofs_read_raw_page() [1]
which can be triggered by forcibly setting ->u.i_blkaddr
to 0xdeadbeef (as I understand it, the block layer can
handle access beyond the end of device correctly).
After digging into it, the problem seemed highly related
to directories, and then I found the root cause is
improper error handling in erofs_readdir().
Let's fix it now.
[1] https://lore.kernel.org/r/1163995781.68824.1566084358245.JavaMail.zimbra@nod.at/
Reported-by: Richard Weinberger <richard@nod.at>
Fixes: 3aa8ec716e52 ("staging: erofs: add directory operations")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Link: https://lore.kernel.org/r/20190818125457.25906-1-hsiangkao@aol.com
[ Gao Xiang: Since earlier kernels don't define EFSCORRUPTED,
let's use original error code instead. ]
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44e25b73c4772f5f08d483bbdcfe81c95758e955
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jun 13 16:35:41 2019 +0800
staging: erofs: add requirements field in superblock
commit 5efe5137f05bbb4688890620934538c005e7d1d6 upstream.
There are some backward incompatible features that have been
pending for months, mainly due to on-disk format expansions.
However, we should ensure that such images cannot be mounted
with old kernels. Otherwise, they will cause unexpected behaviors.
Fixes: ba2b77a82022 ("staging: erofs: add super block operations")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
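A minimal sketch of how such a requirements field can gate mounting, assuming a 32-bit bitmask of incompatible features that has already been converted from little-endian. `struct model_super`, `MODEL_REQ_FEATURE_A` and `MODEL_ALL_REQUIREMENTS` are hypothetical names, not the real erofs definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: requirement bits this kernel understands (bit 0 only). */
#define MODEL_REQ_FEATURE_A	0x00000001u
#define MODEL_ALL_REQUIREMENTS	MODEL_REQ_FEATURE_A

struct model_super {
	uint32_t requirements;	/* assumed already converted from le32 */
};

/*
 * Refuse the mount if the image depends on any feature this kernel
 * does not know about, rather than misreading the on-disk format.
 */
static bool model_layout_compatible(const struct model_super *sb)
{
	return (sb->requirements & ~MODEL_ALL_REQUIREMENTS) == 0;
}
```

An old kernel only needs to know the mask of bits it supports; any unknown bit set by a newer mkfs is enough to reject the image.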
commit c458b3206aa217c67af63679c67cda21d1bb63fd
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Fri Mar 29 04:14:58 2019 +0800
staging: erofs: keep corrupted fs from crashing kernel in erofs_readdir()
commit 33bac912840fe64dbc15556302537dc6a17cac63 upstream.
After commit 419d6efc50e9, the kernel can no longer be crashed in the
namei path. However, a corrupted nameoff can also do harm during
readdir in scenarios without dm-verity. Fix it now.
Fixes: 3aa8ec716e52 ("staging: erofs: add directory operations")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 77a2c8cadafb7972b2812c097f518ac3099e8a3b
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Mon Mar 25 11:40:07 2019 +0800
staging: erofs: fix error handling when failed to read compresssed data
commit b6391ac73400eff38377a4a7364bd3df5efb5178 upstream.
Complete read error handling paths for all three kinds of
compressed pages:
1) For cache-managed pages, PG_uptodate will be checked since
read_endio will unlock and SetPageUptodate for these pages;
2) For in-place pages, read_endio cannot call SetPageUptodate directly,
since PG_uptodate should only mark the final decompressed data;
instead, PG_error is set with the page locked on I/O error;
3) For staging pages, PG_error is used as well, similar to
what we do for in-place pages.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
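The three cases above can be summarized as a small decision function. This is a hedged userspace model: `enum model_cpage_kind`, `struct model_cpage` and its boolean flags are hypothetical stand-ins for the real page types and the PG_uptodate / PG_error page flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three kinds of compressed pages. */
enum model_cpage_kind {
	MODEL_CACHE_MANAGED,	/* page owned by the managed cache */
	MODEL_INPLACE,		/* file page reused to hold compressed data */
	MODEL_STAGING,		/* temporary page allocated for the read */
};

struct model_cpage {
	enum model_cpage_kind kind;
	bool uptodate;		/* stands in for PG_uptodate */
	bool error;		/* stands in for PG_error */
};

/*
 * Was the compressed data read successfully? Cache-managed pages are
 * marked uptodate by the read completion on success; in-place and
 * staging pages must not use PG_uptodate for this, so an I/O error is
 * recorded with PG_error instead.
 */
static bool model_cpage_read_ok(const struct model_cpage *p)
{
	if (p->kind == MODEL_CACHE_MANAGED)
		return p->uptodate;
	return !p->error;
}
```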
commit 74528ff6c38df709674cc676f67e79eac815e23f
Author: Chao Yu <yuchao0@huawei.com>
Date: Mon Mar 11 23:10:10 2019 +0800
staging: erofs: fix to handle error path of erofs_vmap()
commit 8bce6dcede65139a087ff240127e3f3c01363eed upstream.
erofs_vmap() wraps vmap() and vm_map_ram() to return virtually
contiguous memory, but both of them can fail for many reasons.
Previously, erofs_vmap()'s callers didn't handle such failures,
which could lead to NULL pointer accesses. Fix it.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Fixes: 0d40d6e399c1 ("staging: erofs: add a generic z_erofs VLE decompressor")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 910cd92ee289977f064971f7659cda0228ec1615
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Fri Nov 23 01:16:00 2018 +0800
staging: erofs: fix race when the managed cache is enabled
commit 51232df5e4b268936beccde5248f312a316800be upstream.
When the managed cache is enabled, the last reference count
of a workgroup must be used for its workstation.
Otherwise, it could lead to incorrect (un)freezes in
the reclaim path, which would be harmful.
A typical race is as follows:
Thread 1 (In the reclaim path)  Thread 2
workgroup_freeze(grp, 1)                                refcnt = 1
...
workgroup_unfreeze(grp, 1)                              refcnt = 1
                                workgroup_get(grp)      refcnt = 2 (x)
workgroup_put(grp)                                      refcnt = 1 (x)
                                ...unexpected behaviors
* grp is detached but still used, which violates the cache-managed
freeze constraint.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a906ead6ff3295233d3643d662309cddb7efd896
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Mon Mar 11 14:08:58 2019 +0800
staging: erofs: keep corrupted fs from crashing kernel in erofs_namei()
commit 419d6efc50e94bcf5d6b35cd8c71f79edadec564 upstream.
As Al pointed out, "
... and while we are at it, what happens to
unsigned int nameoff = le16_to_cpu(de[mid].nameoff);
unsigned int matched = min(startprfx, endprfx);
struct qstr dname = QSTR_INIT(data + nameoff,
unlikely(mid >= ndirents - 1) ?
maxsize - nameoff :
le16_to_cpu(de[mid + 1].nameoff) - nameoff);
/* string comparison without already matched prefix */
int ret = dirnamecmp(name, &dname, &matched);
if le16_to_cpu(de[...].nameoff) is not monotonically increasing? I.e.
what's to prevent e.g. (unsigned)-1 ending up in dname.len?
Corrupted fs image shouldn't oops the kernel.. "
Revisit the related lookup flow to address the issue.
Fixes: d72d1ce60174 ("staging: erofs: add namei functions")
Cc: <stable@vger.kernel.org> # 4.19+
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
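A defensive version of the name-length computation quoted above might look like the following userspace sketch. It is not the upstream patch: `model_name_len()` and its parameters are hypothetical, and the nameoff values are assumed to be already converted from little-endian. The point is that a non-monotonic or out-of-block offset is rejected before `end - start` can wrap around to a huge unsigned length.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical model of a directory block: nameoff[i] is the offset of
 * entry i's name; names are packed back to back and the last name runs
 * to the end of the used area, maxsize.
 */
static int model_name_len(const uint16_t *nameoff, unsigned int ndirents,
			  unsigned int maxsize, unsigned int mid,
			  unsigned int *len)
{
	unsigned int start, end;

	if (mid >= ndirents)
		return -EINVAL;

	start = nameoff[mid];
	end = (mid + 1 < ndirents) ? nameoff[mid + 1] : maxsize;

	/* Corrupted image: non-increasing or out-of-block offsets. */
	if (start >= end || end > maxsize)
		return -EIO;

	*len = end - start;	/* can no longer wrap to (unsigned)-1 */
	return 0;
}
```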
commit 6dbf1a15dcd2f0097d819daa4ee1926b2345d02f
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Mon Mar 11 14:08:57 2019 +0800
staging: erofs: fix race of initializing xattrs of a inode at the same time
commit 62dc45979f3f8cb0ea67302a93bff686f0c46c5a upstream.
In real-world scenarios, several threads could access the xattrs
of the same xattr-uninitialized inode, and thus call
init_inode_xattrs(), almost at the same time.
That is unexpected behavior; this patch closes the race.
Fixes: b17500a0fdba ("staging: erofs: introduce xattr & acl support")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
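One common way to close this kind of race is double-checked initialization: a lock-free check of an "initialized" flag, then a re-check under a lock so that only one thread does the work. The sketch below is a userspace model under that assumption; a pthread mutex stands in for kernel locking primitives, and `struct model_inode` and its fields are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-inode state, not the real erofs inode layout. */
struct model_inode {
	atomic_bool xattr_inited;	/* set once the xattr header is parsed */
	pthread_mutex_t xattr_lock;	/* serializes the slow path */
	int init_calls;			/* how many times parsing actually ran */
};

/* Placeholder for reading and parsing the inode's xattr header. */
static int model_parse_xattr_header(struct model_inode *inode)
{
	inode->init_calls++;
	return 0;
}

static int model_init_inode_xattrs(struct model_inode *inode)
{
	int ret = 0;

	/* Fast path: already initialized, no locking needed. */
	if (atomic_load_explicit(&inode->xattr_inited, memory_order_acquire))
		return 0;

	pthread_mutex_lock(&inode->xattr_lock);
	/* Re-check: another thread may have finished while we waited. */
	if (!atomic_load_explicit(&inode->xattr_inited, memory_order_relaxed)) {
		ret = model_parse_xattr_header(inode);
		if (!ret)
			atomic_store_explicit(&inode->xattr_inited, true,
					      memory_order_release);
	}
	pthread_mutex_unlock(&inode->xattr_lock);
	return ret;
}
```

The release store pairs with the acquire load on the fast path, so a thread that sees the flag set also sees the parsed header.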
commit 044ba07158562ecf1b2e9079fa97c9980b523eb0
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Mon Mar 11 14:08:56 2019 +0800
staging: erofs: fix memleak of inode's shared xattr array
From: Sheng Yong <shengyong1@huawei.com>
commit 3b1b5291f79d040d549d7c746669fc30e8045b9b upstream.
If it fails to read a shared xattr page, the inode's shared xattr array
is not freed. The next time the inode's xattr is accessed, the previously
allocated array is leaked.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Fixes: b17500a0fdba ("staging: erofs: introduce xattr & acl support")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
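The leak pattern above is fixed by freeing (and clearing) the array on the error path. A hedged userspace sketch: `struct model_xattr_inode`, `read_page_fn` and the placeholder ids are hypothetical, and reading one page per shared xattr is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-inode state for the shared xattr id array. */
struct model_xattr_inode {
	unsigned int *xattr_shared_xattrs;
	unsigned int xattr_shared_count;
};

/* Hypothetical page reader: 0 on success, negative errno on failure. */
typedef int (*read_page_fn)(unsigned int idx);

static int model_load_shared(struct model_xattr_inode *vi, unsigned int count,
			     read_page_fn read_page)
{
	unsigned int i;

	vi->xattr_shared_xattrs = calloc(count, sizeof(unsigned int));
	if (!vi->xattr_shared_xattrs)
		return -ENOMEM;

	for (i = 0; i < count; ++i) {
		int err = read_page(i);

		if (err) {
			/* Free on the error path so a retry cannot leak it. */
			free(vi->xattr_shared_xattrs);
			vi->xattr_shared_xattrs = NULL;
			return err;
		}
		vi->xattr_shared_xattrs[i] = i;	/* placeholder id */
	}
	vi->xattr_shared_count = count;
	return 0;
}

static int page_ok(unsigned int idx) { (void)idx; return 0; }
static int page_eio(unsigned int idx) { return idx == 1 ? -EIO : 0; }
```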
commit 240517d98c12632095f2848bd94c30debdcaf600
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Mon Mar 11 14:08:55 2019 +0800
staging: erofs: fix fast symlink w/o xattr when fs xattr is on
commit 7077fffcb0b0b65dc75e341306aeef4d0e7f2ec6 upstream.
Currently, this will hit a BUG_ON for these symlinks as follows:
- kernel message
------------[ cut here ]------------
kernel BUG at drivers/staging/erofs/xattr.c:59!
SMP PTI
CPU: 1 PID: 1170 Comm: getllxattr Not tainted 4.20.0-rc6+ #92
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
RIP: 0010:init_inode_xattrs+0x22b/0x270
Code: 48 0f 45 ea f0 ff 4d 34 74 0d 41 83 4c 24 e0 01 31 c0 e9 00 fe ff ff 48 89 ef e8 e0 31 9e ff eb e9 89 e8 e9 ef fd ff ff 0f 0$
<0f> 0b 48 89 ef e8 fb f6 9c ff 48 8b 45 08 a8 01 75 24 f0 ff 4d 34
RSP: 0018:ffffa03ac026bdf8 EFLAGS: 00010246
------------[ cut here ]------------
...
Call Trace:
erofs_listxattr+0x30/0x2c0
? selinux_inode_listxattr+0x5a/0x80
? kmem_cache_alloc+0x33/0x170
? security_inode_listxattr+0x27/0x40
listxattr+0xaf/0xc0
path_listxattr+0x5a/0xa0
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
---[ end trace 3c24b49408dc0c72 ]---
Fix it by checking ->xattr_isize in init_inode_xattrs().
This also fixes the improper return value -ENOTSUPP for those
inodes (it should be -ENODATA when xattr support is enabled).
Fixes: b17500a0fdba ("staging: erofs: introduce xattr & acl support")
Cc: <stable@vger.kernel.org> # 4.19+
Reported-by: Li Guifu <bluce.liguifu@huawei.com>
Tested-by: Li Guifu <bluce.liguifu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
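A minimal sketch of an ->xattr_isize check of this kind, so that an inode without inline xattrs returns -ENODATA instead of reaching a BUG_ON. It assumes a 12-byte xattr ibody header (`MODEL_XATTR_IBODY_HDR_SIZE` is hypothetical), and the errno values chosen for the two malformed cases are assumptions, not necessarily what the patch returns.

```c
#include <assert.h>
#include <errno.h>

/* Assumed size of the on-disk xattr ibody header. */
#define MODEL_XATTR_IBODY_HDR_SIZE 12u

/*
 * Decide whether an inode has any inline xattr data before parsing it.
 * Returns 0 when there is something to parse, or a negative errno.
 */
static int model_check_xattr_isize(unsigned int xattr_isize)
{
	if (xattr_isize == 0)
		return -ENODATA;	/* no xattrs at all: not a bug */
	if (xattr_isize < MODEL_XATTR_IBODY_HDR_SIZE)
		return -EIO;		/* too small to hold the header */
	if (xattr_isize == MODEL_XATTR_IBODY_HDR_SIZE)
		return -EOPNOTSUPP;	/* header only: layout undefined */
	return 0;
}
```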
commit 78544513d768a1559d7e61d5e29270844db027d2
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Mon Mar 11 14:08:54 2019 +0800
staging: erofs: add error handling for xattr submodule
commit cadf1ccf1b0021d0b7a9347e102ac5258f9f98c8 upstream.
This patch adds the missing error handling code to the
xattr submodule, which improves stability in rare cases.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1f405af62a3f3b37bf965ddbc3ef5aa2fab2f57
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed Feb 27 13:33:30 2019 +0800
staging: erofs: compressed_pages should not be accessed again after freed
commit af692e117cb8cd9d3d844d413095775abc1217f9 upstream.
This patch resolves the following page use-after-free issue:
z_erofs_vle_unzip:
...
for (i = 0; i < nr_pages; ++i) {
...
z_erofs_onlinepage_endio(page); (1)
}
for (i = 0; i < clusterpages; ++i) {
page = compressed_pages[i];
if (page->mapping == mngda) (2)
continue;
/* recycle all individual staging pages */
(void)z_erofs_gather_if_stagingpage(page_pool, page); (3)
WRITE_ONCE(compressed_pages[i], NULL);
}
...
After (1) is executed, the page may be freed and then reused. If
compressed_pages is scanned after that, it could fall into (2) or
(3) by mistake and eventually end up in a mess.
This patch aims to solve the above issue with as few changes
as possible in order to make the fix easier to backport.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3a98208a957c0e05850b82ebf7f474ab295ff00
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed Feb 27 13:33:31 2019 +0800
staging: erofs: fix illegal address access under memory pressure
commit 1e5ceeab6929585512c63d05911d6657064abf7b upstream.
Consider a read request with two decompressed file pages.
If a decompression work cannot be started on the previous page
due to memory pressure but the in-memory LTP map lookup is done,
builder->work will still be NULL.
Moreover, if the current page also belongs to the same map,
it won't try to start the decompression work again and then
runs into trouble.
This patch aims to solve the above issue with as few changes
as possible in order to make the fix easier to backport.
The kernel message is:
<4>[1051408.015930s]SLUB: Unable to allocate memory on node -1, gfp=0x2408040(GFP_NOFS|__GFP_ZERO)
<4>[1051408.015930s] cache: erofs_compress, object size: 144, buffer size: 144, default order: 0, min order: 0
<4>[1051408.015930s] node 0: slabs: 98, objs: 2744, free: 0
* Cannot allocate the decompression work
<3>[1051408.015960s]erofs: z_erofs_vle_normalaccess_readpages, readahead error at page 1008 of nid 5391488
* Note that the previous page failed to be read
<0>[1051408.015960s]Internal error: Accessing user space memory outside uaccess.h routines: 96000005 [#1] PREEMPT SMP
...
<4>[1051408.015991s]Hardware name: kirin710 (DT)
...
<4>[1051408.016021s]PC is at z_erofs_vle_work_add_page+0xa0/0x17c
<4>[1051408.016021s]LR is at z_erofs_do_read_page+0x12c/0xcf0
...
<4>[1051408.018096s][<ffffff80c6fb0fd4>] z_erofs_vle_work_add_page+0xa0/0x17c
<4>[1051408.018096s][<ffffff80c6fb3814>] z_erofs_vle_normalaccess_readpages+0x1a0/0x37c
<4>[1051408.018096s][<ffffff80c6d670b8>] read_pages+0x70/0x190
<4>[1051408.018127s][<ffffff80c6d6736c>] __do_page_cache_readahead+0x194/0x1a8
<4>[1051408.018127s][<ffffff80c6d59318>] filemap_fault+0x398/0x684
<4>[1051408.018127s][<ffffff80c6d8a9e0>] __do_fault+0x8c/0x138
<4>[1051408.018127s][<ffffff80c6d8f90c>] handle_pte_fault+0x730/0xb7c
<4>[1051408.018127s][<ffffff80c6d8fe04>] __handle_mm_fault+0xac/0xf4
<4>[1051408.018157s][<ffffff80c6d8fec8>] handle_mm_fault+0x7c/0x118
<4>[1051408.018157s][<ffffff80c8c52998>] do_page_fault+0x354/0x474
<4>[1051408.018157s][<ffffff80c8c52af8>] do_translation_fault+0x40/0x48
<4>[1051408.018157s][<ffffff80c6c002f4>] do_mem_abort+0x80/0x100
<4>[1051408.018310s]---[ end trace 9f4009a3283bd78b ]---
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14b20a49fc73c4818efa3327451904ef6f9c07ab
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed Feb 27 13:33:32 2019 +0800
staging: erofs: fix mis-acted TAIL merging behavior
commit a112152f6f3a2a88caa6f414d540bd49e406af60 upstream.
EROFS has an optimized path called TAIL merging, which is designed
to merge multiple reads and the corresponding decompressions into
one if these requests read contiguous pages almost at the same time.
In general, it behaves as follows:
________________________________________________________________
 ... |  TAIL  .  HEAD  |  PAGE  |  PAGE  |  TAIL  .  HEAD  | ...
_____|_combined page A_|________|________|_combined page B_|____
      1     ] -> [              2               ] -> [ 3
If the above three reads are requested in the order 1-2-3, a single
large work chain is generated rather than 3 individual work chains,
which reduces scheduling overhead and boosts sequential reads.
However, if Read 2 is processed slightly earlier than Read 1, it
currently still generates 2 individual work chains (chains 1 and 2)
yet does in-place decompression for combined page A. Moreover,
if chain 2 decompresses ahead of chain 1, the two race and the
decompressed page gets corrupted. This patch fixes it.
Fixes: 3883a79abd02 ("staging: erofs: introduce VLE decompression support")
Cc: <stable@vger.kernel.org> # 4.19+
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1931a6c5fe28edd9c62d54dd67806c3806e9cdb7
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Tue Dec 11 15:17:50 2018 +0800
staging: erofs: unzip_vle_lz4.c,utils.c: rectify BUG_ONs
commit b8e076a6ef253e763bfdb81e5c72bcc828b0fbeb upstream.
Remove all redundant BUG_ONs, and turn the remaining
useful ones into DBG_BUGONs.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0773d1966061cba2de6b226947470baf88feda72
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Tue Dec 11 15:17:49 2018 +0800
staging: erofs: unzip_{pagevec.h,vle.c}: rectify BUG_ONs
commit 70b17991d89554cdd16f3e4fb0179bcc03c808d9 upstream.
Remove all redundant BUG_ONs, and turn the remaining
useful ones into DBG_BUGONs.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6a00c9d7066562e418e30b1b211c77aed5c40551
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed Dec 5 21:23:13 2018 +0800
staging: erofs: {dir,inode,super}.c: rectify BUG_ONs
commit 8b987bca2d09649683cbe496419a011df8c08493 upstream.
Remove all redundant BUG_ONs, and turn the remaining
useful ones into DBG_BUGONs.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef609890e1f8f27546f25d058bcaeb3c5a7a982f
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Fri Nov 23 01:16:03 2018 +0800
staging: erofs: add a full barrier in erofs_workgroup_unfreeze
commit 948bbdb1818b7ad6e539dad4fbd2dd4650793ea9 upstream.
Just like other generic locks, insert a full barrier
to guard against memory reordering.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
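The freeze/unfreeze pair behaves like a lock built on the refcount itself: freezing swaps the expected count for a sentinel, and unfreezing restores it. The sketch below models that with C11 atomics; `MODEL_LOCKED_MAGIC` and both functions are hypothetical, and `atomic_thread_fence(memory_order_seq_cst)` plays the role the commit gives the full barrier (like smp_mb() before the store that releases the lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinel meaning "frozen"; the real value differs. */
#define MODEL_LOCKED_MAGIC (-0x7fffffff)

/* Freeze: replace the expected refcount with the sentinel, like a trylock. */
static bool model_try_to_freeze(atomic_int *refcount, int val)
{
	int expected = val;

	return atomic_compare_exchange_strong(refcount, &expected,
					      MODEL_LOCKED_MAGIC);
}

/*
 * Unfreeze: full barrier first, so accesses made while frozen cannot be
 * reordered after the store that makes the refcount visible again.
 */
static void model_unfreeze(atomic_int *refcount, int orig_val)
{
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(refcount, orig_val, memory_order_relaxed);
}
```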
commit e88d7d9adb52d0f9ba8028c6b4a13e7e83d743a5
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Fri Nov 23 01:16:02 2018 +0800
staging: erofs: fix `erofs_workgroup_{try_to_freeze, unfreeze}'
commit 73f5c66df3e26ab750cefcb9a3e08c71c9f79cad upstream.
There are two minor issues in the current freeze interface:
1) The freeze interfaces are unrelated to CONFIG_DEBUG_SPINLOCK,
therefore fix the incorrect conditions;
2) On SMP platforms, preemption should also be disabled before
doing atomic_cmpxchg, in case a high-priority task preempts
between atomic_cmpxchg and disable_preempt and then spins
on the locked refcount.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 26b9413853f64a44d858c24bc2b4c834a2e6a1fc
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Fri Nov 23 01:16:01 2018 +0800
staging: erofs: atomic_cond_read_relaxed on ref-locked workgroup
commit df134b8d17b90c1e7720e318d36416b57424ff7a upstream.
It's better to use atomic_cond_read_relaxed than open-coded busy
waiting: on ARM64 it is currently implemented with hardware
instructions that monitor a variable for changes.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 28e3fa73e294002f8e7c48b6e9ea92784bf9e21a
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Sat Nov 3 17:23:56 2018 +0800
staging: erofs: remove the redundant d_rehash() for the root dentry
commit e9c892465583c8f42d61fafe30970d36580925df upstream.
As Al pointed out, there is no need at all to d_rehash() the
root dentry. Fix it.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e3e7bbe526acfac4307a2a6d7e2aaf5222ea88de
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed Sep 19 13:49:07 2018 +0800
staging: erofs: drop multiref support temporarily
commit e5e3abbadf0dbd1068f64f8abe70401c5a178180 upstream.
Multiref support means that a compressed page could have
more than one reference, which is designed for on-disk data
deduplication. However, mkfs doesn't support this mode
at this moment, and the kernel implementation is also broken.
Let's drop multiref support. If it is fully implemented
in the future, it can be reverted later.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2dd8bd1abced431fa4be477299fa9ddce4677642
Author: Chen Gong <gongchen4@huawei.com>
Date: Tue Sep 18 22:27:28 2018 +0800
staging: erofs: replace BUG_ON with DBG_BUGON in data.c
commit 9141b60cf6a53c99f8a9309bf8e1c6650a6785c1 upstream.
This patch replaces BUG_ON with DBG_BUGON in data.c and adds the
necessary error handling.
Signed-off-by: Chen Gong <gongchen4@huawei.com>
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a14a5cf712938fadd39fb99a8f8a46d72b19cd4d
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Tue Sep 18 22:27:25 2018 +0800
staging: erofs: complete error handing of z_erofs_do_read_page
commit 1e05ff36e6921ca61bdbf779f81a602863569ee3 upstream.
This patch completes the error handling code of z_erofs_do_read_page.
PG_error is set when a read error happens, so
z_erofs_onlinepage_endio will unlock the page without setting
PG_uptodate.
Reviewed-by: Chao Yu <yucxhao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
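The end-of-I/O rule described above (always unlock, but only mark uptodate when no error was recorded) can be modelled in a few lines. This is a hedged sketch: `struct model_page` and its booleans are hypothetical stand-ins for PG_locked, PG_uptodate and PG_error, and the real helper additionally tracks how many parts of the page are still pending, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page flags standing in for PG_locked/PG_uptodate/PG_error. */
struct model_page {
	bool locked;
	bool uptodate;
	bool error;
};

/* Record a read failure while the page is still locked. */
static void model_mark_read_error(struct model_page *page)
{
	page->error = true;
}

/*
 * End I/O for a page: mark it uptodate only if no error was recorded,
 * but unlock it in every case so that waiters are not stuck forever.
 */
static void model_onlinepage_endio(struct model_page *page)
{
	if (!page->error)
		page->uptodate = true;
	page->locked = false;
}
```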
commit 381d39d1c2d471e4c318320bae60806c5d0b04bd
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Tue Sep 18 22:25:36 2018 +0800
staging: erofs: fix a bug when appling cache strategy
commit 0734ffbf574ee813b20899caef2fe0ed502bb783 upstream.
As described in Kconfig, by design the last compressed pack should be
cached for further reading under either `EROFS_FS_ZIP_CACHE_UNIPOLAR' or
`EROFS_FS_ZIP_CACHE_BIPOLAR'.
However, z_erofs_do_read_page has a bug: it switches `initial' to
`false' at the very beginning, before deciding whether to cache the
last compressed pack.
The caching strategy should work properly after applying this patch.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
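The ordering problem can be shown with a tiny model: the caching decision must read `initial' before the flag is cleared. Everything here is hypothetical (`enum model_cache_strategy`, `struct model_frontend`, and `cache_enabled` standing in for either Kconfig choice); it only illustrates the "decide first, then clear" order, not the real z_erofs_do_read_page logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; they only model the ordering problem. */
enum model_cache_strategy {
	MODEL_DONTALLOC,	/* don't cache this compressed pack */
	MODEL_TRYALLOC,		/* try to cache it for later reads */
};

struct model_frontend {
	bool initial;	/* set at the start of a read request */
};

/*
 * Decide first, then clear `initial'. Clearing the flag at the very
 * beginning means the decision never sees `initial' == true, so the
 * pack that should be cached never is.
 */
static enum model_cache_strategy
model_pick_strategy(struct model_frontend *fe, bool cache_enabled)
{
	enum model_cache_strategy s =
		(cache_enabled && fe->initial) ? MODEL_TRYALLOC : MODEL_DONTALLOC;

	fe->initial = false;
	return s;
}
```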
commit 3dc0616d60bcc3888f5dcf4585bcc5e2131a64df
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Fri Nov 23 01:15:59 2018 +0800
staging: erofs: fix the definition of DBG_BUGON
[ Upstream commit eef168789866514e5d4316f030131c9fe65b643f ]
It's better not to actively BUG_ON the kernel; however, developers
need a way to locate issues as soon as possible.
DBG_BUGON was introduced for this: it can only crash the kernel when
EROFS_FS_DEBUG (an EROFS development feature) is on, which helps
developers find and fix bugs quickly with eng builds.
Previously, DBG_BUGON was defined as ((void)0) if EROFS_FS_DEBUG was off,
but unused-variable warnings like the following could occur:
drivers/staging/erofs/unzip_vle.c: In function `init_always':
drivers/staging/erofs/unzip_vle.c:61:33: warning: unused variable `work' [-Wunused-variable]
struct z_erofs_vle_work *const work =
^~~~
Fix it to #define DBG_BUGON(x) ((void)(x)).
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
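A userspace sketch of the two DBG_BUGON flavours described above. `MODEL_FS_DEBUG` stands in for CONFIG_EROFS_FS_DEBUG and abort() stands in for BUG_ON(); the helper functions are hypothetical. It shows both why ((void)(x)) silences the unused-variable warning and a consequence worth keeping in mind: the condition is still evaluated in non-debug builds.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef MODEL_FS_DEBUG
#define DBG_BUGON(x)	do { if (x) abort(); } while (0)
#else
/* Still references x, so locals only read by DBG_BUGON stay "used". */
#define DBG_BUGON(x)	((void)(x))
#endif

static int model_evaluations;	/* counts how often the condition ran */

static int model_bad_nr_pages(int nr_pages)
{
	model_evaluations++;
	return nr_pages < 0;
}

/* With ((void)0), `bad' below would trigger -Wunused-variable. */
static int model_first_page(const int *pages, int nr_pages)
{
	const int bad = model_bad_nr_pages(nr_pages);

	DBG_BUGON(bad);
	return pages[0];
}
```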
commit 92c97ef11b111b764dc92c5edaf9385f74c72e7d
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Sat Dec 8 00:19:12 2018 +0800
staging: erofs: fix use-after-free of on-stack `z_erofs_vle_unzip_io'
[ Upstream commit 848bd9acdcd00c164b42b14aacec242949ecd471 ]
The root cause is the race as follows:
Thread #0                        Thread #1
z_erofs_vle_unzip_kickoff        z_erofs_submit_and_unzip
                                 struct z_erofs_vle_unzip_io io[]
  atomic_add_return()
                                 wait_event()
                                 [end of function]
  wake_up()
Fix it by holding the waitqueue lock across atomic_add_return and
wake_up to close the race.
kernel message:
Unable to handle kernel paging request at virtual address 97f7052caa1303dc
...
Workqueue: kverityd verity_work
task: ffffffe32bcb8000 task.stack: ffffffe3298a0000
PC is at __wake_up_common+0x48/0xa8
LR is at __wake_up+0x3c/0x58
...
Call trace:
...
[<ffffff94a08ff648>] __wake_up_common+0x48/0xa8
[<ffffff94a08ff8b8>] __wake_up+0x3c/0x58
[<ffffff94a0c11b60>] z_erofs_vle_unzip_kickoff+0x40/0x64
[<ffffff94a0c118e4>] z_erofs_vle_read_endio+0x94/0x134
[<ffffff94a0c83c9c>] bio_endio+0xe4/0xf8
[<ffffff94a1076540>] dec_pending+0x134/0x32c
[<ffffff94a1076f28>] clone_endio+0x90/0xf4
[<ffffff94a0c83c9c>] bio_endio+0xe4/0xf8
[<ffffff94a1095024>] verity_work+0x210/0x368
[<ffffff94a08c4150>] process_one_work+0x188/0x4b4
[<ffffff94a08c45bc>] worker_thread+0x140/0x458
[<ffffff94a08cad48>] kthread+0xec/0x108
[<ffffff94a0883ab4>] ret_from_fork+0x10/0x1c
Code: d1006273 54000260 f9400804 b9400019 (b85fc081)
---[ end trace be9dde154f677cd1 ]---
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
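The idea of the fix can be modelled in userspace: if the completion side updates the pending count and issues the wakeup while holding the same lock the waiter sleeps on, the waiter cannot see zero and return (freeing its on-stack structure) while the wakeup is still in progress. This sketch uses a pthread mutex and condition variable in place of the kernel waitqueue and its spinlock; `struct model_unzip_io` and both functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the on-stack z_erofs_vle_unzip_io. */
struct model_unzip_io {
	pthread_mutex_t lock;	/* plays the role of the waitqueue lock */
	pthread_cond_t wait;
	int pending_bios;
};

/*
 * Completion side: update the count and wake the waiter under the lock,
 * so the waiter cannot return before the wakeup has been issued.
 */
static void model_unzip_kickoff(struct model_unzip_io *io, int bios)
{
	pthread_mutex_lock(&io->lock);
	io->pending_bios += bios;
	if (io->pending_bios == 0)
		pthread_cond_broadcast(&io->wait);
	pthread_mutex_unlock(&io->lock);
}

/* Submission side: sleep until every bio has completed. */
static void model_wait_for_bios(struct model_unzip_io *io)
{
	pthread_mutex_lock(&io->lock);
	while (io->pending_bios != 0)
		pthread_cond_wait(&io->wait, &io->lock);
	pthread_mutex_unlock(&io->lock);
}
```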
commit 323056dc5fbe4768311194a3a2adf14806f25074
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Tue Sep 18 22:25:33 2018 +0800
staging: erofs: fix a missing endian conversion
[ Upstream commit 37ec35a6cc2b99eb7fd6b85b7d7b75dff46bc353 ]
This patch fixes a missing endian conversion in
vle_get_logical_extent_head.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
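On-disk erofs fields are little-endian, so reading one as a native integer is only correct on little-endian hosts. A portable userspace equivalent of le16_to_cpu() on a raw 16-bit field is sketched below; `model_get_le16()` is a hypothetical helper, and which field of vle_get_logical_extent_head was affected is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assemble the value byte by byte so the result is the same on little-
 * and big-endian hosts; a plain uint16_t load would byte-swap the value
 * on big-endian machines.
 */
static uint16_t model_get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}
```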
commit 69f2b4eaba237770f5c696942595d064ae3340f8
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Sep 6 17:01:47 2018 +0800
staging: erofs: rename superblock flags (MS_xyz -> SB_xyz)
This patch follows commit 1751e8a6cb93 ("Rename superblock
flags (MS_xyz -> SB_xyz)"); after commit ("vfs: Suppress
MS_* flag defs within the kernel unless explicitly enabled"),
MS_RDONLY and MS_NOATIME no longer exist at all.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1621b077d53285bd5127532ce160cec69adbe660
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Tue Aug 28 11:39:48 2018 +0800
Revert "staging: erofs: disable compiling temporarile"
This reverts commit 156c3df8d4db4e693c062978186f44079413d74d.
Since XArray and the new mount APIs weren't merged in the 4.19-rc1
merge window, the BROKEN mark can be reverted directly without
any problems.
Fixes: 156c3df8d4db ("staging: erofs: disable compiling temporarile")
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3bbdccddb4ee53c0b81545226439f231ae698f65
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Mon Aug 6 11:27:53 2018 +0800
staging: erofs: remove an extra semicolon in z_erofs_vle_unzip_all
There is an extra semicolon in z_erofs_vle_unzip_all; remove it.
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee25ad8cd5b803ae4cda0116304ff15383cb6881
Author: Kristaps Čivkulis <kristaps.civkulis@gmail.com>
Date: Sun Aug 5 18:21:01 2018 +0300
staging: erofs: fix if assignment style issue
Fix coding style issue "do not use assignment in if condition"
detected by checkpatch.pl.
Signed-off-by: Kristaps Čivkulis <kristaps.civkulis@gmail.com>
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 81d71d6e9a330f4471f3edec412d6124031eac46
Author: Chao Yu <yuchao0@huawei.com>
Date: Thu Aug 2 17:39:17 2018 +0800
staging: erofs: disable compiling temporarile
As Stephen Rothwell reported:
"After merging the staging tree, today's linux-next build (x86_64
allmodconfig) failed like this:
drivers/staging/erofs/super.c: In function 'erofs_read_super':
drivers/staging/erofs/super.c:343:17: error: 'MS_RDONLY' undeclared (first use in this function); did you mean 'IS_RDONLY'?
sb->s_flags |= MS_RDONLY | MS_NOATIME;
^~~~~~~~~
IS_RDONLY
drivers/staging/erofs/super.c:343:17: note: each undeclared identifier is reported only once for each function it appears in
drivers/staging/erofs/super.c:343:29: error: 'MS_NOATIME' undeclared (first use in this function); did you mean 'S_NOATIME'?
sb->s_flags |= MS_RDONLY | MS_NOATIME;
^~~~~~~~~~
S_NOATIME
drivers/staging/erofs/super.c: In function 'erofs_mount':
drivers/staging/erofs/super.c:501:10: warning: passing argument 5 of 'mount_bdev' makes integer from pointer without a cast [-Wint-conversion]
&priv, erofs_fill_super);
^~~~~~~~~~~~~~~~
In file included from include/linux/buffer_head.h:12:0,
from drivers/staging/erofs/super.c:14:
include/linux/fs.h:2151:23: note: expected 'size_t {aka long unsigned int}' but argument is of type 'int (*)(struct super_block *, void *, int)'
extern struct dentry *mount_bdev(struct file_system_type *fs_type,
^~~~~~~~~~
drivers/staging/erofs/super.c:500:9: error: too few arguments to function 'mount_bdev'
return mount_bdev(fs_type, flags, dev_name,
^~~~~~~~~~
In file included from include/linux/buffer_head.h:12:0,
from drivers/staging/erofs/super.c:14:
include/linux/fs.h:2151:23: note: declared here
extern struct dentry *mount_bdev(struct file_system_type *fs_type,
^~~~~~~~~~
drivers/staging/erofs/super.c: At top level:
drivers/staging/erofs/super.c:518:20: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
.mount = erofs_mount,
^~~~~~~~~~~
drivers/staging/erofs/super.c:518:20: note: (near initialization for 'erofs_fs_type.mount')
drivers/staging/erofs/super.c: In function 'erofs_remount':
drivers/staging/erofs/super.c:630:12: error: 'MS_RDONLY' undeclared (first use in this function); did you mean 'IS_RDONLY'?
*flags |= MS_RDONLY;
^~~~~~~~~
IS_RDONLY
drivers/staging/erofs/super.c: At top level:
drivers/staging/erofs/super.c:640:16: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
.remount_fs = erofs_remount,
^~~~~~~~~~~~~
Caused by various commits creating erofs in the staging tree interacting
with various commits redoing the mount infrastructure in the vfs tree.
I have disabed CONFIG_EROFS_FS for now:"
The reason for the compile error is as follows:
-next collects and merges in-development patches, including common vfs
changes, from multiple trees, but those vfs patches didn't cover erofs,
for example:
("vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled")
https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/commit/?h=for-next&id=109b45090d7d3ce2797bb1ef7f70eead5bfe0ff3
("vfs: Require specification of size of mount data for internal mounts")
https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/commit/?h=for-next&id=0a191e4505a4f255e6513b49426213da69bf0e80
The above vfs-related patches have not been merged into the staging tree,
so if we submit the erofs patches to the staging mailing list and they
are included in the staging-{test,next} trees, compile errors can easily
result.
We have worked out some patches to adapt to those vfs changes, but for
now we only submit them to the -next tree temporarily to avoid compile
errors.
For potential conflicts between erofs and vfs changes in the upcoming
merge window, Stephen suggested that we disable CONFIG_EROFS_FS
temporarily to get through the merge window, and afterwards restore it
by re-enabling CONFIG_EROFS_FS and applying those fix patches. Greg
also confirmed this solution.
So, let's disable compiling erofs for a while.
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d4499c8b8b78dd00788a7513373d0900013e850
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed Aug 1 14:38:31 2018 +0800
staging: erofs: remove a redundant marco in xattr
There is no need for '#if CONFIG_EROFS_FS_XATTR' in xattr.c;
let's remove it.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8eaefd9be86fa3d85305f05192568ba1507dab75
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Wed Aug 1 17:36:54 2018 +0800
staging: erofs: add the missing break in z_erofs_map_blocks_iter
This patch adds a missing break after adding the default case.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b79d82f61f25e532b98f7d3a3d49b250f1728e0d
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Mon Jul 30 09:51:01 2018 +0800
staging: erofs: use the wrapped PTR_ERR_OR_ZERO instead of open code
Just a cleanup; the logic doesn't change.
Link: https://lists.01.org/pipermail/kbuild-all/2018-July/050766.html
Fixes: d72d1ce60174 ("staging: erofs: add namei functions")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
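For readers outside the kernel, the error-pointer convention behind PTR_ERR_OR_ZERO can be re-created in userspace as below. This is a simplified sketch of the <linux/err.h> helpers so it compiles on its own; the kernel headers remain the authority on the exact definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Errno values are encoded in the top MAX_ERRNO addresses. */
#define MAX_ERRNO	4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Replaces the open-coded "IS_ERR(p) ? PTR_ERR(p) : 0". */
static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}
```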
commit ef38dd74d8389a7474b0f947185f347e24419686
Author: Gao Xiang <hsiangkao@aol.com>
Date: Sun Jul 29 13:37:57 2018 +0800
staging: erofs: fix conditional uninitialized `pcn' in z_erofs_map_blocks_iter
This patch adds error handling code to
z_erofs_map_blocks_iter to fix the compiler warning.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 294e8e93272dcbdabdd6e033e4d14bbfe6d91bb7
Author: Gao Xiang <hsiangkao@aol.com>
Date: Sun Jul 29 13:34:58 2018 +0800
staging: erofs: fix compile error without built-in decompression support
This patch fixes incorrect code snippets introduced by mistake
when the code was split into small patches.
Link: https://lists.01.org/pipermail/kbuild-all/2018-July/050747.html
Link: https://lists.01.org/pipermail/kbuild-all/2018-July/050750.html
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db6fedf04cecf5fa3d78a941fd068d581813dfa0
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Sat Jul 28 15:10:32 2018 +0800
staging: erofs: fix a compile warning of Z_EROFS_VLE_VMAP_ONSTACK_PAGES
There is a type mismatch in the definition of
Z_EROFS_VLE_VMAP_ONSTACK_PAGES, let's fix it.
Link: https://lists.01.org/pipermail/kbuild-all/2018-July/050707.html
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73c620c52e51ff2bf93cf02509cdbd9da3d50220
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:22:08 2018 +0800
staging: erofs: add a TODO and update MAINTAINERS for staging
This patch adds a TODO to list the things to be done, and
the relevant info to MAINTAINERS so we can take all the blame :)
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fd66e0b7e7510165f9c2214a0e68d7025f8b8d83
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:22:07 2018 +0800
staging: erofs: introduce cached decompression
This patch adds an optional feature, which users can
enable, that caches both incomplete ends of compressed
clusters as a complement to in-place decompression in
order to boost random read. It costs more memory than
in-place decompression alone.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e84127077ff509f7204888244cf848bf9cddd794
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:22:06 2018 +0800
staging: erofs: introduce VLE decompression support
This patch introduces the basic in-place VLE decompression
implementation for the erofs file system.
Compared with fixed-sized input compression, it implements
what we call 'variable-length extent (VLE) compression', which
gives every compression block the same output size in order to:
 - make full use of I/O bandwidth (almost all data read from
   the block device can be used directly for decompression);
 - improve real random read performance (rather than relying
   on data caching, which costs more memory);
 - keep relatively lower compression ratios (it saves more
   storage space than fixed-sized input compression configured
   with the same input block size),
as illustrated below:
        |---  variable-length extent ---|------ VLE ------|--- VLE ----|
         /> clusterofs                   /> clusterofs     /> clusterofs /> clusterofs
   ++---|-------++-----------++---------|-++-----------++-|---------++-|
...||   |       ||           ||         | ||           || |         || | ... original data
   ++---|-------++-----------++---------|-++-----------++-|---------++-|
   ++->cluster<-++->cluster<-++->cluster<-++->cluster<-++->cluster<-++
         size         size         size         size         size
         \                           /               /            /
          \                     /             /             /
           \               /            /             /
            ++-----------++-----------++-----------++
        ... ||           ||           ||           || ... compressed clusters
            ++-----------++-----------++-----------++
            ++->cluster<-++->cluster<-++->cluster<-++
                 size         size         size
'In-place' refers to the decompression mode: instead of
allocating independent pages and data structures for compressed
data, it reuses already-allocated file cache pages as much as
possible, in a time-sharing fashion by default, to store both the
compressed data and the corresponding pagevec. This is useful in
low-memory scenarios.
Finally, other filesystems with (de)compression support use a
relatively large compression block size, reading and decompressing
>= 128KB at once, which gives better-looking random read numbers.
In fact, they turn small random reads into large sequential reads
and cache all decompressed data in memory; that is not real random
read, and it is unacceptable especially for embedded devices with
limited memory. Instead, we select a universal small 4KB compressed
cluster size, the smallest page size on most architectures, and
every compressed cluster can be read and decompressed independently,
which ensures good random read performance for all use cases.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
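The mapping implied by the VLE diagram above can be sketched in userspace C: each variable-length extent of uncompressed data is stored in exactly one fixed-size compressed cluster, so finding the cluster to read for a file offset is a search over extent start offsets, and clusterofs is where an extent head falls inside a logical cluster. The names (vle_file, vle_map_blocks, vle_clusterofs) and the flat in-memory arrays are hypothetical; the real erofs on-disk index format is different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLUSTER_SIZE 4096u  /* assumed 4KB logical and compressed cluster size */

/* Hypothetical flat view of a VLE-compressed file: extent i covers the
 * uncompressed range [starts[i], starts[i + 1]) (the last one ends at
 * isize) and is decompressed from the single compressed cluster blk[i]. */
struct vle_file {
	const uint64_t *starts;  /* sorted, starts[0] == 0 */
	const uint32_t *blk;
	size_t nr_extents;
	uint64_t isize;          /* uncompressed file size */
};

struct vle_map {
	uint32_t pblk;           /* compressed cluster to read */
	uint64_t lstart, llen;   /* uncompressed range it decompresses to */
};

/* Find the extent containing pos by binary search over extent heads. */
static int vle_map_blocks(const struct vle_file *f, uint64_t pos,
			  struct vle_map *m)
{
	size_t lo = 0, hi = f->nr_extents;

	assert(f->nr_extents > 0 && f->starts[0] == 0);
	if (pos >= f->isize)
		return -1;
	/* invariant: starts[lo] <= pos, and hi == n or starts[hi] > pos */
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (f->starts[mid] <= pos)
			lo = mid;
		else
			hi = mid;
	}
	m->pblk = f->blk[lo];
	m->lstart = f->starts[lo];
	m->llen = (lo + 1 < f->nr_extents ? f->starts[lo + 1] : f->isize) -
		  m->lstart;
	return 0;
}

/* clusterofs of logical cluster lcn: offset of the first extent head
 * inside it, or -1 if the whole cluster continues an earlier extent. */
static long vle_clusterofs(const struct vle_file *f, uint64_t lcn)
{
	uint64_t begin = lcn * CLUSTER_SIZE, end = begin + CLUSTER_SIZE;

	for (size_t i = 0; i < f->nr_extents; i++)
		if (f->starts[i] >= begin && f->starts[i] < end)
			return (long)(f->starts[i] - begin);
	return -1;
}
```

Every lookup resolves to exactly one CLUSTER_SIZE read no matter how long the extent is, which is the property that fixing the output size buys.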
commit ab43173ff3316c0120f9b2c3abc325a18773f30f
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:22:05 2018 +0800
staging: erofs: introduce workstation for decompression
This patch introduces another concept used by the unzip
subsystem called 'workstation'. It can be seen as a sparse
array that stores pointers to data structures related to
the corresponding physical blocks.
All lookups are protected by the RCU read lock. In addition,
a reference count and a spinlock are introduced to manage
object lifetimes and serialize all update operations.
'workstation' is currently implemented on top of the in-kernel
radix tree for backward compatibility. As the Linux kernel
evolves, it could be migrated to XArray in the future.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
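The lookup rule described in the workstation commit (find the object without taking a lock, then take a reference only if it is still alive) can be sketched in userspace C with C11 atomics. Everything here is hypothetical and simplified: a small direct-mapped array stands in for the radix tree, a compare-and-swap on the slot stands in for the spinlock-protected insert, and objects are freed immediately instead of after an RCU grace period, so unlike the kernel the sketch is not safe against a lookup racing with the final put.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define WS_SLOTS 64  /* hypothetical: only block numbers 0..63 */

struct workgroup {
	unsigned long index;  /* physical block number this object describes */
	atomic_int refcount;  /* 0 means "dying", must not be revived */
};

struct workstation {
	struct workgroup *_Atomic slots[WS_SLOTS];  /* sparse: mostly NULL */
};

static struct workgroup *wg_alloc(unsigned long index)
{
	struct workgroup *grp = malloc(sizeof(*grp));

	if (grp) {
		grp->index = index;
		atomic_init(&grp->refcount, 1);  /* the creator's reference */
	}
	return grp;
}

/* Take a reference only if the count is still non-zero. */
static int get_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old > 0)
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return 1;
	return 0;
}

/* Lockless lookup (the kernel does this under rcu_read_lock()). */
static struct workgroup *ws_find(struct workstation *ws, unsigned long index)
{
	struct workgroup *grp;

	if (index >= WS_SLOTS)
		return NULL;
	grp = atomic_load(&ws->slots[index]);
	return (grp && get_unless_zero(&grp->refcount)) ? grp : NULL;
}

/* Publish grp; fails (like -EEXIST) if its block is already taken. */
static int ws_insert(struct workstation *ws, struct workgroup *grp)
{
	struct workgroup *expected = NULL;

	assert(grp->index < WS_SLOTS);
	return atomic_compare_exchange_strong(&ws->slots[grp->index],
					      &expected, grp) ? 0 : -1;
}

/* Drop a reference; the last one unhooks and frees the object. */
static void ws_put(struct workstation *ws, struct workgroup *grp)
{
	if (atomic_fetch_sub(&grp->refcount, 1) == 1) {
		atomic_store(&ws->slots[grp->index], NULL);
		free(grp);  /* the kernel would defer this past an RCU grace period */
	}
}
```

Refusing to bump a zero count is what lets a lockless reader tell a live object from one that is already on its way out.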
commit 84c882ba349e57fa654b0a52d6529bff5c18c0e0
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:22:04 2018 +0800
staging: erofs: introduce erofs shrinker
This patch adds a dedicated shrinker to free unneeded
memory consumed by a number of erofs in-memory data structures.
Like F2FS and UBIFS, it also adds:
- sbi->umount_mutex to avoid races between the shrinker and put_super
- sbi->shrinker_run_no to avoid revisiting recently scanned objects
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
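The role of shrinker_run_no can be shown with a small userspace simulation of one scan pass over the registered mounts (the list set up by the superblock registration commit that follows in this log). It uses a round-robin scheme: each visited mount is stamped with the current run number and moved to the tail, and the pass stops once it reaches a mount already stamped. All names are hypothetical, a fixed array stands in for the kernel list, and a busy flag stands in for a failed mutex_trylock() on umount_mutex; no real locking is done.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SB 8  /* hypothetical limit on registered mounts */

struct sb_info {
	unsigned long cached;          /* reclaimable objects held by this mount */
	int busy;                      /* stands in for a failed mutex_trylock() */
	unsigned int shrinker_run_no;  /* last scan pass that visited this mount */
};

static struct sb_info *sb_list[MAX_SB];  /* registered mounts, scan order */
static size_t nr_sb;
static unsigned int shrinker_run_no;

static void register_sb(struct sb_info *sbi)
{
	assert(nr_sb < MAX_SB);
	sbi->shrinker_run_no = 0;      /* 0 = never visited */
	sb_list[nr_sb++] = sbi;
}

static void move_to_tail(size_t i)
{
	struct sb_info *sbi = sb_list[i];

	for (size_t j = i; j + 1 < nr_sb; j++)
		sb_list[j] = sb_list[j + 1];
	sb_list[nr_sb - 1] = sbi;
}

static unsigned long shrink_scan(unsigned long nr)
{
	unsigned long freed = 0;
	unsigned int run_no;
	size_t i = 0;

	do
		run_no = ++shrinker_run_no;
	while (run_no == 0);           /* skip 0 when the counter wraps */

	while (i < nr_sb && freed < nr) {
		struct sb_info *sbi = sb_list[i];
		unsigned long n;

		if (sbi->shrinker_run_no == run_no)
			break;             /* wrapped around to a mount done this pass */
		if (sbi->busy) {
			i++;               /* being unmounted: skip it */
			continue;
		}
		sbi->shrinker_run_no = run_no;
		n = sbi->cached < nr - freed ? sbi->cached : nr - freed;
		sbi->cached -= n;
		freed += n;
		move_to_tail(i);           /* fairness; sb_list[i] is now the next one */
	}
	return freed;
}
```

Moving visited mounts to the tail means the next scan starts with a different mount, so no single instance is always drained first.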
commit dc98494e64df2c56c3d6658f60a86b257e9735a3
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:22:03 2018 +0800
staging: erofs: introduce superblock registration
In preparation for introducing a shrinker for erofs,
let's first keep track of all mounted erofs instances.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ded5dd185d595bc3664cabc5de54b84021d3314
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:22:02 2018 +0800
staging: erofs: add a generic z_erofs VLE decompressor
Currently, this patch only implements an LZ4
decompressor, given its development priority.
In the future, erofs will support more compression
algorithms and formats besides LZ4, so a generic
decompressor interface will be needed.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
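One way such a generic decompressor interface could look is a table of per-algorithm hooks selected by an algorithm ID, so that supporting a new format means adding a table entry rather than changing callers. This is a hypothetical sketch: the enum, struct and function names are made up, and only a trivial "plain" (stored, uncompressed) algorithm is implemented so that it runs without an LZ4 library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical algorithm IDs. */
enum z_alg { Z_ALG_PLAIN, Z_ALG_LZ4, Z_ALG_MAX };

struct z_decompressor {
	const char *name;
	/* Writes at most out_len bytes to dst; returns bytes written or -1. */
	long (*decompress)(const unsigned char *src, size_t src_len,
			   unsigned char *dst, size_t out_len);
};

/* "Stored" data: just copy, truncated to the output size. */
static long plain_decompress(const unsigned char *src, size_t src_len,
			     unsigned char *dst, size_t out_len)
{
	size_t n = src_len < out_len ? src_len : out_len;

	memcpy(dst, src, n);
	return (long)n;
}

/* One entry per algorithm; a NULL hook means "not built in". */
static const struct z_decompressor decompressors[Z_ALG_MAX] = {
	[Z_ALG_PLAIN] = { "plain", plain_decompress },
	[Z_ALG_LZ4]   = { "lz4", NULL },  /* would point at an LZ4 routine */
};

/* The single entry point callers use, whatever the on-disk algorithm. */
static long z_decompress(enum z_alg alg, const unsigned char *src,
			 size_t src_len, unsigned char *dst, size_t out_len)
{
	assert(src != NULL && dst != NULL);
	if ((unsigned int)alg >= Z_ALG_MAX || !decompressors[alg].decompress)
		return -1;  /* unknown or unsupported algorithm */
	return decompressors[alg].decompress(src, src_len, dst, out_len);
}
```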
commit 6406d5e0a4a3a6e88c6898268c36e421d1c5006b
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:22:01 2018 +0800
staging: erofs: introduce a customized LZ4 decompression
We have to reduce the memory cost as much as possible,
so we don't want to decompress more data than fits in
the output buffer. However, "LZ4_decompress_safe_partial"
doesn't guarantee to stop at an arbitrary end position;
it stops only after its current LZ4 "sequence" is completed.
Link: https://groups.google.com/forum/#!topic/lz4c/_3kkz5N6n00
Therefore, I hacked the LZ4 decompression logic by hand.
It is probably NOT the fastest approach, and a better
implementation would be welcome.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
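The behaviour wanted here can be shown with a minimal decoder for the standard LZ4 block format (token, literals, 2-byte little-endian offset, match) that stops exactly when the requested number of output bytes has been produced, even in the middle of a literal run or a match. This is an illustrative sketch, not the erofs code and not liblz4: the function name is hypothetical, and it is written for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode an LZ4 block, stopping as soon as exactly out_len bytes have
 * been produced. Returns the number of bytes written (less than out_len
 * only if the input ends first), or -1 on malformed input.
 */
static long lz4_decompress_exact(const uint8_t *src, size_t src_len,
				 uint8_t *dst, size_t out_len)
{
	const uint8_t *ip = src, *iend = src + src_len;
	size_t op = 0;

	assert(src != NULL && dst != NULL);
	while (ip < iend && op < out_len) {
		unsigned int token = *ip++;
		size_t lit = token >> 4, mlen = (token & 15) + 4, offset, n;
		uint8_t b;

		if (lit == 15) {            /* extended literal length */
			do {
				if (ip >= iend)
					return -1;
				b = *ip++;
				lit += b;
			} while (b == 255);
		}
		if (lit > (size_t)(iend - ip))
			return -1;
		n = lit < out_len - op ? lit : out_len - op;
		memcpy(dst + op, ip, n);    /* literals, cut at out_len */
		op += n;
		ip += lit;
		if (op == out_len || ip == iend)
			break;              /* target reached, or last sequence */

		if (iend - ip < 2)
			return -1;
		offset = ip[0] | (size_t)ip[1] << 8;  /* little-endian */
		ip += 2;
		if (offset == 0 || offset > op)
			return -1;
		if ((token & 15) == 15) {   /* extended match length */
			do {
				if (ip >= iend)
					return -1;
				b = *ip++;
				mlen += b;
			} while (b == 255);
		}
		/* byte-wise copy handles overlapping matches; stop at out_len */
		while (mlen > 0 && op < out_len) {
			dst[op] = dst[op - offset];
			op++;
			mlen--;
		}
	}
	return (long)op;
}
```

For example, the 8-byte block {0x35, 'a', 'b', 'c', 0x03, 0x00, 0x10, 'd'} decodes to "abcabcabcabcd"; asking for 7 bytes yields exactly "abcabca" instead of running on to the end of the 9-byte match.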
commit c21aeb7e5feca41005feac999d4cf446dc65a701
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:22:00 2018 +0800
staging: erofs: globalize prepare_bio and __submit_bio
The unzip subsystem also uses these functions,
so let's export them via internal.h.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a5908581d539ef37d1d390e3ad647216440c0ace
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:59 2018 +0800
staging: erofs: add erofs_allocpage
This patch introduces a temporary _on-stack_ page
pool that reuses freed pages directly as much as
possible for better performance and releases all
pages at once. It also slightly reduces the chance
of memory allocation failure.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
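A userspace sketch of the temporary page pool idea: pages "freed" during an operation go into a small pool owned by the caller's stack frame and are handed out again before the allocator is asked for new ones; whatever is left is released in one go at the end. The names, the fixed capacity and the use of malloc() in place of the kernel page allocator are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_PAGE_SIZE 4096
#define POOL_MAX 16  /* hypothetical pool capacity */

/* Meant to live on the caller's stack for the duration of one operation. */
struct page_pool {
	void *pages[POOL_MAX];
	int nr;            /* pages currently parked in the pool */
	int nr_allocated;  /* pages obtained from the allocator */
};

static void *pool_alloc_page(struct page_pool *pool)
{
	void *page;

	if (pool->nr > 0)
		return pool->pages[--pool->nr];  /* reuse a page freed earlier */
	page = malloc(POOL_PAGE_SIZE);
	if (page)
		pool->nr_allocated++;
	return page;
}

/* Park the page for the next request instead of freeing it now. */
static void pool_put_page(struct page_pool *pool, void *page)
{
	assert(page != NULL);
	if (pool->nr < POOL_MAX)
		pool->pages[pool->nr++] = page;
	else
		free(page);
}

/* Release every parked page at once when the operation is done. */
static void pool_release_all(struct page_pool *pool)
{
	while (pool->nr > 0)
		free(pool->pages[--pool->nr]);
}
```

Because a recycled page never goes back to the allocator mid-operation, a later request cannot fail for lack of memory while such a page is available.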
commit bbd3e12ab2521a7c982ac0707bf8da7a0d22653b
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:58 2018 +0800
staging: erofs: add erofs_map_blocks_iter
This patch introduces an iterable L2P (logical-to-physical)
mapping operation, 'erofs_map_blocks_iter'.
Compared with 'erofs_map_blocks', it avoids redundant
'release and regrab' cycles when successive requests
hit the same meta page.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
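The saving from an iterable mapping operation can be illustrated with a userspace sketch in which an iterator remembers the metadata block it currently holds and only re-reads when a request falls into a different block; a counter records how many reads happen. The index-entry size, the block layout, the map_iter struct and read_meta_block() are hypothetical stand-ins, not erofs's on-disk format or buffer handling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define META_BLKSZ 4096u
#define ENTRY_SIZE 8u  /* hypothetical size of one index entry */
static_assert(META_BLKSZ % ENTRY_SIZE == 0, "entries must not straddle blocks");

static unsigned int nr_meta_reads;

/* Hypothetical stand-in for reading one metadata block from disk. */
static void read_meta_block(uint32_t blkaddr, uint8_t *buf)
{
	nr_meta_reads++;
	memset(buf, (int)(blkaddr & 0xff), META_BLKSZ);
}

/* State kept across calls: the metadata block currently held. */
struct map_iter {
	uint32_t cur_blk;  /* UINT32_MAX = nothing held yet */
	uint8_t buf[META_BLKSZ];
};

/* Return the index entry of logical cluster lcn, re-reading metadata
 * only when the entry lives in a different block than the one held. */
static const uint8_t *map_iter_entry(struct map_iter *it, uint32_t base_blk,
				     uint32_t lcn)
{
	uint64_t pos = (uint64_t)lcn * ENTRY_SIZE;
	uint32_t blk = base_blk + (uint32_t)(pos / META_BLKSZ);

	if (it->cur_blk != blk) {  /* "release and regrab" only when needed */
		read_meta_block(blk, it->buf);
		it->cur_blk = blk;
	}
	return it->buf + pos % META_BLKSZ;
}
```

With these assumed sizes, looking up logical clusters 0 through 512 in order costs two block reads; a stateless version that releases and re-reads the block on every call would need 513.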
commit 70622cae335b9140e4358d7084c10cfb3da3301c
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:57 2018 +0800
staging: erofs: introduce pagevec for unzip subsystem
For each compressed cluster, a straightforward approach is
to allocate a fixed- or variable-sized (for VLE) array that
records the file pages to be decompressed if we decide to
decompress them asynchronously (e.g. in the read-ahead case).
However, this could take a lot of extra heap memory compared
with traditional uncompressed filesystems.
This patch introduces a pagevec solution that reuses some
already-allocated file pages, in a time-sharing fashion, to
store parts of the array itself and minimize the extra memory
overhead. Thus only a small, constant-sized array is needed
to bootstrap the whole array.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ff29dac3b4b402729a0b75b8724793158701b1f5
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:56 2018 +0800
staging: erofs: <linux/tagptr.h>: introduce tagged pointer
Currently the kernel has scattered tagged pointer usages hacked
by hand in plain code, without a unified and portable set of
helpers that makes the tagged pointer explicit and wraps such
hacks to get rid of meaningless magic masks all over the place.
Therefore, this patch introduces simple generic helpers to fold
tags into a pointer-sized integer. It currently supports using
the lowest n bits of the pointer for tags, where n is selected
by users.
In addition, it will be used by the upcoming EROFS filesystem,
which relies heavily on tagged pointers for high performance
and to reduce extra memory allocations.
Link: https://en.wikipedia.org/wiki/Tagged_pointer
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
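The basic tagged pointer trick can be sketched in userspace C: if objects are aligned to at least 4 bytes, the two lowest bits of their addresses are always zero and can carry a small tag. The helper names below (tp2_fold, tp2_ptr, tp2_tags) are hypothetical and deliberately differ from the macros in <linux/tagptr.h>; wrapping the value in a struct gives tagged pointers their own type so they cannot be dereferenced by accident.

```c
#include <assert.h>
#include <stdint.h>

#define TP_BITS 2u  /* number of low bits used for tags */
#define TP_MASK ((uintptr_t)((1u << TP_BITS) - 1))

/* A distinct type so a tagged value cannot be dereferenced by accident. */
struct tp2 {
	uintptr_t v;
};

static inline struct tp2 tp2_fold(void *ptr, unsigned int tags)
{
	assert(((uintptr_t)ptr & TP_MASK) == 0);  /* low bits must be free */
	assert(tags <= TP_MASK);                 /* tag must fit */
	return (struct tp2){ (uintptr_t)ptr | tags };
}

static inline void *tp2_ptr(struct tp2 t)
{
	return (void *)(t.v & ~TP_MASK);
}

static inline unsigned int tp2_tags(struct tp2 t)
{
	return (unsigned int)(t.v & TP_MASK);
}
```

Storing a 2-bit state next to a pointer this way costs no extra memory, which is why the text above ties the approach to reducing extra memory allocations.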
commit 153f5ad87b67c45b453790ce206ced7c6cc62609
Author: Chao Yu <yuchao0@huawei.com>
Date: Thu Jul 26 20:21:55 2018 +0800
staging: erofs: support tracepoint
Add basic tracepoints for ->readpage{,s}, ->lookup,
->destroy_inode, fill_inode and map_blocks.
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98dd1e3a3f42df26003ae86fd1767b03bef433a6
Author: Chao Yu <yuchao0@huawei.com>
Date: Thu Jul 26 20:21:54 2018 +0800
staging: erofs: introduce error injection infrastructure
This patch introduces an error injection infrastructure. With it,
we can inject errors into the common kernel-exported functions
that erofs uses, forcing erofs into its error paths, so that tests
can more easily cover rarely executed paths and find bugs.
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
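A minimal userspace sketch of rate-based error injection: a per-instance counter forces every N-th call of a wrapped function to fail, so callers are driven into their error paths deterministically. The struct, the counter scheme and the fi_malloc() wrapper are assumptions for illustration; the commit message does not say exactly how erofs decides when to inject.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-instance injection state: with inject_rate N > 0,
 * every N-th instrumented call is forced to fail. */
struct fault_info {
	unsigned int inject_rate;  /* 0 disables injection */
	unsigned int inject_ops;   /* calls since the last injected failure */
};

static bool should_inject(struct fault_info *ffi)
{
	if (!ffi->inject_rate)
		return false;
	if (++ffi->inject_ops >= ffi->inject_rate) {
		ffi->inject_ops = 0;
		return true;
	}
	return false;
}

/* Allocation wrapper whose failures can be forced on demand. */
static void *fi_malloc(struct fault_info *ffi, size_t size)
{
	assert(ffi != NULL);
	if (should_inject(ffi))
		return NULL;  /* pretend the allocator ran out of memory */
	return malloc(size);
}
```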
commit 220c7448cdc4c38e5177777de23793e653969904
Author: Chao Yu <yuchao0@huawei.com>
Date: Thu Jul 26 20:21:53 2018 +0800
staging: erofs: support special inode
This patch adds support for special inodes, such as block device,
character device, socket and pipe inodes.
Reviewed-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7ed68385c49ac127e5baa07220d37cbf937e89d9
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:52 2018 +0800
staging: erofs: introduce xattr & acl support
This implements xattr and ACL functionality.
Both inline and shared xattrs are introduced for flexibility.
Specifically, if the same xattr occurs many times across
a large number of inodes, or the value of an xattr is too
large to be inlined, a shared xattr kept in the xattr
metadata is used instead.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit db9bea5cf638b0683376b4118754dad0d444dd7c
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:51 2018 +0800
staging: erofs: update Kconfig and Makefile
This commit adds Makefile and Kconfig for erofs, and
updates Makefile and Kconfig files in the fs directory.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit afad040452afed7d552fb853c8613c6002e17ccb
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:50 2018 +0800
staging: erofs: add namei functions
This commit adds functions that look up names and resolve them to inodes.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e7097e1a4a0170e8e51866a1242cc9556dcca5d
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:49 2018 +0800
staging: erofs: add directory operations
This adds directory functions, mainly readdir.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 421bfd9b50b8051aa451be073f6387bc678cccab
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:48 2018 +0800
staging: erofs: add inode operations
This adds core functions to get and read an inode.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 944a5ab5bd4fc480e4099c8e5e97a6dca490a239
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:47 2018 +0800
staging: erofs: add raw address_space operations
This commit adds functions for meta and raw data, and also
provides address_space_operations for raw data access.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8305bea76c9178ce211e5759061d10effbba958e
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:46 2018 +0800
staging: erofs: add super block operations
This commit adds erofs super block operations, including (u)mount,
remount_fs, show_options, statfs, in addition to some private
icache management functions.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae2a66470bd70480e4953ff12bc96902e0b59617
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:45 2018 +0800
staging: erofs: add erofs in-memory stuffs
- erofs_sb_info:
contains erofs-specific in-memory information.
- erofs_vnode:
contains vfs_inode and other fs-specific information.
As with the super block, only one in-memory definition exists.
- erofs_map_blocks:
used for a file's L2P (logical-to-physical) mapping.
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dd48cb6b27dc6c5977e59e05c59209ddb68c2f51
Author: Gao Xiang <gaoxiang25@huawei.com>
Date: Thu Jul 26 20:21:44 2018 +0800
staging: erofs: add on-disk layout
This commit adds the on-disk layout header file of erofs.
Note that the on-disk layout is still WIP, and some fields are
reserved for future use by design.
Any comments are welcome.
Thanks-to: Li Guifu <liguifu2@huawei.com>
Thanks-to: Sun Qiuyang <sunqiuyang@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>