Cherry-pick from origin/upstream-f2fs-stable-linux-4.9.y:
975c5679a2 ("f2fs: don't put dentry page in pagecache into highmem")
Previously, dentry pages could be placed in highmem, which causes a panic
on platforms using highmem (such as arm), because the address of a dentry
page from highmem goes directly into the decryption path via the function
fscrypt_fname_disk_to_usr. sg_init_one assumes the address is not from
highmem, and the panic occurs because kmap_high is never called but
kunmap_high is triggered at the end. To fix this problem in a simple
way, this patch avoids putting dentry pages in the pagecache into highmem.
Change-Id: Ia22ed1e5503e6c15d63e4ab3b02a747a47cbc9b1
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've followed up to support some generic features such
as cgroup, block reservation, linking fscrypt_ops, delivering
write_hints, and some ioctls. And, we could fix some corner cases in
terms of power-cut recovery and subtle deadlocks.
Enhancements:
- bitmap operations to handle NAT blocks
- readahead to improve readdir speed
- switch to use fscrypt_*
- apply write hints for direct IO
- add reserve_root=%u,resuid=%u,resgid=%u to reserve blocks for root/uid/gid
- modify b_avail and b_free to consider root reserved blocks
- support cgroup writeback
- support FIEMAP_FLAG_XATTR for fibmap
- add F2FS_IOC_PRECACHE_EXTENTS to pre-cache extents
- add F2FS_IOC_{GET/SET}_PIN_FILE to pin LBAs for data blocks
- support inode creation time
Bug fixes:
- sysfile-based quota operations
- memory footprint accounting
- allow to write data on partial preallocation case
- fix deadlock case on fallocate
- fix to handle fill_super errors
- fix missing inode updates of fsync'ed file
- recover renamed file which was fsync'ed before
- drop inmemory pages in corner error case
- keep last_disk_size correctly
- recover missing i_inline flags during roll-forward
Various clean-up patches were added as well"
Cherry-pick from origin/upstream-f2fs-stable-linux-4.9.y:
71f8f0499e f2fs: support inode creation time
58dc6f6fce f2fs: rebuild sit page from sit info in mem
6393cef3f1 f2fs: stop issuing discard if fs is readonly
742bc90e88 f2fs: clean up duplicated assignment in init_discard_policy
cfabb6edfb f2fs: use GFP_F2FS_ZERO for cleanup
111e8456a6 f2fs: allow to recover node blocks given updated checkpoint
36e041a57c f2fs: recover some i_inline flags
3127a7b67c f2fs: correct removexattr behavior for null valued extended attribute
86f78c1e55 f2fs: drop page cache after fs shutdown
1a3b004759 f2fs: stop gc/discard thread after fs shutdown
62a91a5a48 f2fs: hanlde error case in f2fs_ioc_shutdown
66356ee5f9 f2fs: split need_inplace_update
5912fbae9d f2fs: fix to update last_disk_size correctly
3aa46e2c21 f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup
acdaca27aa f2fs: clean up error path of fill_super
cf8821115c f2fs: avoid hungtask when GC encrypted block if io_bits is set
4be98c9805 f2fs: allow quota to use reserved blocks
2a6489c87e f2fs: fix to drop all inmem pages correctly
fd21442239 f2fs: speed up defragment on sparse file
6bce96329c f2fs: support F2FS_IOC_PRECACHE_EXTENTS
9ce3d6bb68 f2fs: add an ioctl to disable GC for specific file
9ef5e65684 f2fs: prevent newly created inode from being dirtied incorrectly
08ddb1917e f2fs: support FIEMAP_FLAG_XATTR
aa9c1c1046 f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock
92b8f9c726 f2fs: check node page again in write end io
4992a3ca15 f2fs: fix to caclulate required free section correctly
d1a6b4f6c9 f2fs: handle newly created page when revoking inmem pages
462d762b20 f2fs: add resgid and resuid to reserve root blocks
cbd5e5af8c f2fs: implement cgroup writeback support
5a5847421d f2fs: remove unused pend_list_tag
37d4ca7cd1 f2fs: avoid high cpu usage in discard thread
02cfdab834 f2fs: make local functions static
5fee540985 f2fs: add reserved blocks for root user
265974636a f2fs: check segment type in __f2fs_replace_block
4f76d6acc6 f2fs: update inode info to inode page for new file
52b4528174 f2fs: show precise # of blocks that user/root can use
ae0e1fa5a8 f2fs: clean up unneeded declaration
8fc7446629 f2fs: continue to do direct IO if we only preallocate partial blocks
162464df89 f2fs: enable quota at remount from r to w
e270976ff8 f2fs: skip stop_checkpoint for user data writes
d04736926f f2fs: fix missing error number for xattr operation
211cb7bb24 f2fs: recover directory operations by fsync
2648e735ff f2fs: return error during fill_super
e2a0518d8c f2fs: fix an error case of missing update inode page
bf1750bafe f2fs: fix potential hangtask in f2fs_trace_pid
c804fcf3df f2fs: no need return value in restore summary process
fdd41a8793 f2fs: use unlikely for release case
a74690b03e f2fs: don't return value in truncate_data_blocks_range
987892cc67 f2fs: clean up f2fs_map_blocks
d7714cb231 f2fs: clean up hash codes
e3d2a1e946 f2fs: fix error handling in fill_super
b02e72d294 f2fs: spread f2fs_k{m,z}alloc
ead5259de3 f2fs: inject fault to kvmalloc
e585ca29dd f2fs: inject fault to kzalloc
8234ed56e7 f2fs: remove a redundant conditional expression
1a9d6a9c00 f2fs: apply write hints to select the type of segment for direct write
955e7f58f6 f2fs: switch to fscrypt_prepare_setattr()
268c7f607c f2fs: switch to fscrypt_prepare_lookup()
8dfa646f97 f2fs: switch to fscrypt_prepare_rename()
d5382ccb02 f2fs: switch to fscrypt_prepare_link()
3ccc177c9b f2fs: switch to fscrypt_file_open()
8b5674efdc f2fs: remove repeated f2fs_bug_on
ba4556cdf1 f2fs: remove an excess variable
46accc9251 f2fs: fix lock dependency in between dio_rwsem & i_mmap_sem
8933908c4f f2fs: remove unused parameter
76b6e8ed20 f2fs: still write data if preallocate only partial blocks
1ed753392f f2fs: introduce sysfs readdir_ra to readahead inode block in readdir
4e68a15eee f2fs: fix concurrent problem for updating free bitmap
9be6e75962 f2fs: remove unneeded memory footprint accounting
923df752db f2fs: no need to read nat block if nat_block_bitmap is set
09234be262 f2fs: reserve nid resource for quota sysfile
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Pull f2fs updates from Jaegeuk Kim:
"In this round, we introduce sysfile-based quota support which is
required for Android by default. In addition, we allow that users are
able to reserve some blocks in runtime to mitigate performance drops
in low free space.
Enhancements:
- assign proper data segments according to write_hints given by user
- issue cache_flush on dirty devices only among multiple devices
- exploit cp_error flag and add more faults to enhance fault
injection test
- conduct more readaheads during f2fs_readdir
- add a range for discard commands
Bug fixes:
- fix zero stat->st_blocks when inline_data is set
- drop crypto key and free stale memory pointer while evict_inode is
failing
- fix some corner cases in free space and segment management
- fix wrong last_disk_size
This series includes lots of clean-ups and code enhancement in terms
of xattr operations, discard/flush command control. In addition, it
adds versatile debugfs entries to monitor f2fs status"
Cherry-picked from origin/upstream-f2fs-stable-linux-4.9.y:
5b2b7f7dd8 f2fs: deny accessing encryption policy if encryption is off
05dac2e898 f2fs: inject fault in inc_valid_node_count
2e08de4fda f2fs: fix to clear FI_NO_PREALLOC
931ecc22b4 f2fs: expose quota information in debugfs
45d6e702d3 f2fs: separate nat entry mem alloc from nat_tree_lock
8e2f721703 f2fs: validate before set/clear free nat bitmap
27d50282d0 f2fs: avoid opened loop codes in __add_ino_entry
b1823df0e6 f2fs: apply write hints to select the type of segments for buffered write
b561061c06 f2fs: introduce scan_curseg_cache for cleanup
5772e0c102 f2fs: optimize the way of traversing free_nid_bitmap
a51e85eae2 f2fs: keep scanning until enough free nids are acquired
d75eb8d734 f2fs: trace checkpoint reason in fsync()
bed6cffdf7 f2fs: keep isize once block is reserved cross EOF
5f3fdd2afc f2fs: avoid race in between GC and block exchange
51cb399e7e f2fs: save a multiplication for last_nid calculation
7f41aab3d6 f2fs: fix summary info corruption
148c518517 f2fs: remove dead code in update_meta_page
c3bc6e5183 f2fs: remove unneeded semicolon
9e71a0321f f2fs: don't bother with inode->i_version
49f72728e7 f2fs: check curseg space before foreground GC
25d0becffa f2fs: use rw_semaphore to protect SIT cache
0108c481d7 f2fs: support quota sys files
d4c292db7b f2fs: add quota_ino feature infra
1033eee92c f2fs: optimize __update_nat_bits
247e895116 f2fs: modify for accurate fggc node io stat
c7272f8aeb Revert "f2fs: handle dirty segments inside refresh_sit_entry"
068868fc7e f2fs: add a function to move nid
b9f73875af f2fs: export SSR allocation threshold
ab30204bb9 f2fs: give correct trimmed blocks in fstrim
b5db2de462 f2fs: support bio allocation error injection
58ddec85e4 f2fs: support get_page error injection
ef216e610a f2fs: add missing sysfs description
68ab6f8dd5 f2fs: support soft block reservation
d7947e2a31 f2fs: handle error case when adding xattr entry
50ffaa980f f2fs: support flexible inline xattr size
5a8ed073c7 f2fs: show current cp state
d888fcd74c f2fs: add missing quota_initialize
af1cc1ea23 f2fs: show # of dirty segments via sysfs
6663422a36 f2fs: stop all the operations by cp_error flag
872d8e3af0 f2fs: remove several redundant assignments
bf823c82e3 f2fs: avoid using timespec
c70ab1b993 f2fs: fix to correct no_fggc_candidate
0e6275dc31 Revert "f2fs: return wrong error number on f2fs_quota_write"
41d59230e3 f2fs: remove obsolete pointer for truncate_xattr_node
8c12a10f2e f2fs: retry ENOMEM for quota_read|write
35e13ca2e9 f2fs: limit # of inmemory pages
9ca57a7e96 f2fs: update ctx->pos correctly when hitting hole in directory
a04208e54b f2fs: relocate readahead codes in readdir()
905d0370e6 f2fs: allow readdir() to be interrupted
2dfbda03f9 f2fs: trace f2fs_readdir
d67586ddf3 f2fs: trace f2fs_lookup
4c94f14b3c f2fs: skip searching non-exist range in truncate_hole
ac5d4b4257 f2fs: expose some sectors to user in inline data or dentry case
5ded3b82dc f2fs: avoid stale fi->gdirty_list pointer
f6b708e25f f2fs/crypto: drop crypto key at evict_inode only
33fdebbb0e f2fs: fix to avoid race when accessing last_disk_size
595046758d f2fs: Fix bool initialization/comparison
1e5305afa8 f2fs: give up CP_TRIMMED_FLAG if it drops discards
8258fd3054 f2fs: trace f2fs_remove_discard
6c46b37d9b f2fs: reduce cmd_lock coverage in __issue_discard_cmd
daf437d37c f2fs: split discard policy
69a596797a f2fs: wrap discard policy
28e1023e8e f2fs: support issuing/waiting discard in range
fd6422ea92 f2fs: fix to flush multiple device in checkpoint
f014be822c f2fs: enhance multiple device flush
0597a6e4bd f2fs: fix to show ino management cache size correctly
cacc1ed0c4 f2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush
84af6aeceb f2fs: obsolete ALLOC_NID_LIST list
8456d34378 f2fs: convert inline data for direct I/O & FI_NO_PREALLOC
3f01af786c f2fs: allow readpages with NULL file pointer
2f0df25e65 f2fs: show flush list status in sysfs
20ef20fbf7 f2fs: introduce read_xattr_block
126221de37 f2fs: introduce read_inline_xattr
127faa71f6 Revert "f2fs: reuse nids more aggressively"
c19928e660 Revert "f2fs: node segment is prior to data segment selected victim"
Change-Id: I2f892e6ee75c41e84241f37b1903e0c32387d95b
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Cherry-picked from upstream-f2fs-stable-linux-4.9.y
Changes include:
commit 30da3a4de9 ("f2fs: hurry up to issue discard after io interruption")
commit d1c363b483 ("f2fs: fix to show correct discard_granularity in sysfs")
...
commit e6b120d4d0 ("f2fs/fscrypt: catch up to v4.12")
commit 4d7931d727 ("KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload()")
Signed-off-by: Hyojun Kim <hyojun@google.com>
commit b9dd46188edc2f0d1f37328637860bb65a771124 upstream.
F2FS uses 4 bytes to represent a block address. As a result, the maximum
supported disk size is 16 TB, which equals 16 * 1024 * 1024 / 2 segments.
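
The arithmetic behind this limit can be checked with a small sketch. It
assumes 4 KB blocks and 2 MB segments (512 blocks per segment), which the
message above does not state explicitly; all names are illustrative only.

```c
#include <stdint.h>

/* Assumed f2fs geometry (not stated in the message above). */
#define BLOCK_SIZE_BYTES   4096ULL	/* 4 KB block */
#define BLOCKS_PER_SEGMENT 512ULL	/* 2 MB segment */

/* Number of blocks addressable with a 4-byte block address. */
static uint64_t max_blocks(void)
{
	return 1ULL << 32;
}

/* Largest disk size in bytes: 2^32 blocks * 4 KB = 16 TB. */
static uint64_t max_disk_bytes(void)
{
	return max_blocks() * BLOCK_SIZE_BYTES;
}

/* Largest segment count: 2^32 / 512 = 16 * 1024 * 1024 / 2. */
static uint64_t max_segments(void)
{
	return max_blocks() / BLOCKS_PER_SEGMENT;
}
```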
Signed-off-by: Jin Qian <jinqian@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously, we used cp_version only to detect recoverable dnodes.
To avoid mistaking a garbage dnode carrying the same cp_version for a valid
one, we needed to truncate the next dnode during checkpoint, resulting in
an additional discard or data write.
If we can distinguish this by using crc in addition to cp_version, we can
remove this overhead.
There is a backward compatibility concern, since this changes the node_footer
layout. So, this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG,
to detect the new layout. The new layout is activated only when this flag is set.
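
A minimal sketch of such a check follows. It assumes that, in the new
layout, the checkpoint crc occupies the upper 32 bits of the 64-bit version
stored in the node footer and the checkpoint version the lower 32 bits. The
flag value and the function names are assumptions for illustration, not
taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag bit for illustration only. */
#define CP_CRC_RECOVERY_FLAG 0x00000040u

/*
 * Build the version a recoverable dnode is expected to carry.
 * Assumption: with the flag set, crc goes in the upper 32 bits.
 */
static uint64_t expected_node_version(uint64_t cp_version, uint32_t cp_crc,
				      uint32_t ckpt_flags)
{
	if (ckpt_flags & CP_CRC_RECOVERY_FLAG)
		return (cp_version & 0xffffffffULL) |
		       ((uint64_t)cp_crc << 32);
	return cp_version;	/* old layout: version only */
}

/* A dnode is recoverable if its footer version matches the checkpoint. */
static bool is_recoverable_dnode(uint64_t node_cp_ver, uint64_t cp_version,
				 uint32_t cp_crc, uint32_t ckpt_flags)
{
	return node_cp_ver ==
	       expected_node_version(cp_version, cp_crc, ckpt_flags);
}
```

With the flag set, a stale dnode that happens to carry the same version but
not the matching crc no longer passes the check.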
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With the steps below, we will lose some of the dirents:
1) mount f2fs with inline_dentry option
2) echo 1 > /sys/fs/f2fs/sdX/dir_level
3) mkdir dir
4) touch 180 files named [1-180] in dir
5) touch 181 in dir
6) echo 3 > /proc/sys/vm/drop_caches
7) ll dir
ls: cannot access 2: No such file or directory
ls: cannot access 4: No such file or directory
ls: cannot access 5: No such file or directory
ls: cannot access 6: No such file or directory
ls: cannot access 8: No such file or directory
ls: cannot access 9: No such file or directory
...
total 360
drwxr-xr-x 2 root root 4096 Feb 19 15:12 ./
drwxr-xr-x 3 root root 4096 Feb 19 15:11 ../
-rw-r--r-- 1 root root 0 Feb 19 15:12 1
-rw-r--r-- 1 root root 0 Feb 19 15:12 10
-rw-r--r-- 1 root root 0 Feb 19 15:12 100
-????????? ? ? ? ? ? 101
-????????? ? ? ? ? ? 102
-????????? ? ? ? ? ? 103
...
The reason is that, when doing the inline dir conversion, we didn't consider
that a directory has a hierarchical hash structure which can be configured
through the sysfs interface 'dir_level'.
By default, the dir_level of a directory inode is 0, which means we have one
bucket in the first-level hash table and all dirents are hashed into this
bucket, so simply copying dirents from the inline dentry page to the
converted normal dentry page causes no problem.
However, if dir_level is configured with a value N (greater than 0), the
bucket count of the first-level hash table expands by 2^N - 1 and dirents
are hashed into different buckets according to their hash values. If we
still move all dirents to the first bucket, the inline dirents end up in
the wrong location. As a result, although we can iterate over all dirents
through ->readdir, we can't stat some of them in ->lookup, which is based on
hash table searching.
This patch fixes this issue by rehashing dirents into the correct position
when converting an inline directory.
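
The effect of dir_level on the first level can be sketched as follows. This
assumes the bucket index is the name hash modulo the bucket count, and that
level #0 has 2^dir_level buckets; the function names are hypothetical.

```c
#include <stdint.h>

/* Buckets in the first-level (level #0) hash table for a given dir_level. */
static uint32_t level0_buckets(unsigned int dir_level)
{
	return 1u << dir_level;	/* sketch only: valid for dir_level < 32 */
}

/* Bucket that a dirent with this name hash belongs to at level #0. */
static uint32_t level0_bucket_index(uint32_t name_hash,
				    unsigned int dir_level)
{
	return name_hash % level0_buckets(dir_level);
}
```

With dir_level 0 every hash maps to bucket 0, so a plain copy is correct.
With dir_level 1 a hash with the low bit set belongs in bucket 1, which is
why a plain copy into the first bucket breaks ->lookup.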
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
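
A small sketch of why the first two rewrites are behavior-preserving: when
PAGE_CACHE_SHIFT equals PAGE_SHIFT, the shift amount is zero. The 4 KB page
size and the helper names below are assumptions for illustration.

```c
/* Assumed values for illustration: 4 KB pages, page cache unit == page. */
#define PAGE_SHIFT       12
#define PAGE_CACHE_SHIFT PAGE_SHIFT

/* Old style: convert a page cache index to a page index. */
static unsigned long old_style_index(unsigned long index)
{
	return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);	/* shift by 0 */
}

/* New style after the conversion: the expression is used as-is. */
static unsigned long new_style_index(unsigned long index)
{
	return index;
}
```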
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is the revert of changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch preallocates data blocks for buffered aio writes.
With this patch, we can avoid redundant locking and unlocking of node pages
given consecutive aio requests.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There are redundant pointer conversions in the following call stack:
 - at position a, inode is converted to f2fs_inode_info.
 - at position b, f2fs_inode_info is converted back to inode.
 - truncate_blocks(inode,..)
  - fi = F2FS_I(inode)                  ---a
  - ADDRS_PER_PAGE(node_page, fi)
   - addrs_per_inode(fi)
    - inode = &fi->vfs_inode            ---b
    - f2fs_has_inline_xattr(inode)
     - fi = F2FS_I(inode)
     - is_inode_flag_set(fi,..)
In order to avoid the unneeded conversions, alter ADDRS_PER_PAGE and
addrs_per_inode to accept a parameter of type inode pointer.
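
The round trip can be modeled in userspace as below. The structures are
simplified stand-ins (the real ones have many more fields), the boolean
replaces the inode flag machinery, and the constants 923 and 50 are taken
from other messages in this log; treat all of it as an illustrative sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct inode {
	unsigned long i_ino;
};

struct f2fs_inode_info {
	bool has_inline_xattr;	/* stand-in for the inline xattr flag */
	struct inode vfs_inode;	/* embedded VFS inode */
};

/* Recover the containing f2fs_inode_info from its embedded inode. */
static struct f2fs_inode_info *F2FS_I(struct inode *inode)
{
	return (struct f2fs_inode_info *)
		((char *)inode - offsetof(struct f2fs_inode_info, vfs_inode));
}

#define DEF_ADDRS_PER_INODE     923
#define F2FS_INLINE_XATTR_ADDRS 50

/* Takes the inode directly, so callers do not convert back and forth. */
static unsigned int addrs_per_inode(struct inode *inode)
{
	if (F2FS_I(inode)->has_inline_xattr)
		return DEF_ADDRS_PER_INODE - F2FS_INLINE_XATTR_ADDRS;
	return DEF_ADDRS_PER_INODE;
}
```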
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces lifetime IO write statistics exposed to the sysfs interface.
The write IO amount is obtained from block layer, accumulated in the file system and
stored in the hot node summary of checkpoint.
Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
[Jaegeuk Kim: add sysfs documentation]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add annotations that describe more clearly the space utilization of the
regular dentry and the inline dentry.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If f2fs was corrupted with missing dot dentries, it needs to recover them after
fsck.f2fs detection.
The underlying procedure is:
1. fsck.f2fs sets the F2FS_INLINE_DOTS flag in the directory inode if it detects
missing dot dentries.
2. When f2fs looks up the corrupted directory, it triggers f2fs_add_link with
proper inode numbers and their dot and dotdot names.
3. Once f2fs recovers the directory without errors, it finally clears
F2FS_INLINE_DOTS.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Rename a field from 'blk_addr' to 'blk' in struct {f2fs_extent,extent_info},
as the annotation of this field already describes its meaning well.
This way, we can avoid long statements in the code of the following patches.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds two macros for transition between byte and block offsets.
Currently, f2fs only supports 4KB blocks, so use the default size for now.
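
A sketch of what such a pair of macros could look like, assuming a fixed
4 KB block (a shift of 12 bits). The message above does not name the macros,
so the names here are assumptions; the wrapper functions exist only to make
the behavior easy to check.

```c
#include <stdint.h>

/* Assumed names; 4 KB blocks mean a shift of 12 bits. */
#define F2FS_BLKSIZE_BITS        12
#define F2FS_BYTES_TO_BLK(bytes) ((bytes) >> F2FS_BLKSIZE_BITS)
#define F2FS_BLK_TO_BYTES(blk)   ((blk) << F2FS_BLKSIZE_BITS)

/* Byte offset -> index of the block containing it (rounds down). */
static uint64_t bytes_to_blk(uint64_t bytes)
{
	return F2FS_BYTES_TO_BLK(bytes);
}

/* Block index -> byte offset of the start of that block. */
static uint64_t blk_to_bytes(uint64_t blk)
{
	return F2FS_BLK_TO_BYTES(blk);
}
```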
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds FASTBOOT flag into checkpoint as follows.
- CP_UMOUNT_FLAG is set when system is umounted.
- CP_FASTBOOT_FLAG is set when intermediate checkpoint having node summaries
was done.
So, if you get CP_UMOUNT_FLAG from checkpoint, the system was umounted cleanly.
Instead, if there was sudden-power-off, you can get CP_FASTBOOT_FLAG or nothing.
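
The rule above can be sketched as a small classifier. The bit values and
the enum are assumptions for illustration; only the two flag names come
from this message.

```c
#include <stdint.h>

/* Assumed bit values for illustration. */
#define CP_UMOUNT_FLAG   0x00000001u
#define CP_FASTBOOT_FLAG 0x00000020u

enum cp_state {
	CP_STATE_CLEAN_UMOUNT,	/* CP_UMOUNT_FLAG present */
	CP_STATE_FASTBOOT,	/* power-off after a fastboot checkpoint */
	CP_STATE_PLAIN		/* power-off, no node summaries */
};

/* Classify the last checkpoint from its flag word. */
static enum cp_state classify_checkpoint(uint32_t ckpt_flags)
{
	if (ckpt_flags & CP_UMOUNT_FLAG)
		return CP_STATE_CLEAN_UMOUNT;
	if (ckpt_flags & CP_FASTBOOT_FLAG)
		return CP_STATE_FASTBOOT;
	return CP_STATE_PLAIN;
}
```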
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In do_recover_data, we find and update previous node pages after updating
its new block addresses.
After that, we call fill_node_footer without reset field, which erases its
cold bit, so this new cold node block is written to the wrong log area.
This patch fixes this so that the old flag is not lost.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch simplifies the inline_data usage with the following rules.
1. inline_data is set during the file creation.
2. If new data to be written falls outside the inline_data range,
f2fs converts that inode permanently.
3. There are no cases which convert a non-inline_data inode to inline_data.
4. The inline_data flag should be changed under inode page lock.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch defines macro/inline dentry structure, and adds some helpers for
inline dir infrastructure.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Block size in f2fs is 4096 bytes, so theoretically f2fs can support devices with
sectors of up to 4096 bytes. But f2fs currently supports only 512-byte sectors,
so a block device such as zRAM, which uses the page cache as its block storage
space, cannot be mounted because its sector size does not match the sector size
supported by f2fs.
This patch adds support for large sector sizes in f2fs, so block devices with a
sector size of 512/1024/2048/4096 bytes can be supported.
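
A sketch of the resulting constraint: a sector size is acceptable if it is a
power of two between 512 and 4096 bytes, and the number of sectors per 4 KB
block follows from it. The function name and the -1 error convention are
hypothetical.

```c
/*
 * Return log2(sectors per 4 KB block) for a supported sector size,
 * or -1 if the sector size is not 512, 1024, 2048 or 4096 bytes.
 */
static int log_sectors_per_block(unsigned int sector_size)
{
	unsigned int log_sector = 0;

	/* Reject zero and anything that is not a power of two. */
	if (sector_size == 0 || (sector_size & (sector_size - 1)))
		return -1;

	while ((1u << log_sector) < sector_size)
		log_sector++;

	/* Supported range: 2^9 (512) up to 2^12 (4096, the block size). */
	if (log_sector < 9 || log_sector > 12)
		return -1;

	return 12 - (int)log_sector;
}
```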
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces DEF_NIDS_PER_INODE/GET_ORPHAN_BLOCKS/F2FS_CP_PACKS macro
instead of numbers in code for readability.
change log from v1:
o fix typo pointed out by Jaegeuk Kim.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Theoretically, our total inode number is the same as the total node number, but
three node ids are reserved in f2fs: 0, 1 (node nid), and 2 (meta nid). They
should never be used by users, so the total/free inode numbers calculated in
->statfs are wrong.
This patch introduces F2FS_RESERVED_NODE_NUM and then fixes this issue by
recalculating the total/free inode numbers with the macro.
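
The corrected calculation can be sketched as follows. Only
F2FS_RESERVED_NODE_NUM comes from this message; the helper names and
parameters are hypothetical stand-ins for the superblock counters.

```c
#include <stdint.h>

/* nids 0, 1 (node nid) and 2 (meta nid) are never available to users. */
#define F2FS_RESERVED_NODE_NUM 3

/* Total inodes reported by statfs (f_files in struct statfs). */
static uint64_t total_inodes(uint64_t total_node_count)
{
	return total_node_count - F2FS_RESERVED_NODE_NUM;
}

/* Free inodes reported by statfs (f_ffree in struct statfs). */
static uint64_t free_inodes(uint64_t total_node_count,
			    uint64_t valid_inode_count)
{
	return total_inodes(total_node_count) - valid_inode_count;
}
```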
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs's cp has one page which consists of struct f2fs_checkpoint and
version bitmap of sit and nat. To support lots of segments, we need more
blocks for sit bitmap. So let's arrange sit bitmap as following:
+-----------------+------------+
| f2fs_checkpoint | sit bitmap |
| + nat bitmap | |
+-----------------+------------+
0 4k N blocks
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: simple code change for readability]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When the large directory feature is enabled, we have one case which could cause
an overflow in dir_buckets(), as follows:
special case: level + dir_level >= 32 and level < MAX_DIR_HASH_DEPTH / 2.
Here we define MAX_DIR_BUCKETS to limit the return value when the condition
could trigger potential overflow.
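
A sketch of the clamped calculation. The constant values are assumptions
chosen to mirror the description (a hash depth of 63 and a cap of 2^30
buckets); the exact definitions in the patch may differ.

```c
/* Assumed values for illustration. */
#define MAX_DIR_HASH_DEPTH 63
#define MAX_DIR_BUCKETS    (1u << ((MAX_DIR_HASH_DEPTH / 2) - 1))

/*
 * Number of buckets at a given level. Without the clamp, the shift
 * count level + dir_level could reach 32 or more and overflow.
 */
static unsigned int dir_buckets(unsigned int level, unsigned int dir_level)
{
	if (level + dir_level < MAX_DIR_HASH_DEPTH / 2)
		return 1u << (level + dir_level);
	return MAX_DIR_BUCKETS;
}
```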
Changes from V1
o modify description of calculation in f2fs.txt suggested by Changman Lee.
Suggested-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Introduce helper macro ADDRS_PER_PAGE() to get the number of address pointers in
a direct node or inode.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch introduces an i_dir_level field to support large directory.
Previously, f2fs maintained multi-level hash tables to find a dentry quickly
from a bunch of child dentries in a directory, and the hash tables form the
tree structure below.
In Documentation/filesystems/f2fs.txt,
----------------------
A : bucket
B : block
N : MAX_DIR_HASH_DEPTH
----------------------
level #0 | A(2B)
|
level #1 | A(2B) - A(2B)
|
level #2 | A(2B) - A(2B) - A(2B) - A(2B)
. | . . . .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
. | . . . .
level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
But, if we can guess that a directory will handle a number of child files,
we don't need to traverse the tree from level #0 to #N all the time.
Since the lower level tables contain relatively small number of dentries,
the miss ratio of the target dentry is likely to be high.
In order to avoid that, we can configure the hash tables sparsely from level #0
like this.
level #0 | A(2B) - A(2B) - A(2B) - A(2B)
level #1 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
. | . . . .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
. | . . . .
level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
With this structure, we can skip the ineffective tree searches in lower level
hash tables.
This patch adds just a facility for this by introducing i_dir_level in
f2fs_inode.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch adds an inline_data recovery routine with the following policy.
[prev.] [next] of inline_data flag
o o -> recover inline_data
o x -> remove inline_data, and then recover data blocks
x o -> remove inline_data, and then recover inline_data
x x -> recover data blocks
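
The policy table can be written as a decision function. The enum and
function names are hypothetical; the four outcomes follow the table above
literally, with 'o' meaning the inline_data flag is set.

```c
#include <stdbool.h>

enum recover_action {
	RECOVER_INLINE_DATA,			/* o o */
	REMOVE_INLINE_THEN_RECOVER_BLOCKS,	/* o x */
	REMOVE_INLINE_THEN_RECOVER_INLINE,	/* x o */
	RECOVER_DATA_BLOCKS			/* x x */
};

/* Pick the recovery action from the [prev.] and [next] inline_data flags. */
static enum recover_action inline_recovery_action(bool prev_inline,
						  bool next_inline)
{
	if (prev_inline && next_inline)
		return RECOVER_INLINE_DATA;
	if (prev_inline)
		return REMOVE_INLINE_THEN_RECOVER_BLOCKS;
	if (next_inline)
		return REMOVE_INLINE_THEN_RECOVER_INLINE;
	return RECOVER_DATA_BLOCKS;
}
```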
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Add new inode flags F2FS_INLINE_DATA and FI_INLINE_DATA to indicate
whether the inode has inline data.
Inline data makes use of inode block's data indices region to save small
file. Currently there are 923 data indices in an inode block. Since
inline xattr has made use of the last 50 indices to save its data, there
are 873 indices left which can be used for inline data. When
FI_INLINE_DATA is set, the layout of inode block's indices region is
like below:
+-----------------+
| | Reserved. reserve_new_block() will make use of
| i_addr[0] | i_addr[0] when we need to reserve a new data block
| | to convert inline data into regular one's.
|-----------------|
| | Used by inline data. A file whose size is less than
| i_addr[1~872] | 3488 bytes(~3.4k) and doesn't reserve extra
| | blocks by fallocate() can be saved here.
|-----------------|
| |
| i_addr[873~922] | Reserved for inline xattr
| |
+-----------------+
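
The 3488-byte figure follows from the layout above, as this sketch shows.
It assumes each data index is 4 bytes wide; the macro and function names are
illustrative.

```c
/* Figures from the description above; index width is an assumption. */
#define TOTAL_ADDRS        923	/* data indices in an inode block */
#define INLINE_XATTR_ADDRS 50	/* last indices used by inline xattr */
#define RESERVED_ADDRS     1	/* i_addr[0] */
#define ADDR_BYTES         4	/* assumed width of one index */

/* Bytes available for inline data: i_addr[1~872]. */
static unsigned int max_inline_data_bytes(void)
{
	return (TOTAL_ADDRS - INLINE_XATTR_ADDRS - RESERVED_ADDRS) *
	       ADDR_BYTES;
}
```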
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Weihong Xu <weihong.xu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
0. modified inode structure
--------------------------------------
metadata (e.g., i_mtime, i_ctime, etc)
--------------------------------------
direct pointers [0 ~ 873]
inline xattrs (200 bytes by default)
indirect pointers [0 ~ 4]
--------------------------------------
node footer
--------------------------------------
1. setxattr flow
- read_all_xattrs copies all the xattrs from inline and xattr node block.
- handle xattr entries
- write_all_xattrs copies modified xattrs into inline and xattr node block.
2. getxattr flow
- read_all_xattrs copies all the xattrs from inline and xattr node block.
- check target entries
3. Usage
# mount -t f2fs -o inline_xattr $DEV $MNT
Once mounted with the inline_xattr option, f2fs marks all the newly created
files to reserve an amount of inline xattr space explicitly inside the inode
block. Without the mount option, f2fs will not touch any existing files and
newly created files as well.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch enables the number of direct pointers inside on-disk inode block to
be changed dynamically according to the size of inline xattr space.
The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file
has inline xattr flag.
The number of direct pointers that will be used by inline xattrs is defined as
F2FS_INLINE_XATTR_ADDRS.
Current patch assigns F2FS_INLINE_XATTR_ADDRS to 0 temporarily.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch adds a basic inode flag for inline xattrs, F2FS_INLINE_XATTR,
and adds a mount option, inline_xattr, which is enabled when xattr is set.
If the mount option is enabled, all the files are marked with the inline_xattrs
flag.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
The on-disk block address is defined as __le32, but the in-memory block
address, block_t, is defined as u64.
Let's synchronize them to 32 bits.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Simplify code by providing the accessor macro to retrieve the
number of dentry slots for a given filename length.
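
Such an accessor amounts to a round-up division by the slot size. The
sketch below assumes each dentry slot stores 8 bytes of filename; the macro
and function names are hypothetical.

```c
/* Assumption: one dentry slot holds 8 bytes of filename. */
#define SLOT_LEN      8
#define SLOT_LEN_BITS 3

/* Number of slots needed for a filename of name_len bytes (rounds up). */
static unsigned int dentry_slots(unsigned int name_len)
{
	return (name_len + SLOT_LEN - 1) >> SLOT_LEN_BITS;
}
```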
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
This patch should resolve the bugs reported by the sparse tool.
Initial reports were written by "kbuild test robot" managed by fengguang.wu.
On my local machines, I've also tested by running:
> make C=2 CF="-D__CHECK_ENDIAN__"
Accordingly, I've found lots of warnings and bugs related to the endian
conversion. And I've fixed all at this moment.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>