# By Chao Yu (415) and others
# Via Greg Kroah-Hartman (53) and others
* google/common/deprecated/android-4.4:
ANDROID: regression introduced override_creds=off ANDROID: sched: Disallow WALT with CFS bandwidth control ANDROID: fiq_debugger: remove ANDROID: Add a tracepoint for mapping inode to full path ANDROID: fix binder change in merge of 4.4.183 UPSTREAM: net-ipv6-ndisc: add support for RFC7710 RA Captive Portal Identifier f2fs: use EINVAL for superblock with invalid magic f2fs: fix to read source block before invalidating it f2fs: remove redundant check from f2fs_setflags_common() f2fs: use generic checking and prep function for FS_IOC_SETFLAGS ANDROID: overlayfs: Fix a regression in commit b24be4acd ANDROID: xfrm: remove in_compat_syscall() checks ANDROID: enable CONFIG_RTC_DRV_TEST on cuttlefish BACKPORT: binder: Set end of SG buffer area properly. ANDROID: overlayfs ovl_create_of_link regression f2fs: improve print log in f2fs_sanity_check_ckpt() f2fs: avoid out-of-range memory access f2fs: fix to avoid long latency during umount f2fs: allow all the users to pin a file f2fs: support swap file w/ DIO f2fs: allocate blocks for pinned file f2fs: fix is_idle() check for discard type f2fs: add a rw_sem to cover quota flag changes f2fs: set SBI_NEED_FSCK for xattr corruption case f2fs: use generic EFSBADCRC/EFSCORRUPTED f2fs: Use DIV_ROUND_UP() instead of open-coding f2fs: print kernel message if filesystem is inconsistent f2fs: introduce f2fs_<level> macros to wrap f2fs_printk() f2fs: avoid get_valid_blocks() for cleanup f2fs: ioctl for removing a range from F2FS f2fs: only set project inherit bit for directory f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags ANDROID: Fixes to locking around handle_lmk_event ANDROID: Avoid taking multiple locks in handle_lmk_event ANDROID: kernel: cgroup: cpuset: Clear cpus_requested for empty buf ANDROID: kernel: cgroup: cpuset: Add missing allocation of cpus_requested in alloc_trial_cpuset f2fs: Add option to limit
required GC for checkpoint=disable f2fs: Fix accounting for unusable blocks f2fs: Fix root reserved on remount f2fs: Lower threshold for disable_cp_again f2fs: fix sparse warning f2fs: fix f2fs_show_options to show nodiscard mount option f2fs: add error prints for debugging mount failure f2fs: fix to do sanity check on segment bitmap of LFS curseg f2fs: add missing sysfs entries in documentation f2fs: fix to avoid deadloop if data_flush is on f2fs: always assume that the device is idle under gc_urgent f2fs: add bio cache for IPU f2fs: allow ssr block allocation during checkpoint=disable period f2fs: fix to check layout on last valid checkpoint park UPSTREAM: binder: check for overflow when alloc for security context BACKPORT: binder: fix race between munmap() and direct reclaim f2fs: link f2fs quota ops for sysfile fs: sdcardfs: Add missing option to show_options ANDROID: cuttlefish_defconfig: Disable DEVTMPFS ANDROID: Move from clang r349610 to r353983c. f2fs: fix to avoid accessing xattr across the boundary f2fs: fix to avoid potential race on sbi->unusable_block_count access/update f2fs: add tracepoint for f2fs_filemap_fault() f2fs: introduce DATA_GENERIC_ENHANCE f2fs: fix to handle error in f2fs_disable_checkpoint() f2fs: remove redundant check in f2fs_file_write_iter() f2fs: fix to be aware of readonly device in write_checkpoint() f2fs: fix to skip recovery on readonly device f2fs: fix to consider multiple device for readonly check f2fs: relocate chksum_offset for large_nat_bitmap feature f2fs: allow unfixed f2fs_checkpoint.checksum_offset f2fs: Replace spaces with tab f2fs: insert space before the open parenthesis '(' f2fs: allow address pointer number of dnode aligning to specified size f2fs: introduce f2fs_read_single_page() for cleanup f2fs: mark is_extension_exist() inline f2fs: fix to set FI_UPDATE_WRITE correctly f2fs: fix to avoid panic in f2fs_inplace_write_data() f2fs: fix to do sanity check on valid block count of segment f2fs: fix to do sanity 
check on valid node/block count f2fs: fix to avoid panic in do_recover_data() f2fs: fix to do sanity check on free nid f2fs: fix to do checksum even if inode page is uptodate f2fs: fix to avoid panic in f2fs_remove_inode_page() f2fs: fix to clear dirty inode in error path of f2fs_iget() f2fs: remove new blank line of f2fs kernel message f2fs: fix wrong __is_meta_io() macro f2fs: fix to avoid panic in dec_valid_node_count() f2fs: fix to avoid panic in dec_valid_block_count() f2fs: fix to use inline space only if inline_xattr is enable f2fs: fix to retrieve inline xattr space f2fs: fix error path of recovery f2fs: fix to avoid deadloop in foreground GC f2fs: data: fix warning Using plain integer as NULL pointer f2fs: add tracepoint for f2fs_file_write_iter() f2fs: add comment for conditional compilation statement f2fs: fix potential recursive call when enabling data_flush f2fs: improve discard handling with multi-device volumes f2fs: Reduce zoned block device memory usage f2fs: Fix use of number of devices ANDROID: Communicates LMK events to userland where they can be logged Make arm64 serial port config compatible with crosvm Fix merge issue with 4.4.178 Fix merge issue with 4.4.177 ANDROID: cuttlefish_defconfig: Enable CONFIG_OVERLAY_FS ANDROID: drop CONFIG_INPUT_KEYCHORD from cuttlefish and ranchu UPSTREAM: virt_wifi: Remove REGULATORY_WIPHY_SELF_MANAGED UPSTREAM: net: socket: set sock->sk to NULL after calling proto_ops::release() Revert "ANDROID: arm: process: Add display of memory around registers when displaying regs." 
f2fs: set pin_file under CAP_SYS_ADMIN f2fs: fix to avoid deadlock in f2fs_read_inline_dir() f2fs: fix to adapt small inline xattr space in __find_inline_xattr() f2fs: fix to do sanity check with inode.i_inline_xattr_size f2fs: give some messages for inline_xattr_size f2fs: don't trigger read IO for beyond EOF page f2fs: fix to add refcount once page is tagged PG_private f2fs: remove wrong comment in f2fs_invalidate_page() f2fs: fix to use kvfree instead of kzfree f2fs: print more parameters in trace_f2fs_map_blocks f2fs: trace f2fs_ioc_shutdown f2fs: fix to avoid deadlock of atomic file operations f2fs: fix to dirty inode for i_mode recovery f2fs: give random value to i_generation f2fs: no need to take page lock in readdir f2fs: fix to update iostat correctly in IPU path f2fs: fix encrypted page memory leak f2fs: make fault injection covering __submit_flush_wait() f2fs: fix to retry fill_super only if recovery failed f2fs: silence VM_WARN_ON_ONCE in mempool_alloc f2fs: correct spelling mistake f2fs: fix wrong #endif f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG f2fs: don't allow negative ->write_io_size_bits f2fs: fix to check inline_xattr_size boundary correctly ANDROID: mnt: Propagate remount correctly ANDROID: cuttlefish_defconfig: Add support for AC97 audio ANDROID: overlayfs: override_creds=off option bypass creator_cred FROMGIT: binder: create node flag to request sender's security context Revert "f2fs: fix to avoid deadlock of atomic file operations" Revert "f2fs: fix to check inline_xattr_size boundary correctly" BACKPORT: userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas ANDROID: cuttlefish_defconfig: Enable DEBUG_SET_MODULE_RONX ANDROID: Move from clang r346389b to r349610. 
UPSTREAM: virt_wifi: fix error return code in virt_wifi_newlink() ion: Disable ION_HEAP_TYPE_SYSTEM_CONTIG UPSTREAM: binder: filter out nodes when showing binder procs f2fs: do not use mutex lock in atomic context f2fs: fix potential data inconsistence of checkpoint f2fs: fix to avoid deadlock of atomic file operations f2fs: fix to check inline_xattr_size boundary correctly f2fs: jump to label 'free_node_inode' when failing from d_make_root() f2fs: fix to document inline_xattr_size option f2fs: fix to data block override node segment by mistake f2fs: fix typos in code comments f2fs: sync filesystem after roll-forward recovery fs: export evict_inodes f2fs: flush quota blocks after turnning it off f2fs: avoid null pointer exception in dcc_info f2fs: don't wake up too frequently, if there is lots of IOs f2fs: try to keep CP_TRIMMED_FLAG after successful umount f2fs: add quick mode of checkpoint=disable for QA f2fs: run discard jobs when put_super f2fs: fix to set sbi dirty correctly f2fs: UBSAN: set boolean value iostat_enable correctly f2fs: add brackets for macros f2fs: check if file namelen exceeds max value f2fs: fix to trigger fsck if dirent.name_len is zero f2fs: no need to check return value of debugfs_create functions f2fs: export FS_NOCOW_FL flag to user f2fs: check inject_rate validity during configuring f2fs: remove set but not used variable 'err' f2fs: fix compile warnings: 'struct *' declared inside parameter list f2fs: change error code to -ENOMEM from -EINVAL ANDROID: cuttlefish_defconfig: Enable CONFIG_RTC_HCTOSYS UPSTREAM: dm: do not allow readahead to limit IO size UPSTREAM: readahead: stricter check for bdi io_pages UPSTREAM: mm: don't cap request size based on read-ahead setting ANDROID: Fix cuttlefish redundant vsock connection. 
UPSTREAM: loop: drop caches if offset or block_size are changed UPSTREAM: virtio: new feature to detect IOMMU device quirk UPSTREAM: vring: Use the DMA API on Xen UPSTREAM: virtio_ring: Support DMA APIs UPSTREAM: vring: Introduce vring_use_dma_api() ANDROID: cuttlefish_defconfig: Enable vsock options UPSTREAM: vhost/vsock: fix reset orphans race with close timeout UPSTREAM: vhost/vsock: fix use-after-free in network stack callers UPSTREAM: vhost: correctly check the iova range when waking virtqueue UPSTREAM: vhost: synchronize IOTLB message with dev cleanup UPSTREAM: vhost: fix info leak due to uninitialized memory UPSTREAM: vhost: fix vhost_vq_access_ok() log check UPSTREAM: vhost: validate log when IOTLB is enabled UPSTREAM: vhost_net: add missing lock nesting notation UPSTREAM: vhost: use mutex_lock_nested() in vhost_dev_lock_vqs() UPSTREAM: vhost/vsock: fix uninitialized vhost_vsock->guest_cid UPSTREAM: vhost_net: correctly check tx avail during rx busy polling UPSTREAM: vsock: use new wait API for vsock_stream_sendmsg() UPSTREAM: vsock: cancel packets when failing to connect UPSTREAM: vhost-vsock: add pkt cancel capability UPSTREAM: vsock: track pkt owner vsock UPSTREAM: vhost: fix initialization for vq->is_le UPSTREAM: vhost/vsock: handle vhost_vq_init_access() error UPSTREAM: vsock: lookup and setup guest_cid inside vhost_vsock_lock UPSTREAM: vhost-vsock: fix orphan connection reset UPSTREAM: vsock/virtio: fix src/dst cid format UPSTREAM: VSOCK: Don't dec ack backlog twice for rejected connections UPSTREAM: vhost/vsock: drop space available check for TX vq UPSTREAM: virtio-vsock: fix include guard typo UPSTREAM: vhost/vsock: fix vhost virtio_vsock_pkt use-after-free UPSTREAM: VSOCK: Use kvfree() BACKPORT: vhost: split out vringh Kconfig UPSTREAM: vhost: drop vringh dependency UPSTREAM: vhost: drop vringh dependency UPSTREAM: vhost: detect 32 bit integer wrap around UPSTREAM: VSOCK: Add Makefile and Kconfig UPSTREAM: VSOCK: Introduce vhost_vsock.ko UPSTREAM: 
VSOCK: Introduce virtio_transport.ko BACKPORT: VSOCK: Introduce virtio_vsock_common.ko UPSTREAM: VSOCK: defer sock removal to transports UPSTREAM: VSOCK: transport-specific vsock_transport functions UPSTREAM: vsock: make listener child lock ordering explicit UPSTREAM: vhost: new device IOTLB API BACKPORT: vhost: convert pre sorted vhost memory array to interval tree UPSTREAM: vhost: introduce vhost memory accessors UPSTREAM: vhost_net: stop polling socket during rx processing UPSTREAM: VSOCK: constify vsock_transport structure UPSTREAM: vhost: lockless enqueuing UPSTREAM: vhost: simplify work flushing UPSTREAM: VSOCK: Only check error on skb_recv_datagram when skb is NULL BACKPORT: AF_VSOCK: Shrink the area influenced by prepare_to_wait UPSTREAM: vhost_net: basic polling support UPSTREAM: vhost: introduce vhost_vq_avail_empty() UPSTREAM: vhost: introduce vhost_has_work() UPSTREAM: vhost: rename vhost_init_used() UPSTREAM: vhost: rename cross-endian helpers UPSTREAM: vhost: fix error path in vhost_init_used() UPSTREAM: virtio: make find_vqs() checkpatch.pl-friendly UPSTREAM: net: move napi_hash[] into read mostly section ANDROID: cuttlefish_defconfig: remove DM_VERITY_HASH_PREFETCH_MIN_SIZE Revert "ANDROID: dm verity: add minimum prefetch size" ANDROID: f2fs: Complement "android_fs" tracepoint of read path f2fs: don't access node/meta inode mapping after iput f2fs: wait on atomic writes to count F2FS_CP_WB_DATA f2fs: sanity check of xattr entry size f2fs: fix use-after-free issue when accessing sbi->stat_info f2fs: check PageWriteback flag for ordered case f2fs: fix validation of the block count in sanity_check_raw_super f2fs: fix missing unlock(sbi->gc_mutex) f2fs: clean up structure extent_node f2fs: fix block address for __check_sit_bitmap f2fs: fix sbi->extent_list corruption issue f2fs: clean up checkpoint flow f2fs: flush stale issued discard candidates f2fs: correct wrong spelling, issing_* f2fs: use kvmalloc, if kmalloc is failed f2fs: remove redundant 
comment of unused wio_mutex f2fs: fix to reorder set_page_dirty and wait_on_page_writeback f2fs: clear PG_writeback if IPU failed f2fs: add an ioctl() to explicitly trigger fsck later f2fs: avoid frequent costly fsck triggers f2fs: fix m_may_create to make OPU DIO write correctly f2fs: fix to update new block address correctly for OPU f2fs: adjust trace print in f2fs_get_victim() to cover all paths f2fs: fix to allow node segment for GC by ioctl path f2fs: make "f2fs_fault_name[]" const char * f2fs: read page index before freeing f2fs: fix wrong return value of f2fs_acl_create f2fs: avoid build warn of fall_through f2fs: fix race between write_checkpoint and write_begin f2fs: check memory boundary by insane namelen f2fs: only flush the single temp bio cache which owns the target page f2fs: fix out-place-update DIO write f2fs: fix to be aware discard/preflush/dio command in is_idle() f2fs: add to account direct IO f2fs: move dir data flush to write checkpoint process f2fs: change segment to section in f2fs_ioc_gc_range f2fs: export migration_granularity sysfs entry f2fs: support subsectional garbage collection f2fs: introduce __is_large_section() for cleanup f2fs: clean up f2fs_sb_has_##feature_name f2fs: remove codes of unused wio_mutex f2fs: fix count of seg_freed to make sec_freed correct f2fs: fix to account preflush command for noflush_merge mode f2fs: avoid GC causing encrypted file corrupted ANDROID: cuttlefish_defconfig: Enable VIRTIO_INPUT ANDROID: Revert fs/squashfs back to linux-4.4.y ANDROID: uid_sys_stats: Copy task_struct comm field to bigger buffer ANDROID: cuttlefish_defconfig: Enable VIRT_WIFI FROMGIT, BACKPORT: mac80211-next: rtnetlink wifi simulation device ANDROID: Move from clang r328903 to r346389b. 
UPSTREAM: binder: fix race that allows malicious free of live buffer ANDROID: arm64 defconfig / build config for cuttlefish ANDROID: Kbuild, LLVMLinux: allow overriding clang target triple Revert "ANDROID: Kbuild, LLVMLinux: allow overriding clang target triple" ANDROID: sdcardfs: Add option to not link obb ANDROID: sdcardfs: Add sandbox UPSTREAM: seccomp: Fix tracer exit notifications during fatal signals UPSTREAM: arm64/ptrace: run seccomp after ptrace UPSTREAM: arm/ptrace: run seccomp after ptrace BACKPORT: x86/ptrace: run seccomp after ptrace UPSTREAM: seccomp: recheck the syscall after RET_TRACE UPSTREAM: seccomp: remove 2-phase API BACKPORT: x86/entry: Get rid of two-phase syscall entry work BACKPORT: seccomp: Add a seccomp_data parameter secure_computing() BACKPORT: x86/entry/64: Always run ptregs-using syscalls on the slow path UPSTREAM: x86/syscalls: Add syscall entry qualifiers UPSTREAM: x86/syscalls: Move compat syscall entry handling into syscalltbl.sh UPSTREAM: x86/syscalls: Remove __SYSCALL_COMMON and __SYSCALL_X32 UPSTREAM: x86/syscalls: Refactor syscalltbl.sh Makefile: Tidy up 4.4.165 merge ANDROID: zram: set comp_len to PAGE_SIZE when page is huge BACKPORT: xfrm: Allow Output Mark to be Updated Using UPDSA ANDROID: sdcardfs: Add option to drop unused dentries f2fs: guarantee journalled quota data by checkpoint f2fs: cleanup dirty pages if recover failed f2fs: fix data corruption issue with hardware encryption f2fs: fix to recover inode->i_flags of inode block during POR f2fs: spread f2fs_set_inode_flags() f2fs: fix to spread clear_cold_data() Revert "f2fs: fix to clear PG_checked flag in set_page_dirty()" f2fs: account read IOs and use IO counts for is_idle f2fs: fix to account IO correctly for cgroup writeback f2fs: fix to account IO correctly f2fs: remove request_list check in is_idle() f2fs: allow to mount, if quota is failed f2fs: update REQ_TIME in f2fs_cross_rename() f2fs: do not update REQ_TIME in case of error conditions f2fs: remove 
unneeded disable_nat_bits() f2fs: remove unused sbi->trigger_ssr_threshold f2fs: shrink sbi->sb_lock coverage in set_file_temperature() f2fs: fix to recover cold bit of inode block during POR f2fs: submit cached bio to avoid endless PageWriteback f2fs: checkpoint disabling f2fs: clear PageError on the read path f2fs: allow out-place-update for direct IO in LFS mode f2fs: refactor ->page_mkwrite() flow Revert: "f2fs: check last page index in cached bio to decide submission" f2fs: support superblock checksum f2fs: add to account skip count of background GC f2fs: add to account meta IO f2fs: keep lazytime on remount f2fs: fix missing up_read f2fs: return correct errno in f2fs_gc f2fs: avoid f2fs_bug_on if f2fs_get_meta_page_nofail got EIO f2fs: mark inode dirty explicitly in recover_inode() f2fs: fix to recover inode's crtime during POR f2fs: fix to recover inode's i_gc_failures during POR f2fs: fix to recover inode's i_flags during POR f2fs: fix to recover inode's project id during POR f2fs: update i_size after DIO completion f2fs: report ENOENT correctly in f2fs_rename f2fs: fix remount problem of option io_bits f2fs: fix to recover inode's uid/gid during POR f2fs: avoid infinite loop in f2fs_alloc_nid f2fs: add new idle interval timing for discard and gc paths f2fs: split IO error injection according to RW f2fs: add SPDX license identifiers f2fs: surround fault_injection related option parsing using CONFIG_F2FS_FAULT_INJECTION f2fs: avoid sleeping under spin_lock f2fs: plug readahead IO in readdir() f2fs: fix to do sanity check with current segment number f2fs: fix memory leak of percpu counter in fill_super() f2fs: fix memory leak of write_io in fill_super() f2fs: cache NULL when both default_acl and acl are NULL f2fs: fix to flush all dirty inodes recovered in readonly fs f2fs: report error if quota off error during umount f2fs: submit bio after shutdown f2fs: avoid wrong decrypted data from disk Revert "f2fs: use printk_ratelimited for f2fs_msg" f2fs: fix 
unnecessary periodic wakeup of discard thread when dev is busy f2fs: fix to avoid NULL pointer dereference on se->discard_map f2fs: add additional sanity check in f2fs_acl_from_disk() Revert "BACKPORT, FROMLIST: fscrypt: add Speck128/256 support" Build fix for 076c36fce1. Revert "BACKPORT, FROMGIT: crypto: speck - add support for the Speck block cipher" Revert "FROMGIT: crypto: speck - export common helpers" Revert "BACKPORT, FROMGIT: crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS" Revert "BACKPORT, FROMGIT: crypto: speck - add test vectors for Speck128-XTS" Revert "BACKPORT, FROMGIT: crypto: speck - add test vectors for Speck64-XTS" Revert "BACKPORT, FROMLIST: crypto: arm64/speck - add NEON-accelerated implementation of Speck-XTS" Revert "fscrypt: add Speck128/256 support" UPSTREAM: loop: Add LOOP_SET_BLOCK_SIZE in compat ioctl BACKPORT: block/loop: set hw_sectors UPSTREAM: loop: add ioctl for changing logical block size ANDROID: usb: gadget: f_mtp: Return error if count is negative ANDROID: x86_64_cuttlefish_defconfig: disable CONFIG_MEMORY_STATE_TIME ANDROID: sdcardfs: Change current->fs under lock ANDROID: sdcardfs: Don't use OVERRIDE_CRED macro Revert "f2fs: use timespec64 for inode timestamps" ANDROID: restrict store of prefer_idle as boolean BACKPORT: arm/syscalls: Optimize address limit check UPSTREAM: syscalls: Use CHECK_DATA_CORRUPTION for addr_limit_user_check BACKPORT: arm64/syscalls: Check address limit on user-mode return BACKPORT: x86/syscalls: Check address limit on user-mode return BACKPORT: lkdtm: add bad USER_DS test UPSTREAM: bug: switch data corruption check to __must_check BACKPORT: lkdtm: Add tests for struct list corruption UPSTREAM: bug: Provide toggle for BUG on data corruption UPSTREAM: list: Split list_del() debug checking into separate function UPSTREAM: rculist: Consolidate DEBUG_LIST for list_add_rcu() BACKPORT: list: Split list_add() debug checking into separate function FROMLIST: ANDROID: binder: Add
BINDER_GET_NODE_INFO_FOR_REF ioctl. f2fs: readahead encrypted block during GC f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc f2fs: fix performance issue observed with multi-thread sequential read f2fs: fix to skip verifying block address for non-regular inode f2fs: rework fault injection handling to avoid a warning f2fs: support fault_type mount option f2fs: fix to return success when trimming meta area f2fs: fix use-after-free of dicard command entry f2fs: support discard submission error injection f2fs: split discard command in prior to block layer f2fs: wake up gc thread immediately when gc_urgent is set f2fs: fix incorrect range->len in f2fs_trim_fs() f2fs: refresh recent accessed nat entry in lru list f2fs: fix avoid race between truncate and background GC f2fs: avoid race between zero_range and background GC f2fs: fix to do sanity check with block address in main area v2 f2fs: fix to do sanity check with inline flags f2fs: fix to reset i_gc_failures correctly f2fs: fix invalid memory access f2fs: fix to avoid broken of dnode block list f2fs: use true and false for boolean values f2fs: fix to do sanity check with cp_pack_start_sum f2fs: avoid f2fs_bug_on() in cp_error case f2fs: fix to clear PG_checked flag in set_page_dirty() f2fs: fix to active page in lru list for read path f2fs: don't keep meta pages used for block migration f2fs: fix to restrict mount condition when without CONFIG_QUOTA f2fs: quota: do not mount as RDWR without QUOTA if quota feature enabled f2fs: quota: fix incorrect comments f2fs: add proc entry to show victim_secmap bitmap f2fs: let checkpoint flush dnode page of regular f2fs: issue discard align to section in LFS mode f2fs: don't allow any writes on aborted atomic writes f2fs: restrict setting up inode.i_advise f2fs: fix wrong kernel message when recover fsync data on ro fs f2fs: clean up ioctl interface naming f2fs: clean up with f2fs_is_{atomic,volatile}_file() f2fs: clean up with f2fs_encrypted_inode() f2fs: clean up with 
get_current_nat_page f2fs: kill EXT_TREE_VEC_SIZE f2fs: avoid duplicated permission check for "trusted." xattrs f2fs: fix to propagate error from __get_meta_page() f2fs: fix to do sanity check with i_extra_isize f2fs: blk_finish_plug of submit_bio in lfs mode f2fs: do not set free of current section f2fs: Keep alloc_valid_block_count in sync f2fs: issue small discard by LBA order f2fs: stop issuing discard immediately if there is queued IO f2fs: clean up with IS_INODE() f2fs: detect bug_on in f2fs_wait_discard_bios f2fs: fix defined but not used build warnings f2fs: enable real-time discard by default f2fs: fix to detect looped node chain correctly f2fs: fix to do sanity check with block address in main area f2fs: fix to skip GC if type in SSA and SIT is inconsistent f2fs: try grabbing node page lock aggressively in sync scenario f2fs: show the fsync_mode=nobarrier mount option f2fs: check the right return value of memory alloc function f2fs: Replace strncpy with memcpy f2fs: avoid the global name 'fault_name' f2fs: fix to do sanity check with reserved blkaddr of inline inode f2fs: fix to do sanity check with node footer and iblocks f2fs: Allocate and stat mem used by free nid bitmap more accurately f2fs: fix to do sanity check with user_block_count f2fs: fix to do sanity check with extra_attr feature f2fs: fix to correct return value of f2fs_trim_fs f2fs: fix to do sanity check with {sit,nat}_ver_bitmap_bytesize f2fs: fix to do sanity check with secs_per_zone f2fs: disable f2fs_check_rb_tree_consistence f2fs: introduce and spread verify_blkaddr f2fs: use timespec64 for inode timestamps f2fs: fix to wait on page writeback before updating page f2fs: assign REQ_RAHEAD to bio for ->readpages f2fs: fix a hungtask problem caused by congestion_wait f2fs: Fix uninitialized return in f2fs_ioc_shutdown() f2fs: don't issue discard commands in online discard is on f2fs: fix to propagate return value of scan_nat_page() f2fs: support in-memory inode checksum when checking 
consistency f2fs: fix error path of fill_super f2fs: relocate readdir_ra configure initialization f2fs: move s_res{u,g}id initialization to default_options() f2fs: don't acquire orphan ino during recovery f2fs: avoid potential deadlock in f2fs_sbi_store f2fs: indicate shutdown f2fs to allow unmount successfully f2fs: keep meta pages in cp_error state f2fs: do checkpoint in kill_sb f2fs: allow wrong configured dio to buffered write f2fs: flush journal nat entries for nat_bits during unmount BACKPORT: arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW ANDROID: arm64: mm: fix 4.4.154 merge BACKPORT: zram: drop max_zpage_size and use zs_huge_class_size() BACKPORT: zsmalloc: introduce zs_huge_class_size() ANDROID: tracing: fix race condition reading saved tgids ANDROID: x86_64_cuttlefish_defconfig: Enable lz4 compression for zram UPSTREAM: drivers/block/zram/zram_drv.c: fix bug storing backing_dev BACKPORT: zram: introduce zram memory tracking BACKPORT: zram: record accessed second BACKPORT: zram: mark incompressible page as ZRAM_HUGE UPSTREAM: zram: correct flag name of ZRAM_ACCESS UPSTREAM: zram: Delete gendisk before cleaning up the request queue UPSTREAM: drivers/block/zram/zram_drv.c: make zram_page_end_io() static BACKPORT: zram: set BDI_CAP_STABLE_WRITES once UPSTREAM: zram: fix null dereference of handle UPSTREAM: zram: add config and doc file for writeback feature BACKPORT: zram: read page from backing device BACKPORT: zram: write incompressible pages to backing device BACKPORT: zram: identify asynchronous IO's return value BACKPORT: zram: add free space management in backing device UPSTREAM: zram: add interface to specif backing device UPSTREAM: zram: rename zram_decompress_page to __zram_bvec_read UPSTREAM: zram: inline zram_compress UPSTREAM: zram: clean up duplicated codes in __zram_bvec_write ANDROID: x86_64_cuttlefish_defconfig: Enable zram and zstd BACKPORT: crypto: zstd - Add zstd support UPSTREAM: zram: add zstd to the supported algorithms list 
UPSTREAM: lib: Add zstd modules UPSTREAM: lib: Add xxhash module UPSTREAM: zram: rework copy of compressor name in comp_algorithm_store() UPSTREAM: zram: constify attribute_group structures. UPSTREAM: zram: count same page write as page_stored UPSTREAM: zram: reduce load operation in page_same_filled UPSTREAM: zram: use zram_free_page instead of open-coded UPSTREAM: zram: introduce zram data accessor UPSTREAM: zram: remove zram_meta structure UPSTREAM: zram: use zram_slot_lock instead of raw bit_spin_lock op BACKPORT: zram: partial IO refactoring BACKPORT: zram: handle multiple pages attached bio's bvec UPSTREAM: zram: fix operator precedence to get offset BACKPORT: zram: extend zero pages to same element pages BACKPORT: zram: remove waitqueue for IO done UPSTREAM: zram: remove obsolete sysfs attrs UPSTREAM: zram: support BDI_CAP_STABLE_WRITES UPSTREAM: zram: revalidate disk under init_lock BACKPORT: mm: support anonymous stable page UPSTREAM: zram: use __GFP_MOVABLE for memory allocation UPSTREAM: zram: drop gfp_t from zcomp_strm_alloc() UPSTREAM: zram: add more compression algorithms UPSTREAM: zram: delete custom lzo/lz4 UPSTREAM: zram: cosmetic: cleanup documentation UPSTREAM: zram: use crypto api to check alg availability BACKPORT: zram: switch to crypto compress API UPSTREAM: zram: rename zstrm find-release functions UPSTREAM: zram: introduce per-device debug_stat sysfs node UPSTREAM: zram: remove max_comp_streams internals UPSTREAM: zram: user per-cpu compression streams BACKPORT: zsmalloc: require GFP in zs_malloc() UPSTREAM: zram/zcomp: do not zero out zcomp private pages UPSTREAM: zram: pass gfp from zcomp frontend to backend UPSTREAM: socket: close race condition between sock_close() and sockfs_setattr() ANDROID: Refresh x86_64_cuttlefish_defconfig kernel/sys.c: fix merge error with 4.4.144 ANDROID: sdcardfs: Check stacked filesystem depth Fix backport of "tcp: detect malicious patterns in tcp_collapse_ofo_queue()" tcp: detect malicious patterns in 
tcp_collapse_ofo_queue() tcp: avoid collapses in tcp_prune_queue() if possible treewide: Use array_size in f2fs_kvzalloc() treewide: Use array_size() in f2fs_kzalloc() treewide: Use array_size() in f2fs_kmalloc() overflow.h: Add allocation size calculation helpers f2fs: fix to clear FI_VOLATILE_FILE correctly f2fs: let sync node IO interrupt async one f2fs: don't change wbc->sync_mode f2fs: fix to update mtime correctly fs: f2fs: insert space around that ':' and ', ' fs: f2fs: add missing blank lines after declarations fs: f2fs: changed variable type of offset "unsigned" to "loff_t" f2fs: clean up symbol namespace f2fs: make set_de_type() static f2fs: make __f2fs_write_data_pages() static f2fs: fix to avoid accessing cross the boundary f2fs: fix to let caller retry allocating block address disable loading f2fs module on PAGE_SIZE > 4KB f2fs: fix error path of move_data_page f2fs: don't drop dentry pages after fs shutdown f2fs: fix to avoid race during access gc_thread pointer f2fs: clean up with clear_radix_tree_dirty_tag f2fs: fix to don't trigger writeback during recovery f2fs: clear discard_wake earlier f2fs: let discard thread wait a little longer if dev is busy f2fs: avoid stucking GC due to atomic write f2fs: introduce sbi->gc_mode to determine the policy f2fs: keep migration IO order in LFS mode f2fs: fix to wait page writeback during revoking atomic write f2fs: Fix deadlock in shutdown ioctl f2fs: detect synchronous writeback more earlier mm: remove nr_pages argument from pagevec_lookup_{,range}_tag() ceph: use pagevec_lookup_range_nr_tag() mm: add variant of pagevec_lookup_range_tag() taking number of pages mm: use pagevec_lookup_range_tag() in write_cache_pages() mm: use pagevec_lookup_range_tag() in __filemap_fdatawait_range() nilfs2: use pagevec_lookup_range_tag() gfs2: use pagevec_lookup_range_tag() f2fs: use find_get_pages_tag() for looking up single page f2fs: simplify page iteration loops f2fs: use pagevec_lookup_range_tag() ext4: use 
pagevec_lookup_range_tag() ceph: use pagevec_lookup_range_tag() btrfs: use pagevec_lookup_range_tag() mm: implement find_get_pages_range_tag() f2fs: clean up with is_valid_blkaddr() f2fs: fix to initialize min_mtime with ULLONG_MAX f2fs: fix to let checkpoint guarantee atomic page persistence f2fs: fix to initialize i_current_depth according to inode type Revert "f2fs: add ovp valid_blocks check for bg gc victim to fg_gc" f2fs: don't drop any page on f2fs_cp_error() case f2fs: fix spelling mistake: "extenstion" -> "extension" f2fs: enhance sanity_check_raw_super() to avoid potential overflows f2fs: treat volatile file's data as hot one f2fs: introduce release_discard_addr() for cleanup f2fs: fix potential overflow f2fs: rename dio_rwsem to i_gc_rwsem f2fs: move mnt_want_write_file after range check f2fs: fix missing clear FI_NO_PREALLOC in some error case f2fs: enforce fsync_mode=strict for renamed directory f2fs: sanity check for total valid node blocks f2fs: sanity check on sit entry f2fs: avoid bug_on on corrupted inode f2fs: give message and set need_fsck given broken node id f2fs: clean up commit_inmem_pages() f2fs: do not check F2FS_INLINE_DOTS in recover f2fs: remove duplicated dquot_initialize and fix error handling f2fs: stop issue discard if something wrong with f2fs f2fs: fix return value in f2fs_ioc_commit_atomic_write f2fs: allocate hot_data for atomic write more strictly f2fs: check if inmem_pages list is empty correctly f2fs: fix race in between GC and atomic open f2fs: change le32 to le16 of f2fs_inode->i_extra_size f2fs: check cur_valid_map_mir & raw_sit block count when flush sit entries f2fs: correct return value of f2fs_trim_fs f2fs: fix to show missing bits in FS_IOC_GETFLAGS f2fs: remove unneeded F2FS_PROJINHERIT_FL f2fs: don't use GFP_ZERO for page caches f2fs: issue all big range discards in umount process f2fs: remove redundant block plug f2fs: remove unmatched zero_user_segment when convert inline dentry f2fs: introduce private inode 
status mapping fscrypt: log the crypto algorithm implementations crypto: api - Add crypto_type_has_alg helper crypto: skcipher - Add low-level skcipher interface crypto: skcipher - Add helper to retrieve driver name crypto: skcipher - Add default key size helper fscrypt: add Speck128/256 support fscrypt: only derive the needed portion of the key fscrypt: separate key lookup from key derivation fscrypt: use a common logging function fscrypt: remove internal key size constants fscrypt: remove unnecessary check for non-logon key type fscrypt: make fscrypt_operations.max_namelen an integer fscrypt: drop empty name check from fname_decrypt() fscrypt: drop max_namelen check from fname_decrypt() fscrypt: don't special-case EOPNOTSUPP from fscrypt_get_encryption_info() fscrypt: don't clear flags on crypto transform fscrypt: remove stale comment from fscrypt_d_revalidate() fscrypt: remove error messages for skcipher_request_alloc() failure fscrypt: remove unnecessary NULL check when allocating skcipher fscrypt: clean up after fscrypt_prepare_lookup() conversions fscrypt: use unbound workqueue for decryption f2fs: run fstrim asynchronously if runtime discard is on f2fs: turn down IO priority of discard from background f2fs: don't split checkpoint in fstrim f2fs: issue discard commands proactively in high fs utilization f2fs: add fsync_mode=nobarrier for non-atomic files f2fs: let fstrim issue discard commands in lower priority f2fs: avoid fsync() failure caused by EAGAIN in writepage() f2fs: clear PageError on writepage - part 2 f2fs: check cap_resource only for data blocks Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer" f2fs: clear PageError on writepage f2fs: call unlock_new_inode() before d_instantiate() f2fs: refactor read path to allow multiple postprocessing steps fscrypt: allow synchronous bio decryption f2fs: remain written times to update inode during fsync f2fs: make assignment of t->dentry_bitmap more readable f2fs: truncate preallocated blocks in error case 
f2fs: fix a wrong condition in f2fs_skip_inode_update f2fs: reserve bits for fs-verity f2fs: Add a segment type check in inplace write f2fs: no need to initialize zero value for GFP_F2FS_ZERO f2fs: don't track new nat entry in nat set f2fs: clean up with F2FS_BLK_ALIGN f2fs: check blkaddr more accuratly before issue a bio f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read f2fs: introduce a new mount option test_dummy_encryption f2fs: introduce F2FS_FEATURE_LOST_FOUND feature f2fs: release locks before return in f2fs_ioc_gc_range() f2fs: align memory boundary for bitops f2fs: remove unneeded set_cold_node() f2fs: add nowait aio support f2fs: wrap all options with f2fs_sb_info.mount_opt f2fs: Don't overwrite all types of node to keep node chain f2fs: introduce mount option for fsync mode f2fs: fix to restore old mount option in ->remount_fs f2fs: wrap sb_rdonly with f2fs_readonly f2fs: avoid selinux denial on CAP_SYS_RESOURCE f2fs: support hot file extension f2fs: fix to avoid race in between atomic write and background GC f2fs: do gc in greedy mode for whole range if gc_urgent mode is set f2fs: issue discard aggressively in the gc_urgent mode f2fs: set readdir_ra by default f2fs: add auto tuning for small devices f2fs: add mount option for segment allocation policy f2fs: don't stop GC if GC is contended f2fs: expose extension_list sysfs entry f2fs: fix to set KEEP_SIZE bit in f2fs_zero_range f2fs: introduce sb_lock to make encrypt pwsalt update exclusive f2fs: remove redundant initialization of pointer 'p' f2fs: flush cp pack except cp pack 2 page at first f2fs: clean up f2fs_sb_has_xxx functions f2fs: remove redundant check of page type when submit bio f2fs: fix to handle looped node chain during recovery f2fs: handle quota for orphan inodes f2fs: support passing down write hints to block layer with F2FS policy f2fs: support passing down write hints given by users to block layer f2fs: fix to clear CP_TRIMMED_FLAG f2fs: support large nat bitmap 
f2fs: fix to check extent cache in f2fs_drop_extent_tree f2fs: restrict inline_xattr_size configuration f2fs: fix heap mode to reset it back f2fs: fix potential corruption in area before F2FS_SUPER_OFFSET fscrypt: fix build with pre-4.6 gcc versions fscrypt: fix up fscrypt_fname_encrypted_size() for internal use fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names fscrypt: calculate NUL-padding length in one place only fscrypt: move fscrypt_symlink_data to fscrypt_private.h fscrypt: remove fscrypt_fname_usr_to_disk() f2fs: switch to fscrypt_get_symlink() f2fs: switch to fscrypt ->symlink() helper functions fscrypt: new helper function - fscrypt_get_symlink() fscrypt: new helper functions for ->symlink() fscrypt: trim down fscrypt.h includes fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h fscrypt: move fscrypt_operations declaration to fscrypt_supp.h fscrypt: split fscrypt_dummy_context_enabled() into supp/notsupp versions fscrypt: move fscrypt_ctx declaration to fscrypt_supp.h fscrypt: move fscrypt_info_cachep declaration to fscrypt_private.h fscrypt: move fscrypt_control_page() to supp/notsupp headers fscrypt: move fscrypt_has_encryption_key() to supp/notsupp headers f2fs: don't put dentry page in pagecache into highmem f2fs: support inode creation time f2fs: rebuild sit page from sit info in mem f2fs: stop issuing discard if fs is readonly f2fs: clean up duplicated assignment in init_discard_policy f2fs: use GFP_F2FS_ZERO for cleanup f2fs: allow to recover node blocks given updated checkpoint f2fs: recover some i_inline flags f2fs: correct removexattr behavior for null valued extended attribute f2fs: drop page cache after fs shutdown f2fs: stop gc/discard thread after fs shutdown f2fs: hanlde error case in f2fs_ioc_shutdown f2fs: split need_inplace_update f2fs: fix to update last_disk_size correctly f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup f2fs: clean up error path of 
fill_super f2fs: avoid hungtask when GC encrypted block if io_bits is set f2fs: allow quota to use reserved blocks f2fs: fix to drop all inmem pages correctly f2fs: speed up defragment on sparse file f2fs: support F2FS_IOC_PRECACHE_EXTENTS f2fs: add an ioctl to disable GC for specific file f2fs: prevent newly created inode from being dirtied incorrectly f2fs: support FIEMAP_FLAG_XATTR f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock f2fs: check node page again in write end io f2fs: fix to caclulate required free section correctly f2fs: handle newly created page when revoking inmem pages f2fs: add resgid and resuid to reserve root blocks f2fs: implement cgroup writeback support f2fs: remove unused pend_list_tag f2fs: avoid high cpu usage in discard thread f2fs: make local functions static f2fs: add reserved blocks for root user f2fs: check segment type in __f2fs_replace_block f2fs: update inode info to inode page for new file f2fs: show precise # of blocks that user/root can use f2fs: clean up unneeded declaration f2fs: continue to do direct IO if we only preallocate partial blocks f2fs: enable quota at remount from r to w f2fs: skip stop_checkpoint for user data writes f2fs: fix missing error number for xattr operation f2fs: recover directory operations by fsync f2fs: return error during fill_super f2fs: fix an error case of missing update inode page f2fs: fix potential hangtask in f2fs_trace_pid f2fs: no need return value in restore summary process f2fs: use unlikely for release case f2fs: don't return value in truncate_data_blocks_range f2fs: clean up f2fs_map_blocks f2fs: clean up hash codes f2fs: fix error handling in fill_super f2fs: spread f2fs_k{m,z}alloc f2fs: inject fault to kvmalloc f2fs: inject fault to kzalloc f2fs: remove a redundant conditional expression f2fs: apply write hints to select the type of segment for direct write f2fs: switch to fscrypt_prepare_setattr() f2fs: switch to fscrypt_prepare_lookup() f2fs: switch to 
fscrypt_prepare_rename() f2fs: switch to fscrypt_prepare_link() f2fs: switch to fscrypt_file_open() f2fs: remove repeated f2fs_bug_on f2fs: remove an excess variable f2fs: fix lock dependency in between dio_rwsem & i_mmap_sem f2fs: remove unused parameter f2fs: still write data if preallocate only partial blocks f2fs: introduce sysfs readdir_ra to readahead inode block in readdir f2fs: fix concurrent problem for updating free bitmap f2fs: remove unneeded memory footprint accounting f2fs: no need to read nat block if nat_block_bitmap is set f2fs: reserve nid resource for quota sysfile fscrypt: resolve some cherry-pick bugs fscrypt: move to generic async completion crypto: introduce crypto wait for async op fscrypt: lock mutex before checking for bounce page pool fscrypt: new helper function - fscrypt_prepare_setattr() fscrypt: new helper function - fscrypt_prepare_lookup() fscrypt: new helper function - fscrypt_prepare_rename() fscrypt: new helper function - fscrypt_prepare_link() fscrypt: new helper function - fscrypt_file_open() fscrypt: new helper function - fscrypt_require_key() fscrypt: remove unneeded empty fscrypt_operations structs fscrypt: remove ->is_encrypted() fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED() fs, fscrypt: add an S_ENCRYPTED inode flag fscrypt: clean up include file mess fscrypt: fix dereference of NULL user_key_payload fscrypt: make ->dummy_context() return bool f2fs: deny accessing encryption policy if encryption is off f2fs: inject fault in inc_valid_node_count f2fs: fix to clear FI_NO_PREALLOC f2fs: expose quota information in debugfs f2fs: separate nat entry mem alloc from nat_tree_lock f2fs: validate before set/clear free nat bitmap f2fs: avoid opened loop codes in __add_ino_entry f2fs: apply write hints to select the type of segments for buffered write f2fs: introduce scan_curseg_cache for cleanup f2fs: optimize the way of traversing free_nid_bitmap f2fs: keep scanning until enough free nids are acquired f2fs: trace 
checkpoint reason in fsync() f2fs: keep isize once block is reserved cross EOF f2fs: avoid race in between GC and block exchange f2fs: save a multiplication for last_nid calculation f2fs: fix summary info corruption f2fs: remove dead code in update_meta_page f2fs: remove unneeded semicolon f2fs: don't bother with inode->i_version f2fs: check curseg space before foreground GC f2fs: use rw_semaphore to protect SIT cache f2fs: support quota sys files f2fs: add quota_ino feature infra f2fs: optimize __update_nat_bits f2fs: modify for accurate fggc node io stat Revert "f2fs: handle dirty segments inside refresh_sit_entry" f2fs: add a function to move nid f2fs: export SSR allocation threshold f2fs: give correct trimmed blocks in fstrim f2fs: support bio allocation error injection f2fs: support get_page error injection f2fs: add missing sysfs description f2fs: support soft block reservation f2fs: handle error case when adding xattr entry f2fs: support flexible inline xattr size f2fs: show current cp state f2fs: add missing quota_initialize f2fs: show # of dirty segments via sysfs f2fs: stop all the operations by cp_error flag f2fs: remove several redundant assignments f2fs: avoid using timespec f2fs: fix to correct no_fggc_candidate Revert "f2fs: return wrong error number on f2fs_quota_write" f2fs: remove obsolete pointer for truncate_xattr_node f2fs: retry ENOMEM for quota_read|write f2fs: limit # of inmemory pages f2fs: update ctx->pos correctly when hitting hole in directory f2fs: relocate readahead codes in readdir() f2fs: allow readdir() to be interrupted f2fs: trace f2fs_readdir f2fs: trace f2fs_lookup f2fs: skip searching non-exist range in truncate_hole f2fs: expose some sectors to user in inline data or dentry case f2fs: avoid stale fi->gdirty_list pointer f2fs/crypto: drop crypto key at evict_inode only f2fs: fix to avoid race when accessing last_disk_size f2fs: Fix bool initialization/comparison f2fs: give up CP_TRIMMED_FLAG if it drops discards f2fs: trace 
f2fs_remove_discard f2fs: reduce cmd_lock coverage in __issue_discard_cmd f2fs: split discard policy f2fs: wrap discard policy f2fs: support issuing/waiting discard in range f2fs: fix to flush multiple device in checkpoint f2fs: enhance multiple device flush f2fs: fix to show ino management cache size correctly f2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush f2fs: obsolete ALLOC_NID_LIST list f2fs: convert inline data for direct I/O & FI_NO_PREALLOC f2fs: allow readpages with NULL file pointer f2fs: show flush list status in sysfs f2fs: introduce read_xattr_block f2fs: introduce read_inline_xattr Revert "f2fs: reuse nids more aggressively" Revert "f2fs: node segment is prior to data segment selected victim" f2fs: fix potential panic during fstrim f2fs: hurry up to issue discard after io interruption f2fs: fix to show correct discard_granularity in sysfs f2fs: detect dirty inode in evict_inode f2fs: clear radix tree dirty tag of pages whose dirty flag is cleared f2fs: speed up gc_urgent mode with SSR f2fs: better to wait for fstrim completion f2fs: avoid race in between read xattr & write xattr f2fs: make get_lock_data_page to handle encrypted inode f2fs: use generic terms used for encrypted block management f2fs: introduce f2fs_encrypted_file for clean-up Revert "f2fs: add a new function get_ssr_cost" f2fs: constify super_operations f2fs: fix to wake up all sleeping flusher f2fs: avoid race in between atomic_read & atomic_inc f2fs: remove unneeded parameter of change_curseg f2fs: update i_flags correctly f2fs: don't check inode's checksum if it was dirtied or writebacked f2fs: don't need to update inode checksum for recovery f2fs: trigger fdatasync for non-atomic_write file f2fs: fix to avoid race in between aio and gc f2fs: wake up discard_thread iff there is a candidate f2fs: return error when accessing insane flie offset f2fs: trigger normal fsync for non-atomic_write file f2fs: clear FI_HOT_DATA correctly f2fs: fix out-of-order execution in 
f2fs_issue_flush f2fs: issue discard commands if gc_urgent is set f2fs: introduce discard_granularity sysfs entry f2fs: remove unused function overprovision_sections f2fs: check hot_data for roll-forward recovery f2fs: add tracepoint for f2fs_gc f2fs: retry to revoke atomic commit in -ENOMEM case f2fs: let fill_super handle roll-forward errors f2fs: merge equivalent flags F2FS_GET_BLOCK_[READ|DIO] f2fs: support journalled quota f2fs: fix potential overflow when adjusting GC cycle f2fs: avoid unneeded sync on quota file f2fs: introduce gc_urgent mode for background GC f2fs: use IPU for cold files f2fs: fix the size value in __check_sit_bitmap f2fs: add app/fs io stat f2fs: do not change the valid_block value if cur_valid_map was wrongly set or cleared f2fs: update cur_valid_map_mir together with cur_valid_map f2fs: use printk_ratelimited for f2fs_msg f2fs: expose features to sysfs entry f2fs: support inode checksum f2fs: return wrong error number on f2fs_quota_write f2fs: provide f2fs_balance_fs to __write_node_page f2fs: introduce f2fs_statfs_project f2fs: don't need to wait for node writes for atomic write f2fs: avoid naming confusion of sysfs init f2fs: support project quota f2fs: record quota during dot{,dot} recovery f2fs: enhance on-disk inode structure scalability f2fs: make max inline size changeable f2fs: add ioctl to expose current features f2fs: make background threads of f2fs being aware of freezing f2fs: don't give partially written atomic data from process crash f2fs: give a try to do atomic write in -ENOMEM case f2fs: preserve i_mode if __f2fs_set_acl() fails f2fs: alloc new nids for xattr block in recovery f2fs: spread struct f2fs_dentry_ptr for inline path f2fs: remove unused input parameter f2fs: avoid cpu lockup f2fs: include seq_file.h for sysfs.c f2fs: Don't clear SGID when inheriting ACLs f2fs: remove extra inode_unlock() in error path fscrypt: add support for AES-128-CBC fscrypt: inline fscrypt_free_filename() f2fs: make more close to 
v4.13-rc1 f2fs: support plain user/group quota f2fs: avoid deadlock caused by lock order of page and lock_op f2fs: use spin_{,un}lock_irq{save,restore} f2fs: relax migratepage for atomic written page f2fs: don't count inode block in in-memory inode.i_blocks Revert "f2fs: fix to clean previous mount option when remount_fs" f2fs: do not set LOST_PINO for renamed dir f2fs: do not set LOST_PINO for newly created dir f2fs: skip ->writepages for {mete,node}_inode during recovery f2fs: introduce __check_sit_bitmap f2fs: stop gc/discard thread in prior during umount f2fs: introduce reserved_blocks in sysfs f2fs: avoid redundant f2fs_flush after remount f2fs: report # of free inodes more precisely f2fs: add ioctl to do gc with target block address f2fs: don't need to check encrypted inode for partial truncation f2fs: measure inode.i_blocks as generic filesystem f2fs: set CP_TRIMMED_FLAG correctly f2fs: require key for truncate(2) of encrypted file f2fs: move sysfs code from super.c to fs/f2fs/sysfs.c f2fs: clean up sysfs codes f2fs: fix wrong error number of fill_super f2fs: fix to show injection rate in ->show_options f2fs: Fix a return value in case of error in 'f2fs_fill_super' f2fs: use proper variable name f2fs: fix to avoid panic when encountering corrupt node f2fs: don't track newly allocated nat entry in list f2fs: add f2fs_bug_on in __remove_discard_cmd f2fs: introduce __wait_one_discard_bio f2fs: dax: fix races between page faults and truncating pages f2fs: simplify the way of calulating next nat address f2fs: sanity check size of nat and sit cache f2fs: fix a panic caused by NULL flush_cmd_control f2fs: remove the unnecessary cast for PTR_ERR f2fs: remove false-positive bug_on f2fs: Do not issue small discards in LFS mode f2fs: don't bother checking for encryption key in ->write_iter() f2fs: don't bother checking for encryption key in ->mmap() f2fs: wait discard IO completion without cmd_lock held f2fs: wake up all waiters in f2fs_submit_discard_endio f2fs: show 
more info if fail to issue discard f2fs: introduce io_list for serialize data/node IOs f2fs: split wio_mutex f2fs: combine huge num of discard rb tree consistence checks f2fs: fix a bug caused by NULL extent tree f2fs: try to freeze in gc and discard threads f2fs: add a new function get_ssr_cost f2fs: declare load_free_nid_bitmap static f2fs: avoid f2fs_lock_op for IPU writes f2fs: split bio cache f2fs: use fio instead of multiple parameters f2fs: remove unnecessary read cases in merged IO flow f2fs: use f2fs_submit_page_bio for ra_meta_pages f2fs: make sure f2fs_gc returns consistent errno f2fs: load inode's flag from disk f2fs: sanity check checkpoint segno and blkoff f2fs, block_dump: give WRITE direction to submit_bio fscrypt: correct collision claim for digested names f2fs: switch to using fscrypt_match_name() fscrypt: introduce helper function for filename matching fscrypt: fix context consistency check when key(s) unavailable fscrypt: Move key structure and constants to uapi fscrypt: remove fscrypt_symlink_data_len() fscrypt: remove unnecessary checks for NULL operations fscrypt: eliminate ->prepare_context() operation fscrypt: remove broken support for detecting keyring key revocation fscrypt: avoid collisions when presenting long encrypted filenames f2fs: check entire encrypted bigname when finding a dentry f2fs: sync f2fs_lookup() with ext4_lookup() f2fs: fix a mount fail for wrong next_scan_nid f2fs: relocate inode_{,un}lock in F2FS_IOC_SETFLAGS f2fs: show available_nids in f2fs/status f2fs: flush dirty nats periodically f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard f2fs: allow cpc->reason to indicate more than one reason f2fs: release cp and dnode lock before IPU f2fs: shrink size of struct discard_cmd f2fs: don't hold cmd_lock during waiting discard command f2fs: nullify fio->encrypted_page for each writes f2fs: sanity check segment count f2fs: introduce valid_ipu_blkaddr to clean up f2fs: lookup extent cache first under IPU scenario f2fs: 
reconstruct code to write a data page f2fs: introduce __wait_discard_cmd f2fs: introduce __issue_discard_cmd f2fs: enable small discard by default f2fs: delay awaking discard thread f2fs: seperate read nat page from nat_tree_lock f2fs: fix multiple f2fs_add_link() having same name for inline dentry f2fs: skip encrypted inode in ASYNC IPU policy f2fs: fix out-of free segments f2fs: improve definition of statistic macros f2fs: assign allocation hint for warm/cold data f2fs: fix _IOW usage f2fs: add ioctl to flush data from faster device to cold area f2fs: introduce async IPU policy f2fs: add undiscard blocks stat f2fs: unlock cp_rwsem early for IPU writes f2fs: introduce __check_rb_tree_consistence f2fs: trace __submit_discard_cmd f2fs: in prior to issue big discard f2fs: clean up discard_cmd_control structure f2fs: use rb-tree to track pending discard commands f2fs: avoid dirty node pages in check_only recovery f2fs: fix not to set fsync/dentry mark f2fs: allocate hot_data for atomic writes f2fs: give time to flush dirty pages for checkpoint f2fs: fix fs corruption due to zero inode page f2fs: shrink blk plug region f2fs: extract rb-tree operation infrastructure f2fs: avoid frequent checkpoint during f2fs_gc f2fs: clean up some macros in terms of GET_SEGNO f2fs: clean up get_valid_blocks with consistent parameter f2fs: use segment number for get_valid_blocks f2fs: guard macro variables with braces f2fs: fix comment on f2fs_flush_merged_bios() after86531d6bf2fs: prevent waiter encountering incorrect discard states f2fs: introduce f2fs_wait_discard_bios f2fs: split discard_cmd_list Revert "f2fs: put allocate_segment after refresh_sit_entry" f2fs: split make_dentry_ptr() into block and inline versions f2fs: submit bio of in-place-update pages f2fs: remove the redundant variable definition f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE f2fs: write small sized IO to hot log f2fs: use bitmap in discard_entry f2fs: clean up destroy_discard_cmd_control f2fs: 
count discard command entry f2fs: show issued flush/discard count f2fs: relax node version check for victim data in gc f2fs: start SSR much eariler to avoid FG_GC f2fs: allocate node and hot data in the beginning of partition f2fs: fix wrong max cost initialization f2fs: allow write page cache when writting cp f2fs: don't reserve additional space in xattr block f2fs: clean up xattr operation f2fs: don't track volatile file in dirty inode list f2fs: show the max number of volatile operations f2fs: fix race condition in between free nid allocator/initializer f2fs: use set_page_private marcro in f2fs_trace_pid f2fs: fix recording invalid last_victim f2fs: more reasonable mem_size calculating of ino_entry f2fs: calculate the f2fs_stat_info into base_mem f2fs: avoid stat_inc_atomic_write for non-atomic file f2fs: sanity check of crc_offset from raw checkpoint f2fs: cleanup the disk level filename updating f2fs: cover update_free_nid_bitmap with nid_list_lock f2fs: fix bad prefetchw of NULL page f2fs: clear FI_DATA_EXIST flag in truncate_inline_inode f2fs: move mnt_want_write_file after arguments checking f2fs: check new size by inode_newsize_ok in f2fs_insert_range f2fs: avoid copy date to user-space if move file range fail f2fs: drop duplicate new_size assign in f2fs_zero_range f2fs: adjust the way of calculating nat block f2fs: add fault injection on f2fs_truncate f2fs: check range before defragment f2fs: use parameter max_items instead of PIDVEC_SIZE f2fs: add a punch discard command function f2fs: allocate a bio for discarding when actually issuing it f2fs: skip writeback meta pages if cp_mutex acquire failed f2fs: show more precise message on orphan recovery failure f2fs: remove dead macro PGOFS_OF_NEXT_DNODE f2fs: drop duplicate radix tree lookup of nat_entry_set f2fs: make sure trace all f2fs_issue_flush f2fs: don't allow volatile writes for non-regular file f2fs: don't allow atomic writes for not regular files f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer 
f2fs: build stat_info before orphan inode recovery f2fs: fix the fault of calculating blkstart twice f2fs: fix the fault of checking F2FS_LINK_MAX for rename inode f2fs: don't allow to get pino when filename is encrypted f2fs: fix wrong error injection for evict_inode f2fs: le32_to_cpu for ckpt->cp_pack_total_block_count f2fs: le16_to_cpu for xattr->e_value_size f2fs: don't need to invalidate wrong node page f2fs: fix an error return value in truncate_partial_data_page f2fs: combine nat_bits and free_nid_bitmap cache f2fs: skip scanning free nid bitmap of full NAT blocks f2fs: use __set{__clear}_bit_le f2fs: update_free_nid_bitmap() can be static f2fs: __update_nat_bits() can be static f2fs: le16_to_cpu for xattr->e_value_size f2fs: don't overwrite node block by SSR f2fs: don't need to invalidate wrong node page f2fs: fix an error return value in truncate_partial_data_page fscrypt: catch up to v4.11-rc1 f2fs: avoid to flush nat journal entries f2fs: avoid to issue redundant discard commands f2fs: fix a plint compile warning f2fs: add f2fs_drop_inode tracepoint f2fs: Fix zoned block device support f2fs: remove redundant set_page_dirty() f2fs: fix to enlarge size of write_io_dummy mempool f2fs: fix memory leak of write_io_dummy mempool during umount f2fs: fix to update F2FS_{CP_}WB_DATA count correctly f2fs: use MAX_FREE_NIDS for the free nids target f2fs: introduce free nid bitmap f2fs: new helper cur_cp_crc() getting crc in f2fs_checkpoint f2fs: update the comment of default nr_pages to skipping f2fs: drop the duplicate pval in f2fs_getxattr f2fs: Don't update the xattr data that same as the exist f2fs: kill __is_extent_same f2fs: avoid bggc->fggc when enough free segments are avaliable after cp f2fs: select target segment with closer temperature in SSR mode f2fs: show simple call stack in fault injection message fscrypt: catch fscrypto_get_policy in v4.10-rc6 f2fs: use __clear_bit_le f2fs: no need lock_op in f2fs_write_inline_data f2fs: add bitmaps for empty or 
full NAT blocks f2fs: replace rw semaphore extent_tree_lock with mutex lock f2fs: avoid m_flags overlay when allocating more data blocks f2fs: remove unsafe bitmap checking f2fs: init local extent_info to avoid stale stack info in tp f2fs: remove unnecessary condition check for write_checkpoint in f2fs_gc f2fs: do SSR for node segments more aggresively f2fs: check discard alignment only for SEQWRITE zones f2fs: wait for discard completion after submission f2fs: much larger batched trim_fs job f2fs: avoid very large discard command f2fs: find data segments across all the types f2fs: do SSR in higher priority f2fs: do SSR for data when there is enough free space f2fs: node segment is prior to data segment selected victim f2fs: put allocate_segment after refresh_sit_entry f2fs: add ovp valid_blocks check for bg gc victim to fg_gc f2fs: do not wait for writeback in write_begin f2fs: replace __get_victim by dirty_segments in FG_GC f2fs: fix multiple f2fs_add_link() calls having same name f2fs: show actual device info in tracepoints f2fs: use SSR for warm node as well f2fs: enable inline_xattr by default f2fs: introduce noinline_xattr mount option f2fs: avoid reading NAT page by get_node_info f2fs: remove build_free_nids() during checkpoint f2fs: change recovery policy of xattr node block f2fs: super: constify fscrypt_operations structure f2fs: show checkpoint version at mount time f2fs: remove preflush for nobarrier case f2fs: check last page index in cached bio to decide submission f2fs: check io submission more precisely f2fs: fix trim_fs assignment Revert "f2fs: remove batched discard in f2fs_trim_fs" f2fs: fix missing bio_alloc(1) f2fs: call internal __write_data_page directly f2fs: avoid out-of-order execution of atomic writes f2fs: move write_node_page above fsync_node_pages f2fs: move flush tracepoint f2fs: show # of APPEND and UPDATE inodes f2fs: fix 446 coding style warnings in f2fs.h f2fs: fix 3 coding style errors in f2fs.h f2fs: declare missing static 
function f2fs: show the fault injection mount option f2fs: fix null pointer dereference when issuing flush in ->fsync f2fs: fix to avoid overflow when left shifting page offset f2fs: enhance lookup xattr f2fs: fix a dead loop in f2fs_fiemap() f2fs: do not preallocate blocks which has wrong buffer f2fs: show # of on-going flush and discard bios f2fs: add a kernel thread to issue discard commands asynchronously f2fs: factor out discard command info into discard_cmd_control f2fs: remove batched discard in f2fs_trim_fs f2fs: reorganize stat information f2fs: clean up flush/discard command namings f2fs: check in-memory sit version bitmap f2fs: check in-memory nat version bitmap f2fs: check in-memory block bitmap f2fs: introduce FI_ATOMIC_COMMIT f2fs: clean up with list_{first, last}_entry f2fs: return fs_trim if there is no candidate f2fs: avoid needless checkpoint in f2fs_trim_fs f2fs: relax async discard commands more f2fs: drop exist_data for inline_data when truncated to 0 f2fs: don't allow encrypted operations without keys f2fs: show the max number of atomic operations f2fs: get io size bit from mount option f2fs: support IO alignment for DATA and NODE writes f2fs: add submit_bio tracepoint f2fs: reassign new segment for mode=lfs f2fs: fix a missing discard prefree segments f2fs: use rb_entry_safe f2fs: add a case of no need to read a page in write begin f2fs: fix a problem of using memory after free f2fs: remove unneeded condition f2fs: don't cache nat entry if out of memory f2fs: remove unused values in recover_fsync_data f2fs: support async discard based on v4.9 f2fs: resolve op and op_flags confilcts f2fs: remove wrong backported codes f2fs: fix a missing size change in f2fs_setattr fs/super.c: fix race between freeze_super() and thaw_super() scripts/tags.sh: catch 4.9-rc6 f2fs: fix to access nullified flush_cmd_control pointer f2fs: free meta pages if sanity check for ckpt is failed f2fs: detect wrong layout f2fs: call sync_fs when f2fs is idle Revert "f2fs: 
use percpu_counter for # of dirty pages in inode"
f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
f2fs: do not activate auto_recovery for fallocated i_size
f2fs: fix 32-bit build
f2fs: set ->owner for debugfs status file's file_operations
f2fs: fix incorrect free inode count in ->statfs
f2fs: drop duplicate header timer.h
f2fs: fix wrong AUTO_RECOVER condition
f2fs: do not recover i_size if it's valid
f2fs: fix fdatasync
f2fs: fix to account total free nid correctly
f2fs: fix an infinite loop when flush nodes in cp
f2fs: don't wait writeback for datas during checkpoint
f2fs: fix wrong written_valid_blocks counting
f2fs: avoid BG_GC in f2fs_balance_fs
f2fs: fix redundant block allocation
f2fs: use err for f2fs_preallocate_blocks
f2fs: support multiple devices
f2fs: allow dio read for LFS mode
f2fs: revert segment allocation for direct IO
f2fs: return directly if block has been removed from the victim
Revert "f2fs: do not recover from previous remained wrong dnodes"
f2fs: remove checkpoint in f2fs_freeze
f2fs: assign segments correctly for direct_io
f2fs: fix wrong i_atime recovery
f2fs: record inode updating status correctly
f2fs: Trace reset zone events
f2fs: Reset sequential zones on zoned block devices
f2fs: Cache zoned block devices zone type
f2fs: Do not allow adaptive mode for host-managed zoned block devices
f2fs: Always enable discard for zoned blocks devices
f2fs: Suppress discard warning message for zoned block devices
f2fs: Check zoned block feature for host-managed zoned block devices
f2fs: Use generic zoned block device terminology
f2fs: Add missing break in switch-case
f2fs: avoid infinite loop in the EIO case on recover_orphan_inodes
f2fs: report error of f2fs_fill_dentries
fs/crypto: catch up 4.9-rc6
f2fs: hide a maybe-uninitialized warning
f2fs: remove percpu_count due to performance regression
f2fs: make clean inodes when flushing inode page
f2fs: keep dirty inodes selectively for checkpoint
f2fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
f2fs: use BIO_MAX_PAGES for bio allocation
f2fs: declare static function for __build_free_nids
f2fs: call f2fs_balance_fs for setattr
f2fs: count dirty inodes to flush node pages during checkpoint
f2fs: avoid casted negative value as shrink count
f2fs: don't interrupt free nids building during nid allocation
f2fs: clean up free nid list operations
f2fs: split free nid list
f2fs: clear nlink if fail to add_link
f2fs: fix sparse warnings
f2fs: fix error handling in fsync_node_pages
f2fs: fix to update largest extent under lock
f2fs: be aware of extent beyond EOF in fiemap
f2fs: don't miss any f2fs_balance_fs cases
f2fs: add missing f2fs_balance_fs in f2fs_zero_range
f2fs: give a chance to detach from dirty list
f2fs: fix to release discard entries during checkpoint
f2fs: exclude free nids building and allocation
f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack
f2fs: fix overflow due to condition check order
posix_acl: Clear SGID bit when setting file permissions
f2fs: fix wrong sum_page pointer in f2fs_gc
f2fs: backport from (4c1fad64 - Merge tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs)

Conflicts:
	Makefile
	arch/arm64/configs/cuttlefish_defconfig
	arch/arm64/kernel/vdso.c
	arch/arm64/kernel/vdso/gettimeofday.S
	arch/x86/configs/x86_64_cuttlefish_defconfig
	build.config.cuttlefish.aarch64
	build.config.cuttlefish.x86_64
	drivers/base/power/main.c
	drivers/block/loop.c
	drivers/block/zram/zram_drv.c
	drivers/block/zram/zram_drv.h
	drivers/net/wireless/virt_wifi.c
	drivers/staging/android/lowmemorykiller.c
	drivers/vhost/vsock.c
	fs/ext4/crypto.c
	fs/ext4/crypto_key.c
	fs/f2fs/checkpoint.c
	fs/f2fs/data.c
	fs/f2fs/dir.c
	fs/f2fs/f2fs.h
	fs/f2fs/file.c
	fs/f2fs/inline.c
	fs/f2fs/inode.c
	fs/f2fs/node.c
	fs/f2fs/recovery.c
	fs/f2fs/segment.c
	fs/f2fs/segment.h
	fs/f2fs/super.c
	fs/overlayfs/inode.c
	fs/sdcardfs/super.c
	fs/squashfs/block.c
	fs/squashfs/lz4_wrapper.c
	include/linux/bug.h
	include/linux/f2fs_fs.h
	include/linux/msm_mdp.h
	include/linux/swap.h
	include/linux/virtio_vsock.h
	kernel/cpu.c
	kernel/trace/trace.c
	lib/Kconfig.debug
	lib/list_debug.c
	mm/zsmalloc.c
	net/vmw_vsock/virtio_transport_common.c

Change-Id: Ic475eba811d0e6973a374771d68f31b15865efcc
2080 lines
54 KiB
C
/*
 * (C) 1997 Linus Torvalds
 * (C) 1999 Andrea Arcangeli <andrea@suse.de> (dynamic inode allocation)
 */
#include <linux/export.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/backing-dev.h>
#include <linux/hash.h>
#include <linux/swap.h>
#include <linux/security.h>
#include <linux/cdev.h>
#include <linux/bootmem.h>
#include <linux/fsnotify.h>
#include <linux/mount.h>
#include <linux/posix_acl.h>
#include <linux/prefetch.h>
#include <linux/buffer_head.h> /* for inode_has_buffers */
#include <linux/ratelimit.h>
#include <linux/list_lru.h>
#include <trace/events/writeback.h>
#include "internal.h"

/*
 * Inode locking rules:
 *
 * inode->i_lock protects:
 *   inode->i_state, inode->i_hash, __iget()
 * Inode LRU list locks protect:
 *   inode->i_sb->s_inode_lru, inode->i_lru
 * inode->i_sb->s_inode_list_lock protects:
 *   inode->i_sb->s_inodes, inode->i_sb_list
 * bdi->wb.list_lock protects:
 *   bdi->wb.b_{dirty,io,more_io,dirty_time}, inode->i_io_list
 * inode_hash_lock protects:
 *   inode_hashtable, inode->i_hash
 *
 * Lock ordering:
 *
 * inode->i_sb->s_inode_list_lock
 *   inode->i_lock
 *     Inode LRU list locks
 *
 * bdi->wb.list_lock
 *   inode->i_lock
 *
 * inode_hash_lock
 *   inode->i_sb->s_inode_list_lock
 *   inode->i_lock
 *
 * iunique_lock
 *   inode_hash_lock
 */

static unsigned int i_hash_mask __read_mostly;
static unsigned int i_hash_shift __read_mostly;
static struct hlist_head *inode_hashtable __read_mostly;
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_hash_lock);

/*
 * Empty aops. Can be used for the cases where the user does not
 * define any of the address_space operations.
 */
const struct address_space_operations empty_aops = {
};
EXPORT_SYMBOL(empty_aops);

/*
 * Statistics gathering..
 */
struct inodes_stat_t inodes_stat;

static DEFINE_PER_CPU(unsigned long, nr_inodes);
static DEFINE_PER_CPU(unsigned long, nr_unused);

static struct kmem_cache *inode_cachep __read_mostly;

static long get_nr_inodes(void)
{
	int i;
	long sum = 0;
	for_each_possible_cpu(i)
		sum += per_cpu(nr_inodes, i);
	return sum < 0 ? 0 : sum;
}

static inline long get_nr_inodes_unused(void)
{
	int i;
	long sum = 0;
	for_each_possible_cpu(i)
		sum += per_cpu(nr_unused, i);
	return sum < 0 ? 0 : sum;
}

long get_nr_dirty_inodes(void)
{
	/* not actually dirty inodes, but a wild approximation */
	long nr_dirty = get_nr_inodes() - get_nr_inodes_unused();
	return nr_dirty > 0 ? nr_dirty : 0;
}

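/*
 * Illustration only, not part of fs/inode.c: a minimal userspace sketch of
 * the counting scheme used by get_nr_inodes() and get_nr_inodes_unused().
 * Each slot is updated independently, so a single slot can go negative when
 * a decrement lands on a different slot than the matching increment; only
 * the sum is meaningful, and it is clamped at zero. NR_SLOTS, slot_count,
 * slot_add() and slots_sum() are hypothetical names standing in for the
 * per-CPU machinery.
 */

```c
#include <stddef.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of possible CPUs */

/* One counter per slot; an individual slot may legitimately be negative. */
static long slot_count[NR_SLOTS];

/* Adjust the counter of one slot (models this_cpu_inc()/this_cpu_dec()). */
static void slot_add(size_t slot, long delta)
{
	slot_count[slot % NR_SLOTS] += delta;
}

/* Sum all slots and clamp at zero, as get_nr_inodes() does. */
static long slots_sum(void)
{
	long sum = 0;
	size_t i;

	for (i = 0; i < NR_SLOTS; i++)
		sum += slot_count[i];
	return sum < 0 ? 0 : sum;
}
```
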
/*
 * Handle nr_inode sysctl
 */
#ifdef CONFIG_SYSCTL
int proc_nr_inodes(struct ctl_table *table, int write,
		   void __user *buffer, size_t *lenp, loff_t *ppos)
{
	inodes_stat.nr_inodes = get_nr_inodes();
	inodes_stat.nr_unused = get_nr_inodes_unused();
	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
}
#endif

static int no_open(struct inode *inode, struct file *file)
{
	return -ENXIO;
}

/**
 * inode_init_always - perform inode structure initialisation
 * @sb: superblock inode belongs to
 * @inode: inode to initialise
 *
 * These are initializations that need to be done on every inode
 * allocation as the fields are not initialised by slab allocation.
 */
int inode_init_always(struct super_block *sb, struct inode *inode)
{
	static const struct inode_operations empty_iops;
	static const struct file_operations no_open_fops = {.open = no_open};
	struct address_space *const mapping = &inode->i_data;

	inode->i_sb = sb;
	inode->i_blkbits = sb->s_blocksize_bits;
	inode->i_flags = 0;
	atomic64_set(&inode->i_sequence, 0);
	atomic_set(&inode->i_count, 1);
	inode->i_op = &empty_iops;
	inode->i_fop = &no_open_fops;
	inode->__i_nlink = 1;
	inode->i_opflags = 0;
	i_uid_write(inode, 0);
	i_gid_write(inode, 0);
	atomic_set(&inode->i_writecount, 0);
	inode->i_size = 0;
	inode->i_blocks = 0;
	inode->i_bytes = 0;
	inode->i_generation = 0;
	inode->i_pipe = NULL;
	inode->i_bdev = NULL;
	inode->i_cdev = NULL;
	inode->i_link = NULL;
	inode->i_rdev = 0;
	inode->dirtied_when = 0;

#ifdef CONFIG_CGROUP_WRITEBACK
	inode->i_wb_frn_winner = 0;
	inode->i_wb_frn_avg_time = 0;
	inode->i_wb_frn_history = 0;
#endif

	if (security_inode_alloc(inode))
		goto out;
	spin_lock_init(&inode->i_lock);
	lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key);

	mutex_init(&inode->i_mutex);
	lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key);

	atomic_set(&inode->i_dio_count, 0);

	mapping->a_ops = &empty_aops;
	mapping->host = inode;
	mapping->flags = 0;
	atomic_set(&mapping->i_mmap_writable, 0);
	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
	mapping->private_data = NULL;
	mapping->writeback_index = 0;
	inode->i_private = NULL;
	inode->i_mapping = mapping;
	INIT_HLIST_HEAD(&inode->i_dentry);	/* buggered by rcu freeing */
#ifdef CONFIG_FS_POSIX_ACL
	inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED;
#endif

#ifdef CONFIG_FSNOTIFY
	inode->i_fsnotify_mask = 0;
#endif
	inode->i_flctx = NULL;
	this_cpu_inc(nr_inodes);

	return 0;
out:
	return -ENOMEM;
}
EXPORT_SYMBOL(inode_init_always);

static struct inode *alloc_inode(struct super_block *sb)
{
	struct inode *inode;

	if (sb->s_op->alloc_inode)
		inode = sb->s_op->alloc_inode(sb);
	else
		inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);

	if (!inode)
		return NULL;

	if (unlikely(inode_init_always(sb, inode))) {
		if (inode->i_sb->s_op->destroy_inode)
			inode->i_sb->s_op->destroy_inode(inode);
		else
			kmem_cache_free(inode_cachep, inode);
		return NULL;
	}

	return inode;
}

void free_inode_nonrcu(struct inode *inode)
{
	kmem_cache_free(inode_cachep, inode);
}
EXPORT_SYMBOL(free_inode_nonrcu);

void __destroy_inode(struct inode *inode)
{
	BUG_ON(inode_has_buffers(inode));
	inode_detach_wb(inode);
	security_inode_free(inode);
	fsnotify_inode_delete(inode);
	locks_free_lock_context(inode->i_flctx);
	if (!inode->i_nlink) {
		WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0);
		atomic_long_dec(&inode->i_sb->s_remove_count);
	}

#ifdef CONFIG_FS_POSIX_ACL
	if (inode->i_acl && inode->i_acl != ACL_NOT_CACHED)
		posix_acl_release(inode->i_acl);
	if (inode->i_default_acl && inode->i_default_acl != ACL_NOT_CACHED)
		posix_acl_release(inode->i_default_acl);
#endif
	this_cpu_dec(nr_inodes);
}
EXPORT_SYMBOL(__destroy_inode);

static void i_callback(struct rcu_head *head)
{
	struct inode *inode = container_of(head, struct inode, i_rcu);
	kmem_cache_free(inode_cachep, inode);
}

static void destroy_inode(struct inode *inode)
{
	BUG_ON(!list_empty(&inode->i_lru));
	__destroy_inode(inode);
	if (inode->i_sb->s_op->destroy_inode)
		inode->i_sb->s_op->destroy_inode(inode);
	else
		call_rcu(&inode->i_rcu, i_callback);
}

/**
 * drop_nlink - directly drop an inode's link count
 * @inode: inode
 *
 * This is a low-level filesystem helper to replace any
 * direct filesystem manipulation of i_nlink.  In cases
 * where we are attempting to track writes to the
 * filesystem, a decrement to zero means an imminent
 * write when the file is truncated and actually unlinked
 * on the filesystem.
 */
void drop_nlink(struct inode *inode)
{
	WARN_ON(inode->i_nlink == 0);
	inode->__i_nlink--;
	if (!inode->i_nlink)
		atomic_long_inc(&inode->i_sb->s_remove_count);
}
EXPORT_SYMBOL(drop_nlink);

/**
 * clear_nlink - directly zero an inode's link count
 * @inode: inode
 *
 * This is a low-level filesystem helper to replace any
 * direct filesystem manipulation of i_nlink.  See
 * drop_nlink() for why we care about i_nlink hitting zero.
 */
void clear_nlink(struct inode *inode)
{
	if (inode->i_nlink) {
		inode->__i_nlink = 0;
		atomic_long_inc(&inode->i_sb->s_remove_count);
	}
}
EXPORT_SYMBOL(clear_nlink);

/**
 * set_nlink - directly set an inode's link count
 * @inode: inode
 * @nlink: new nlink (should be non-zero)
 *
 * This is a low-level filesystem helper to replace any
 * direct filesystem manipulation of i_nlink.
 */
void set_nlink(struct inode *inode, unsigned int nlink)
{
	if (!nlink) {
		clear_nlink(inode);
	} else {
		/* Yes, some filesystems do change nlink from zero to one */
		if (inode->i_nlink == 0)
			atomic_long_dec(&inode->i_sb->s_remove_count);

		inode->__i_nlink = nlink;
	}
}
EXPORT_SYMBOL(set_nlink);

/**
 * inc_nlink - directly increment an inode's link count
 * @inode: inode
 *
 * This is a low-level filesystem helper to replace any
 * direct filesystem manipulation of i_nlink.  Currently,
 * it is only here for parity with drop_nlink().
 */
void inc_nlink(struct inode *inode)
{
	if (unlikely(inode->i_nlink == 0)) {
		WARN_ON(!(inode->i_state & I_LINKABLE));
		atomic_long_dec(&inode->i_sb->s_remove_count);
	}

	inode->__i_nlink++;
}
EXPORT_SYMBOL(inc_nlink);

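/*
 * Illustration only, not part of fs/inode.c: a minimal userspace sketch of
 * the invariant the four nlink helpers above appear to maintain, namely that
 * the superblock's remove count equals the number of its inodes whose link
 * count is zero. The toy_* names and both structs are hypothetical; the
 * atomics, the WARN_ON() checks and I_LINKABLE are left out, so callers of
 * toy_drop_nlink() must not pass an inode whose nlink is already zero.
 */

```c
struct toy_sb {
	long remove_count;	/* inodes of this sb with nlink == 0 */
};

struct toy_inode {
	unsigned int nlink;
	struct toy_sb *sb;
};

/* Decrement; a transition to zero is counted on the superblock. */
static void toy_drop_nlink(struct toy_inode *inode)
{
	inode->nlink--;
	if (!inode->nlink)
		inode->sb->remove_count++;
}

/* Zero the count; only a non-zero -> zero transition is counted. */
static void toy_clear_nlink(struct toy_inode *inode)
{
	if (inode->nlink) {
		inode->nlink = 0;
		inode->sb->remove_count++;
	}
}

/* Set an arbitrary count, undoing the accounting when leaving zero. */
static void toy_set_nlink(struct toy_inode *inode, unsigned int nlink)
{
	if (!nlink) {
		toy_clear_nlink(inode);
	} else {
		if (inode->nlink == 0)
			inode->sb->remove_count--;
		inode->nlink = nlink;
	}
}

/* Increment, undoing the accounting when leaving zero. */
static void toy_inc_nlink(struct toy_inode *inode)
{
	if (inode->nlink == 0)
		inode->sb->remove_count--;
	inode->nlink++;
}
```
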
void address_space_init_once(struct address_space *mapping)
{
	memset(mapping, 0, sizeof(*mapping));
	INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC);
	spin_lock_init(&mapping->tree_lock);
	init_rwsem(&mapping->i_mmap_rwsem);
	INIT_LIST_HEAD(&mapping->private_list);
	spin_lock_init(&mapping->private_lock);
	mapping->i_mmap = RB_ROOT;
}
EXPORT_SYMBOL(address_space_init_once);

/*
 * These are initializations that only need to be done
 * once, because the fields are idempotent across use
 * of the inode, so let the slab be aware of that.
 */
void inode_init_once(struct inode *inode)
{
	memset(inode, 0, sizeof(*inode));
	INIT_HLIST_NODE(&inode->i_hash);
	INIT_LIST_HEAD(&inode->i_devices);
	INIT_LIST_HEAD(&inode->i_io_list);
	INIT_LIST_HEAD(&inode->i_lru);
	address_space_init_once(&inode->i_data);
	i_size_ordered_init(inode);
#ifdef CONFIG_FSNOTIFY
	INIT_HLIST_HEAD(&inode->i_fsnotify_marks);
#endif
}
EXPORT_SYMBOL(inode_init_once);

static void init_once(void *foo)
{
	struct inode *inode = (struct inode *) foo;

	inode_init_once(inode);
}

/*
 * inode->i_lock must be held
 */
void __iget(struct inode *inode)
{
	atomic_inc(&inode->i_count);
}

/*
 * get additional reference to inode; caller must already hold one.
 */
void ihold(struct inode *inode)
{
	WARN_ON(atomic_inc_return(&inode->i_count) < 2);
}
EXPORT_SYMBOL(ihold);

static void inode_lru_list_add(struct inode *inode)
{
	if (list_lru_add(&inode->i_sb->s_inode_lru, &inode->i_lru))
		this_cpu_inc(nr_unused);
}

/*
 * Add inode to LRU if needed (inode is unused and clean).
 *
 * Needs inode->i_lock held.
 */
void inode_add_lru(struct inode *inode)
{
	if (!(inode->i_state & (I_DIRTY_ALL | I_SYNC |
				I_FREEING | I_WILL_FREE)) &&
	    !atomic_read(&inode->i_count) && inode->i_sb->s_flags & MS_ACTIVE)
		inode_lru_list_add(inode);
}


static void inode_lru_list_del(struct inode *inode)
{

	if (list_lru_del(&inode->i_sb->s_inode_lru, &inode->i_lru))
		this_cpu_dec(nr_unused);
}

/**
 * inode_sb_list_add - add inode to the superblock list of inodes
 * @inode: inode to add
 */
void inode_sb_list_add(struct inode *inode)
{
	spin_lock(&inode->i_sb->s_inode_list_lock);
	list_add(&inode->i_sb_list, &inode->i_sb->s_inodes);
	spin_unlock(&inode->i_sb->s_inode_list_lock);
}
EXPORT_SYMBOL_GPL(inode_sb_list_add);

static inline void inode_sb_list_del(struct inode *inode)
{
	if (!list_empty(&inode->i_sb_list)) {
		spin_lock(&inode->i_sb->s_inode_list_lock);
		list_del_init(&inode->i_sb_list);
		spin_unlock(&inode->i_sb->s_inode_list_lock);
	}
}

static unsigned long hash(struct super_block *sb, unsigned long hashval)
{
	unsigned long tmp;

	tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
			L1_CACHE_BYTES;
	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> i_hash_shift);
	return tmp & i_hash_mask;
}

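/*
 * Illustration only, not part of fs/inode.c: a minimal userspace sketch of
 * the bucket computation in hash() above, with the operator precedence
 * written out (the division binds tighter than the XOR). All TOY_* values
 * are assumptions chosen for the example: a 64-bit golden-ratio prime, a
 * 64-byte cache line, and a 1024-bucket table. The real values of
 * i_hash_shift and i_hash_mask are set elsewhere in this file.
 */

```c
#include <stdint.h>

#define TOY_GOLDEN_RATIO_PRIME	0x9e37fffffffc0001ULL	/* assumed constant */
#define TOY_L1_CACHE_BYTES	64			/* assumed cache line size */
#define TOY_HASH_SHIFT		10			/* assumed: 1024 buckets */
#define TOY_HASH_MASK		((1ULL << TOY_HASH_SHIFT) - 1)

/* Mix the superblock address with the hash value and mask to a bucket. */
static uint64_t toy_hash(uint64_t sb_addr, uint64_t hashval)
{
	uint64_t tmp;

	tmp = (hashval * sb_addr) ^
	      ((TOY_GOLDEN_RATIO_PRIME + hashval) / TOY_L1_CACHE_BYTES);
	tmp = tmp ^ ((tmp ^ TOY_GOLDEN_RATIO_PRIME) >> TOY_HASH_SHIFT);
	return tmp & TOY_HASH_MASK;
}
```
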
/**
 *	__insert_inode_hash - hash an inode
 *	@inode: unhashed inode
 *	@hashval: unsigned long value used to locate this object in the
 *		inode_hashtable.
 *
 *	Add an inode to the inode hash for this superblock.
 */
void __insert_inode_hash(struct inode *inode, unsigned long hashval)
{
	struct hlist_head *b = inode_hashtable + hash(inode->i_sb, hashval);

	spin_lock(&inode_hash_lock);
	spin_lock(&inode->i_lock);
	hlist_add_head(&inode->i_hash, b);
	spin_unlock(&inode->i_lock);
	spin_unlock(&inode_hash_lock);
}
EXPORT_SYMBOL(__insert_inode_hash);

/**
 *	__remove_inode_hash - remove an inode from the hash
 *	@inode: inode to unhash
 *
 *	Remove an inode from the inode hash.
 */
void __remove_inode_hash(struct inode *inode)
{
	spin_lock(&inode_hash_lock);
	spin_lock(&inode->i_lock);
	hlist_del_init(&inode->i_hash);
	spin_unlock(&inode->i_lock);
	spin_unlock(&inode_hash_lock);
}
EXPORT_SYMBOL(__remove_inode_hash);

void clear_inode(struct inode *inode)
{
	might_sleep();
	/*
	 * We have to cycle tree_lock here because reclaim can be still in the
	 * process of removing the last page (in __delete_from_page_cache())
	 * and we must not free mapping under it.
	 */
	spin_lock_irq(&inode->i_data.tree_lock);
	BUG_ON(inode->i_data.nrpages);
	BUG_ON(inode->i_data.nrshadows);
	spin_unlock_irq(&inode->i_data.tree_lock);
	BUG_ON(!list_empty(&inode->i_data.private_list));
	BUG_ON(!(inode->i_state & I_FREEING));
	BUG_ON(inode->i_state & I_CLEAR);
	/* don't need i_lock here, no concurrent mods to i_state */
	inode->i_state = I_FREEING | I_CLEAR;
}
EXPORT_SYMBOL(clear_inode);

/*
 * Free the inode passed in, removing it from the lists it is still connected
 * to. We remove any pages still attached to the inode and wait for any IO that
 * is still in progress before finally destroying the inode.
 *
 * An inode must already be marked I_FREEING so that we avoid the inode being
 * moved back onto lists if we race with other code that manipulates the lists
 * (e.g. writeback_single_inode). The caller is responsible for setting this.
 *
 * An inode must already be removed from the LRU list before being evicted from
 * the cache. This should occur atomically with setting the I_FREEING state
 * flag, so no inodes here should ever be on the LRU when being evicted.
 */
static void evict(struct inode *inode)
{
	const struct super_operations *op = inode->i_sb->s_op;

	BUG_ON(!(inode->i_state & I_FREEING));
	BUG_ON(!list_empty(&inode->i_lru));

	if (!list_empty(&inode->i_io_list))
		inode_io_list_del(inode);

	inode_sb_list_del(inode);

	/*
	 * Wait for flusher thread to be done with the inode so that filesystem
	 * does not start destroying it while writeback is still running. Since
	 * the inode has I_FREEING set, flusher thread won't start new work on
	 * the inode.  We just have to wait for running writeback to finish.
	 */
	inode_wait_for_writeback(inode);

	if (op->evict_inode) {
		op->evict_inode(inode);
	} else {
		truncate_inode_pages_final(&inode->i_data);
		clear_inode(inode);
	}
	if (S_ISBLK(inode->i_mode) && inode->i_bdev)
		bd_forget(inode);
	if (S_ISCHR(inode->i_mode) && inode->i_cdev)
		cd_forget(inode);

	remove_inode_hash(inode);

	spin_lock(&inode->i_lock);
	wake_up_bit(&inode->i_state, __I_NEW);
	BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
	spin_unlock(&inode->i_lock);

	destroy_inode(inode);
}

/*
 * dispose_list - dispose of the contents of a local list
 * @head: the head of the list to free
 *
 * Dispose-list gets a local list with local inodes in it, so it doesn't
 * need to worry about list corruption and SMP locks.
 */
static void dispose_list(struct list_head *head)
{
	while (!list_empty(head)) {
		struct inode *inode;

		inode = list_first_entry(head, struct inode, i_lru);
		list_del_init(&inode->i_lru);

		evict(inode);
		cond_resched();
	}
}

/**
 * evict_inodes	- evict all evictable inodes for a superblock
 * @sb:		superblock to operate on
 *
 * Make sure that no inodes with zero refcount are retained.  This is
 * called by superblock shutdown after having MS_ACTIVE flag removed,
 * so any inode reaching zero refcount during or after that call will
 * be immediately evicted.
 */
void evict_inodes(struct super_block *sb)
{
	struct inode *inode, *next;
	LIST_HEAD(dispose);

again:
	spin_lock(&sb->s_inode_list_lock);
	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
		if (atomic_read(&inode->i_count))
			continue;

		spin_lock(&inode->i_lock);
		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
			spin_unlock(&inode->i_lock);
			continue;
		}

		inode->i_state |= I_FREEING;
		inode_lru_list_del(inode);
		spin_unlock(&inode->i_lock);
		list_add(&inode->i_lru, &dispose);

		/*
		 * We can have a ton of inodes to evict at unmount time given
		 * enough memory, check to see if we need to go to sleep for a
		 * bit so we don't livelock.
		 */
		if (need_resched()) {
			spin_unlock(&sb->s_inode_list_lock);
			cond_resched();
			dispose_list(&dispose);
			goto again;
		}
	}
	spin_unlock(&sb->s_inode_list_lock);

	dispose_list(&dispose);
}
EXPORT_SYMBOL_GPL(evict_inodes);

/**
 * invalidate_inodes	- attempt to free all inodes on a superblock
 * @sb:		superblock to operate on
 * @kill_dirty: flag to guide handling of dirty inodes
 *
 * Attempts to free all inodes for a given superblock.  If there were any
 * busy inodes return a non-zero value, else zero.
 * If @kill_dirty is set, discard dirty inodes too, otherwise treat
 * them as busy.
 */
int invalidate_inodes(struct super_block *sb, bool kill_dirty)
{
	int busy = 0;
	struct inode *inode, *next;
	LIST_HEAD(dispose);

	spin_lock(&sb->s_inode_list_lock);
	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
		spin_lock(&inode->i_lock);
		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
			spin_unlock(&inode->i_lock);
			continue;
		}
		if (inode->i_state & I_DIRTY_ALL && !kill_dirty) {
			spin_unlock(&inode->i_lock);
			busy = 1;
			continue;
		}
		if (atomic_read(&inode->i_count)) {
			spin_unlock(&inode->i_lock);
			busy = 1;
			continue;
		}

		inode->i_state |= I_FREEING;
		inode_lru_list_del(inode);
		spin_unlock(&inode->i_lock);
		list_add(&inode->i_lru, &dispose);
	}
	spin_unlock(&sb->s_inode_list_lock);

	dispose_list(&dispose);

	return busy;
}

/*
 * Isolate the inode from the LRU in preparation for freeing it.
 *
 * Any inodes which are pinned purely because of attached pagecache have their
 * pagecache removed.  If the inode has metadata buffers attached to
 * mapping->private_list then try to remove them.
 *
 * If the inode has the I_REFERENCED flag set, then it means that it has been
 * used recently - the flag is set in iput_final(). When we encounter such an
 * inode, clear the flag and move it to the back of the LRU so it gets another
 * pass through the LRU before it gets reclaimed. This is necessary because of
 * the fact we are doing lazy LRU updates to minimise lock contention so the
 * LRU does not have strict ordering. Hence we don't want to reclaim inodes
 * with this flag set because they are the inodes that are out of order.
 */
static enum lru_status inode_lru_isolate(struct list_head *item,
		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
{
	struct list_head *freeable = arg;
	struct inode	*inode = container_of(item, struct inode, i_lru);

	/*
	 * we are inverting the lru lock/inode->i_lock here, so use a trylock.
	 * If we fail to get the lock, just skip it.
	 */
	if (!spin_trylock(&inode->i_lock))
		return LRU_SKIP;

	/*
	 * Referenced or dirty inodes are still in use. Give them another pass
	 * through the LRU as we cannot reclaim them now.
	 */
	if (atomic_read(&inode->i_count) ||
	    (inode->i_state & ~I_REFERENCED)) {
		list_lru_isolate(lru, &inode->i_lru);
		spin_unlock(&inode->i_lock);
		this_cpu_dec(nr_unused);
		return LRU_REMOVED;
	}

	/* recently referenced inodes get one more pass */
	if (inode->i_state & I_REFERENCED) {
		inode->i_state &= ~I_REFERENCED;
		spin_unlock(&inode->i_lock);
		return LRU_ROTATE;
	}

	if (inode_has_buffers(inode) || inode->i_data.nrpages) {
		__iget(inode);
		spin_unlock(&inode->i_lock);
		spin_unlock(lru_lock);
		if (remove_inode_buffers(inode)) {
			unsigned long reap;
			reap = invalidate_mapping_pages(&inode->i_data, 0, -1);
			if (current_is_kswapd())
				__count_vm_events(KSWAPD_INODESTEAL, reap);
			else
				__count_vm_events(PGINODESTEAL, reap);
			if (current->reclaim_state)
				current->reclaim_state->reclaimed_slab += reap;
		}
		iput(inode);
		spin_lock(lru_lock);
		return LRU_RETRY;
	}

	WARN_ON(inode->i_state & I_NEW);
	inode->i_state |= I_FREEING;
	list_lru_isolate_move(lru, &inode->i_lru, freeable);
	spin_unlock(&inode->i_lock);

	this_cpu_dec(nr_unused);
	return LRU_REMOVED;
}

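/*
 * Illustration only, not part of fs/inode.c: a minimal userspace sketch of
 * the decision order in inode_lru_isolate() above, reduced to three of its
 * outcomes. An entry that is in use is dropped from the LRU without being
 * freed, a recently referenced entry has its flag cleared and gets one more
 * pass, and anything else is handed to the reclaim list. The toy_* names are
 * hypothetical; locking, the page cache path (LRU_RETRY) and the trylock
 * path (LRU_SKIP) are left out.
 */

```c
enum toy_status {
	TOY_REMOVED,	/* taken off the LRU, not freed */
	TOY_ROTATE,	/* moved to the back for another pass */
	TOY_RECLAIM	/* moved to the list of entries to free */
};

struct toy_entry {
	int in_use;	/* models a non-zero i_count or other state bits */
	int referenced;	/* models I_REFERENCED */
};

/* Decide what a shrinker walk would do with one LRU entry. */
static enum toy_status toy_isolate(struct toy_entry *e)
{
	if (e->in_use)
		return TOY_REMOVED;

	if (e->referenced) {
		e->referenced = 0;	/* second chance is used up */
		return TOY_ROTATE;
	}

	return TOY_RECLAIM;
}
```
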
/*
 * Walk the superblock inode LRU for freeable inodes and attempt to free them.
 * This is called from the superblock shrinker function with a number of inodes
 * to trim from the LRU. Inodes to be freed are moved to a temporary list and
 * then are freed outside inode_lock by dispose_list().
 */
long prune_icache_sb(struct super_block *sb, struct shrink_control *sc)
{
	LIST_HEAD(freeable);
	long freed;

	freed = list_lru_shrink_walk(&sb->s_inode_lru, sc,
				     inode_lru_isolate, &freeable);
	dispose_list(&freeable);
	return freed;
}

static void __wait_on_freeing_inode(struct inode *inode);
/*
 * Called with the inode_hash_lock held.
 */
static struct inode *find_inode(struct super_block *sb,
				struct hlist_head *head,
				int (*test)(struct inode *, void *),
				void *data)
{
	struct inode *inode = NULL;

repeat:
	hlist_for_each_entry(inode, head, i_hash) {
		if (inode->i_sb != sb)
			continue;
		if (!test(inode, data))
			continue;
		spin_lock(&inode->i_lock);
		if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
			__wait_on_freeing_inode(inode);
			goto repeat;
		}
		__iget(inode);
		spin_unlock(&inode->i_lock);
		return inode;
	}
	return NULL;
}

/*
 * find_inode_fast is the fast path version of find_inode, see the comment at
 * iget_locked for details.
 */
static struct inode *find_inode_fast(struct super_block *sb,
				struct hlist_head *head, unsigned long ino)
{
	struct inode *inode = NULL;

repeat:
	hlist_for_each_entry(inode, head, i_hash) {
		if (inode->i_ino != ino)
			continue;
		if (inode->i_sb != sb)
			continue;
		spin_lock(&inode->i_lock);
		if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
			__wait_on_freeing_inode(inode);
			goto repeat;
		}
		__iget(inode);
		spin_unlock(&inode->i_lock);
		return inode;
	}
	return NULL;
}

/*
 * Each cpu owns a range of LAST_INO_BATCH numbers.
 * 'shared_last_ino' is dirtied only once out of LAST_INO_BATCH allocations,
 * to renew the exhausted range.
 *
 * This does not significantly increase overflow rate because every CPU can
 * consume at most LAST_INO_BATCH-1 unused inode numbers. So there is
 * NR_CPUS*(LAST_INO_BATCH-1) wastage. At 4096 and 1024, this is ~0.1% of the
 * 2^32 range, and is a worst-case. Even a 50% wastage would only increase
 * overflow rate by 2x, which does not seem too significant.
 *
 * On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW
 * error if st_ino won't fit in target struct field. Use 32bit counter
 * here to attempt to avoid that.
 */
#define LAST_INO_BATCH 1024
static DEFINE_PER_CPU(unsigned int, last_ino);

unsigned int get_next_ino(void)
{
	unsigned int *p = &get_cpu_var(last_ino);
	unsigned int res = *p;

#ifdef CONFIG_SMP
	if (unlikely((res & (LAST_INO_BATCH-1)) == 0)) {
		static atomic_t shared_last_ino;
		int next = atomic_add_return(LAST_INO_BATCH, &shared_last_ino);

		res = next - LAST_INO_BATCH;
	}
#endif

	res++;
	/* get_next_ino should not provide a 0 inode number */
	if (unlikely(!res))
		res++;
	*p = res;
	put_cpu_var(last_ino);
	return res;
}
EXPORT_SYMBOL(get_next_ino);

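/*
 * Illustration only, not part of fs/inode.c: a minimal single-threaded
 * userspace sketch of the batching in get_next_ino() above. Each simulated
 * CPU hands out numbers from its own range and touches the shared counter
 * only when its range is exhausted. The toy_* names and struct toy_cpu are
 * hypothetical, and the plain shared counter stands in for the atomic one.
 */

```c
#define TOY_INO_BATCH 1024

/* Shared high-water mark; bumped once per batch, not once per number. */
static unsigned int toy_shared_last_ino;

struct toy_cpu {
	unsigned int last_ino;	/* last number handed out on this CPU */
};

static unsigned int toy_next_ino(struct toy_cpu *cpu)
{
	unsigned int res = cpu->last_ino;

	/* Range exhausted (or never set up): reserve a fresh batch. */
	if ((res & (TOY_INO_BATCH - 1)) == 0) {
		toy_shared_last_ino += TOY_INO_BATCH;
		res = toy_shared_last_ino - TOY_INO_BATCH;
	}

	res++;
	/* Never hand out 0, even after the 32-bit counter wraps. */
	if (!res)
		res++;
	cpu->last_ino = res;
	return res;
}
```

/*
 * With two fresh simulated CPUs, the first obtains numbers starting at 1 and
 * the second starts at 1025; once the first has used up 1..1024 it continues
 * in the next unreserved batch.
 */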
/**
 *	new_inode_pseudo	- obtain an inode
 *	@sb: superblock
 *
 *	Allocates a new inode for given superblock.
 *	The inode won't be chained in the superblock's s_inodes list.
 *	This means:
 *	- fs can't be unmounted
 *	- quotas, fsnotify, writeback can't work
 */
struct inode *new_inode_pseudo(struct super_block *sb)
{
	struct inode *inode = alloc_inode(sb);

	if (inode) {
		spin_lock(&inode->i_lock);
		inode->i_state = 0;
		spin_unlock(&inode->i_lock);
		INIT_LIST_HEAD(&inode->i_sb_list);
	}
	return inode;
}

/**
 *	new_inode	- obtain an inode
 *	@sb: superblock
 *
 *	Allocates a new inode for given superblock. The default gfp_mask
 *	for allocations related to inode->i_mapping is GFP_HIGHUSER_MOVABLE.
 *	If HIGHMEM pages are unsuitable or it is known that pages allocated
 *	for the page cache are not reclaimable or migratable,
 *	mapping_set_gfp_mask() must be called with suitable flags on the
 *	newly created inode's mapping
 *
 */
struct inode *new_inode(struct super_block *sb)
{
	struct inode *inode;

	spin_lock_prefetch(&sb->s_inode_list_lock);

	inode = new_inode_pseudo(sb);
	if (inode)
		inode_sb_list_add(inode);
	return inode;
}
EXPORT_SYMBOL(new_inode);

#ifdef CONFIG_DEBUG_LOCK_ALLOC
void lockdep_annotate_inode_mutex_key(struct inode *inode)
{
	if (S_ISDIR(inode->i_mode)) {
		struct file_system_type *type = inode->i_sb->s_type;

		/* Set new key only if filesystem hasn't already changed it */
		if (lockdep_match_class(&inode->i_mutex, &type->i_mutex_key)) {
			/*
			 * ensure nobody is actually holding i_mutex
			 */
			mutex_destroy(&inode->i_mutex);
			mutex_init(&inode->i_mutex);
			lockdep_set_class(&inode->i_mutex,
					  &type->i_mutex_dir_key);
		}
	}
}
EXPORT_SYMBOL(lockdep_annotate_inode_mutex_key);
#endif

/**
 * unlock_new_inode - clear the I_NEW state and wake up any waiters
 * @inode:	new inode to unlock
 *
 * Called when the inode is fully initialised to clear the new state of the
 * inode and wake up anyone waiting for the inode to finish initialisation.
 */
void unlock_new_inode(struct inode *inode)
{
	lockdep_annotate_inode_mutex_key(inode);
	spin_lock(&inode->i_lock);
	WARN_ON(!(inode->i_state & I_NEW));
	inode->i_state &= ~I_NEW;
	smp_mb();
	wake_up_bit(&inode->i_state, __I_NEW);
	spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(unlock_new_inode);

/**
 * lock_two_nondirectories - take two i_mutexes on non-directory objects
 *
 * Lock any non-NULL argument that is not a directory.
 * Zero, one or two objects may be locked by this function.
 *
 * @inode1: first inode to lock
 * @inode2: second inode to lock
 */
void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
{
	if (inode1 > inode2)
		swap(inode1, inode2);

	if (inode1 && !S_ISDIR(inode1->i_mode))
		mutex_lock(&inode1->i_mutex);
	if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_NONDIR2);
}
EXPORT_SYMBOL(lock_two_nondirectories);

/**
 * unlock_two_nondirectories - release locks from lock_two_nondirectories()
 * @inode1: first inode to unlock
 * @inode2: second inode to unlock
 */
void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
{
	if (inode1 && !S_ISDIR(inode1->i_mode))
		mutex_unlock(&inode1->i_mutex);
	if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
		mutex_unlock(&inode2->i_mutex);
}
EXPORT_SYMBOL(unlock_two_nondirectories);

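/*
 * Illustration only, not part of fs/inode.c: a minimal userspace sketch of
 * the address-ordered locking used by lock_two_nondirectories() above, with
 * POSIX mutexes standing in for i_mutex. Taking the lower address first
 * means two callers passing the same pair in opposite order cannot deadlock
 * against each other, and the "b != a" check avoids locking one object
 * twice. struct toy_obj and the toy_* functions are hypothetical names.
 */

```c
#include <pthread.h>
#include <stdint.h>

struct toy_obj {
	pthread_mutex_t lock;
	int is_dir;	/* directories are skipped, as in the kernel helper */
};

/* Lock every non-NULL, non-directory argument, lowest address first. */
static void toy_lock_two(struct toy_obj *a, struct toy_obj *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		struct toy_obj *tmp = a;

		a = b;
		b = tmp;
	}

	if (a && !a->is_dir)
		pthread_mutex_lock(&a->lock);
	if (b && !b->is_dir && b != a)
		pthread_mutex_lock(&b->lock);
}

/* Release whatever toy_lock_two() took for the same pair of arguments. */
static void toy_unlock_two(struct toy_obj *a, struct toy_obj *b)
{
	if (a && !a->is_dir)
		pthread_mutex_unlock(&a->lock);
	if (b && !b->is_dir && b != a)
		pthread_mutex_unlock(&b->lock);
}
```
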
/**
 * iget5_locked - obtain an inode from a mounted file system
 * @sb:		super block of file system
 * @hashval:	hash value (usually inode number) to get
 * @test:	callback used for comparisons between inodes
 * @set:	callback used to initialize a new struct inode
 * @data:	opaque data pointer to pass to @test and @set
 *
 * Search for the inode specified by @hashval and @data in the inode cache,
 * and if present return it with an increased reference count. This is
 * a generalized version of iget_locked() for file systems where the inode
 * number is not sufficient for unique identification of an inode.
 *
 * If the inode is not in cache, allocate a new inode and return it locked,
 * hashed, and with the I_NEW flag set. The file system gets to fill it in
 * before unlocking it via unlock_new_inode().
 *
 * Note both @test and @set are called with the inode_hash_lock held, so can't
 * sleep.
 */
struct inode *iget5_locked(struct super_block *sb, unsigned long hashval,
		int (*test)(struct inode *, void *),
		int (*set)(struct inode *, void *), void *data)
{
	struct hlist_head *head = inode_hashtable + hash(sb, hashval);
	struct inode *inode;

	spin_lock(&inode_hash_lock);
	inode = find_inode(sb, head, test, data);
	spin_unlock(&inode_hash_lock);

	if (inode) {
		wait_on_inode(inode);
		return inode;
	}

	inode = alloc_inode(sb);
	if (inode) {
		struct inode *old;

		spin_lock(&inode_hash_lock);
		/* We released the lock, so.. */
		old = find_inode(sb, head, test, data);
		if (!old) {
			if (set(inode, data))
				goto set_failed;

			spin_lock(&inode->i_lock);
			inode->i_state = I_NEW;
			hlist_add_head(&inode->i_hash, head);
			spin_unlock(&inode->i_lock);
			inode_sb_list_add(inode);
			spin_unlock(&inode_hash_lock);

			/* Return the locked inode with I_NEW set, the
			 * caller is responsible for filling in the contents
			 */
			return inode;
		}

		/*
		 * Uhhuh, somebody else created the same inode under
		 * us. Use the old inode instead of the one we just
		 * allocated.
		 */
		spin_unlock(&inode_hash_lock);
		destroy_inode(inode);
		inode = old;
		wait_on_inode(inode);
	}
	return inode;

set_failed:
	spin_unlock(&inode_hash_lock);
	destroy_inode(inode);
	return NULL;
}
EXPORT_SYMBOL(iget5_locked);

/**
|
|
* iget_locked - obtain an inode from a mounted file system
|
|
* @sb: super block of file system
|
|
* @ino: inode number to get
|
|
*
|
|
* Search for the inode specified by @ino in the inode cache and if present
|
|
* return it with an increased reference count. This is for file systems
|
|
* where the inode number is sufficient for unique identification of an inode.
|
|
*
|
|
* If the inode is not in cache, allocate a new inode and return it locked,
|
|
* hashed, and with the I_NEW flag set. The file system gets to fill it in
|
|
* before unlocking it via unlock_new_inode().
|
|
*/
|
|
struct inode *iget_locked(struct super_block *sb, unsigned long ino)
|
|
{
|
|
struct hlist_head *head = inode_hashtable + hash(sb, ino);
|
|
struct inode *inode;
|
|
|
|
spin_lock(&inode_hash_lock);
|
|
inode = find_inode_fast(sb, head, ino);
|
|
spin_unlock(&inode_hash_lock);
|
|
if (inode) {
|
|
wait_on_inode(inode);
|
|
return inode;
|
|
}
|
|
|
|
inode = alloc_inode(sb);
|
|
if (inode) {
|
|
struct inode *old;
|
|
|
|
spin_lock(&inode_hash_lock);
|
|
/* We released the lock, so.. */
|
|
old = find_inode_fast(sb, head, ino);
|
|
if (!old) {
|
|
inode->i_ino = ino;
|
|
spin_lock(&inode->i_lock);
|
|
inode->i_state = I_NEW;
|
|
hlist_add_head(&inode->i_hash, head);
|
|
spin_unlock(&inode->i_lock);
|
|
inode_sb_list_add(inode);
|
|
spin_unlock(&inode_hash_lock);
|
|
|
|
/* Return the locked inode with I_NEW set, the
|
|
* caller is responsible for filling in the contents
|
|
*/
|
|
return inode;
|
|
}
|
|
|
|
/*
|
|
* Uhhuh, somebody else created the same inode under
|
|
* us. Use the old inode instead of the one we just
|
|
* allocated.
|
|
*/
|
|
spin_unlock(&inode_hash_lock);
|
|
destroy_inode(inode);
|
|
inode = old;
|
|
wait_on_inode(inode);
|
|
}
|
|
return inode;
|
|
}
|
|
EXPORT_SYMBOL(iget_locked);
|
|
|
|
/*
|
|
* search the inode cache for a matching inode number.
|
|
* If we find one, then the inode number we are trying to
|
|
* allocate is not unique and so we should not use it.
|
|
*
|
|
* Returns 1 if the inode number is unique, 0 if it is not.
|
|
*/
|
|
static int test_inode_iunique(struct super_block *sb, unsigned long ino)
|
|
{
|
|
struct hlist_head *b = inode_hashtable + hash(sb, ino);
|
|
struct inode *inode;
|
|
|
|
spin_lock(&inode_hash_lock);
|
|
hlist_for_each_entry(inode, b, i_hash) {
|
|
if (inode->i_ino == ino && inode->i_sb == sb) {
|
|
spin_unlock(&inode_hash_lock);
|
|
return 0;
|
|
}
|
|
}
|
|
spin_unlock(&inode_hash_lock);
|
|
|
|
return 1;
|
|
}
|
|
|
|
/**
|
|
* iunique - get a unique inode number
|
|
* @sb: superblock
|
|
* @max_reserved: highest reserved inode number
|
|
*
|
|
* Obtain an inode number that is unique on the system for a given
|
|
* superblock. This is used by file systems that have no natural
|
|
* permanent inode numbering system. An inode number is returned that
|
|
* is higher than the reserved limit but unique.
|
|
*
|
|
* BUGS:
|
|
* With a large number of inodes live on the file system this function
|
|
* currently becomes quite slow.
|
|
*/
|
|
ino_t iunique(struct super_block *sb, ino_t max_reserved)
|
|
{
|
|
/*
|
|
* On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW
|
|
* error if st_ino won't fit in target struct field. Use 32bit counter
|
|
* here to attempt to avoid that.
|
|
*/
|
|
static DEFINE_SPINLOCK(iunique_lock);
|
|
static unsigned int counter;
|
|
ino_t res;
|
|
|
|
spin_lock(&iunique_lock);
|
|
do {
|
|
if (counter <= max_reserved)
|
|
counter = max_reserved + 1;
|
|
res = counter++;
|
|
} while (!test_inode_iunique(sb, res));
|
|
spin_unlock(&iunique_lock);
|
|
|
|
return res;
|
|
}
|
|
EXPORT_SYMBOL(iunique);
|
|
|
|
struct inode *igrab(struct inode *inode)
|
|
{
|
|
spin_lock(&inode->i_lock);
|
|
if (!(inode->i_state & (I_FREEING|I_WILL_FREE))) {
|
|
__iget(inode);
|
|
spin_unlock(&inode->i_lock);
|
|
} else {
|
|
spin_unlock(&inode->i_lock);
|
|
/*
|
|
* Handle the case where s_op->clear_inode has not been
|
|
* called yet, and somebody is calling igrab
|
|
* while the inode is getting freed.
|
|
*/
|
|
inode = NULL;
|
|
}
|
|
return inode;
|
|
}
|
|
EXPORT_SYMBOL(igrab);
|
|
|
|
/**
|
|
* ilookup5_nowait - search for an inode in the inode cache
|
|
* @sb: super block of file system to search
|
|
* @hashval: hash value (usually inode number) to search for
|
|
* @test: callback used for comparisons between inodes
|
|
* @data: opaque data pointer to pass to @test
|
|
*
|
|
* Search for the inode specified by @hashval and @data in the inode cache.
|
|
* If the inode is in the cache, the inode is returned with an incremented
|
|
* reference count.
|
|
*
|
|
* Note: I_NEW is not waited upon so you have to be very careful what you do
|
|
* with the returned inode. You probably should be using ilookup5() instead.
|
|
*
|
|
* Note2: @test is called with the inode_hash_lock held, so can't sleep.
|
|
*/
|
|
struct inode *ilookup5_nowait(struct super_block *sb, unsigned long hashval,
|
|
int (*test)(struct inode *, void *), void *data)
|
|
{
|
|
struct hlist_head *head = inode_hashtable + hash(sb, hashval);
|
|
struct inode *inode;
|
|
|
|
spin_lock(&inode_hash_lock);
|
|
inode = find_inode(sb, head, test, data);
|
|
spin_unlock(&inode_hash_lock);
|
|
|
|
return inode;
|
|
}
|
|
EXPORT_SYMBOL(ilookup5_nowait);
|
|
|
|
/**
|
|
* ilookup5 - search for an inode in the inode cache
|
|
* @sb: super block of file system to search
|
|
* @hashval: hash value (usually inode number) to search for
|
|
* @test: callback used for comparisons between inodes
|
|
* @data: opaque data pointer to pass to @test
|
|
*
|
|
* Search for the inode specified by @hashval and @data in the inode cache,
|
|
* and if the inode is in the cache, return the inode with an incremented
|
|
* reference count. Waits on I_NEW before returning the inode.
|
|
*
|
|
* This is a generalized version of ilookup() for file systems where the
|
|
* inode number is not sufficient for unique identification of an inode.
|
|
*
|
|
* Note: @test is called with the inode_hash_lock held, so can't sleep.
|
|
*/
|
|
struct inode *ilookup5(struct super_block *sb, unsigned long hashval,
|
|
int (*test)(struct inode *, void *), void *data)
|
|
{
|
|
struct inode *inode = ilookup5_nowait(sb, hashval, test, data);
|
|
|
|
if (inode)
|
|
wait_on_inode(inode);
|
|
return inode;
|
|
}
|
|
EXPORT_SYMBOL(ilookup5);
|
|
|
|
/**
|
|
* ilookup - search for an inode in the inode cache
|
|
* @sb: super block of file system to search
|
|
* @ino: inode number to search for
|
|
*
|
|
* Search for the inode @ino in the inode cache, and if the inode is in the
|
|
* cache, the inode is returned with an incremented reference count.
|
|
*/
|
|
struct inode *ilookup(struct super_block *sb, unsigned long ino)
|
|
{
|
|
struct hlist_head *head = inode_hashtable + hash(sb, ino);
|
|
struct inode *inode;
|
|
|
|
spin_lock(&inode_hash_lock);
|
|
inode = find_inode_fast(sb, head, ino);
|
|
spin_unlock(&inode_hash_lock);
|
|
|
|
if (inode)
|
|
wait_on_inode(inode);
|
|
return inode;
|
|
}
|
|
EXPORT_SYMBOL(ilookup);
|
|
|
|
/**
|
|
* find_inode_nowait - find an inode in the inode cache
|
|
* @sb: super block of file system to search
|
|
* @hashval: hash value (usually inode number) to search for
|
|
* @match: callback used for comparisons between inodes
|
|
* @data: opaque data pointer to pass to @match
|
|
*
|
|
* Search for the inode specified by @hashval and @data in the inode
|
|
* cache, where the helper function @match will return 0 if the inode
|
|
* does not match, 1 if the inode does match, and -1 if the search
|
|
* should be stopped. The @match function must be responsible for
|
|
* taking the i_lock spin_lock and checking i_state for an inode being
|
|
* freed or being initialized, and incrementing the reference count
|
|
* before returning 1. It also must not sleep, since it is called with
|
|
* the inode_hash_lock spinlock held.
|
|
*
|
|
* This is an even more generalized version of ilookup5() when the
|
|
* function must never block --- find_inode() can block in
|
|
* __wait_on_freeing_inode() --- or when the caller can not increment
|
|
* the reference count because the resulting iput() might cause an
|
|
* inode eviction. The tradeoff is that the @match function must be
|
|
* very carefully implemented.
|
|
*/
|
|
struct inode *find_inode_nowait(struct super_block *sb,
|
|
unsigned long hashval,
|
|
int (*match)(struct inode *, unsigned long,
|
|
void *),
|
|
void *data)
|
|
{
|
|
struct hlist_head *head = inode_hashtable + hash(sb, hashval);
|
|
struct inode *inode, *ret_inode = NULL;
|
|
int mval;
|
|
|
|
spin_lock(&inode_hash_lock);
|
|
hlist_for_each_entry(inode, head, i_hash) {
|
|
if (inode->i_sb != sb)
|
|
continue;
|
|
mval = match(inode, hashval, data);
|
|
if (mval == 0)
|
|
continue;
|
|
if (mval == 1)
|
|
ret_inode = inode;
|
|
goto out;
|
|
}
|
|
out:
|
|
spin_unlock(&inode_hash_lock);
|
|
return ret_inode;
|
|
}
|
|
EXPORT_SYMBOL(find_inode_nowait);
|
|
|
|
int insert_inode_locked(struct inode *inode)
|
|
{
|
|
struct super_block *sb = inode->i_sb;
|
|
ino_t ino = inode->i_ino;
|
|
struct hlist_head *head = inode_hashtable + hash(sb, ino);
|
|
|
|
while (1) {
|
|
struct inode *old = NULL;
|
|
spin_lock(&inode_hash_lock);
|
|
hlist_for_each_entry(old, head, i_hash) {
|
|
if (old->i_ino != ino)
|
|
continue;
|
|
if (old->i_sb != sb)
|
|
continue;
|
|
spin_lock(&old->i_lock);
|
|
if (old->i_state & (I_FREEING|I_WILL_FREE)) {
|
|
spin_unlock(&old->i_lock);
|
|
continue;
|
|
}
|
|
break;
|
|
}
|
|
if (likely(!old)) {
|
|
spin_lock(&inode->i_lock);
|
|
inode->i_state |= I_NEW;
|
|
hlist_add_head(&inode->i_hash, head);
|
|
spin_unlock(&inode->i_lock);
|
|
spin_unlock(&inode_hash_lock);
|
|
return 0;
|
|
}
|
|
__iget(old);
|
|
spin_unlock(&old->i_lock);
|
|
spin_unlock(&inode_hash_lock);
|
|
wait_on_inode(old);
|
|
if (unlikely(!inode_unhashed(old))) {
|
|
iput(old);
|
|
return -EBUSY;
|
|
}
|
|
iput(old);
|
|
}
|
|
}
|
|
EXPORT_SYMBOL(insert_inode_locked);
|
|
|
|
int insert_inode_locked4(struct inode *inode, unsigned long hashval,
|
|
int (*test)(struct inode *, void *), void *data)
|
|
{
|
|
struct super_block *sb = inode->i_sb;
|
|
struct hlist_head *head = inode_hashtable + hash(sb, hashval);
|
|
|
|
while (1) {
|
|
struct inode *old = NULL;
|
|
|
|
spin_lock(&inode_hash_lock);
|
|
hlist_for_each_entry(old, head, i_hash) {
|
|
if (old->i_sb != sb)
|
|
continue;
|
|
if (!test(old, data))
|
|
continue;
|
|
spin_lock(&old->i_lock);
|
|
if (old->i_state & (I_FREEING|I_WILL_FREE)) {
|
|
spin_unlock(&old->i_lock);
|
|
continue;
|
|
}
|
|
break;
|
|
}
|
|
if (likely(!old)) {
|
|
spin_lock(&inode->i_lock);
|
|
inode->i_state |= I_NEW;
|
|
hlist_add_head(&inode->i_hash, head);
|
|
spin_unlock(&inode->i_lock);
|
|
spin_unlock(&inode_hash_lock);
|
|
return 0;
|
|
}
|
|
__iget(old);
|
|
spin_unlock(&old->i_lock);
|
|
spin_unlock(&inode_hash_lock);
|
|
wait_on_inode(old);
|
|
if (unlikely(!inode_unhashed(old))) {
|
|
iput(old);
|
|
return -EBUSY;
|
|
}
|
|
iput(old);
|
|
}
|
|
}
|
|
EXPORT_SYMBOL(insert_inode_locked4);
|
|
|
|
|
|
int generic_delete_inode(struct inode *inode)
|
|
{
|
|
return 1;
|
|
}
|
|
EXPORT_SYMBOL(generic_delete_inode);
|
|
|
|
/*
|
|
* Called when we're dropping the last reference
|
|
* to an inode.
|
|
*
|
|
* Call the FS "drop_inode()" function, defaulting to
|
|
* the legacy UNIX filesystem behaviour. If it tells
|
|
* us to evict inode, do so. Otherwise, retain inode
|
|
* in cache if fs is alive, sync and evict if fs is
|
|
* shutting down.
|
|
*/
|
|
static void iput_final(struct inode *inode)
|
|
{
|
|
struct super_block *sb = inode->i_sb;
|
|
const struct super_operations *op = inode->i_sb->s_op;
|
|
int drop;
|
|
|
|
WARN_ON(inode->i_state & I_NEW);
|
|
|
|
if (op->drop_inode)
|
|
drop = op->drop_inode(inode);
|
|
else
|
|
drop = generic_drop_inode(inode);
|
|
|
|
if (!drop && (sb->s_flags & MS_ACTIVE)) {
|
|
inode->i_state |= I_REFERENCED;
|
|
inode_add_lru(inode);
|
|
spin_unlock(&inode->i_lock);
|
|
return;
|
|
}
|
|
|
|
if (!drop) {
|
|
inode->i_state |= I_WILL_FREE;
|
|
spin_unlock(&inode->i_lock);
|
|
write_inode_now(inode, 1);
|
|
spin_lock(&inode->i_lock);
|
|
WARN_ON(inode->i_state & I_NEW);
|
|
inode->i_state &= ~I_WILL_FREE;
|
|
}
|
|
|
|
inode->i_state |= I_FREEING;
|
|
if (!list_empty(&inode->i_lru))
|
|
inode_lru_list_del(inode);
|
|
spin_unlock(&inode->i_lock);
|
|
|
|
evict(inode);
|
|
}
|
|
|
|
/**
|
|
* iput - put an inode
|
|
* @inode: inode to put
|
|
*
|
|
* Puts an inode, dropping its usage count. If the inode use count hits
|
|
* zero, the inode is then freed and may also be destroyed.
|
|
*
|
|
* Consequently, iput() can sleep.
|
|
*/
|
|
void iput(struct inode *inode)
|
|
{
|
|
if (!inode)
|
|
return;
|
|
BUG_ON(inode->i_state & I_CLEAR);
|
|
retry:
|
|
if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock)) {
|
|
if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
|
|
atomic_inc(&inode->i_count);
|
|
inode->i_state &= ~I_DIRTY_TIME;
|
|
spin_unlock(&inode->i_lock);
|
|
trace_writeback_lazytime_iput(inode);
|
|
mark_inode_dirty_sync(inode);
|
|
goto retry;
|
|
}
|
|
iput_final(inode);
|
|
}
|
|
}
|
|
EXPORT_SYMBOL(iput);
|
|
|
|
/**
|
|
* bmap - find a block number in a file
|
|
* @inode: inode of file
|
|
* @block: block to find
|
|
*
|
|
* Returns the block number on the device holding the inode that
|
|
* is the disk block number for the block of the file requested.
|
|
* That is, asked for block 4 of inode 1 the function will return the
|
|
* disk block relative to the disk start that holds that block of the
|
|
* file.
|
|
*/
|
|
sector_t bmap(struct inode *inode, sector_t block)
|
|
{
|
|
sector_t res = 0;
|
|
if (inode->i_mapping->a_ops->bmap)
|
|
res = inode->i_mapping->a_ops->bmap(inode->i_mapping, block);
|
|
return res;
|
|
}
|
|
EXPORT_SYMBOL(bmap);
|
|
|
|
/*
|
|
* With relative atime, only update atime if the previous atime is
|
|
* earlier than either the ctime or mtime or if at least a day has
|
|
* passed since the last atime update.
|
|
*/
|
|
static int relatime_need_update(struct vfsmount *mnt, struct inode *inode,
|
|
struct timespec now)
|
|
{
|
|
|
|
if (!(mnt->mnt_flags & MNT_RELATIME))
|
|
return 1;
|
|
/*
|
|
* Is mtime younger than atime? If yes, update atime:
|
|
*/
|
|
if (timespec_compare(&inode->i_mtime, &inode->i_atime) >= 0)
|
|
return 1;
|
|
/*
|
|
* Is ctime younger than atime? If yes, update atime:
|
|
*/
|
|
if (timespec_compare(&inode->i_ctime, &inode->i_atime) >= 0)
|
|
return 1;
|
|
|
|
/*
|
|
* Is the previous atime value older than a day? If yes,
|
|
* update atime:
|
|
*/
|
|
if ((long)(now.tv_sec - inode->i_atime.tv_sec) >= 24*60*60)
|
|
return 1;
|
|
/*
|
|
* Good, we can skip the atime update:
|
|
*/
|
|
return 0;
|
|
}
|
|
|
|
int generic_update_time(struct inode *inode, struct timespec *time, int flags)
|
|
{
|
|
int iflags = I_DIRTY_TIME;
|
|
|
|
if (flags & S_ATIME)
|
|
inode->i_atime = *time;
|
|
if (flags & S_VERSION)
|
|
inode_inc_iversion(inode);
|
|
if (flags & S_CTIME)
|
|
inode->i_ctime = *time;
|
|
if (flags & S_MTIME)
|
|
inode->i_mtime = *time;
|
|
|
|
if (!(inode->i_sb->s_flags & MS_LAZYTIME) || (flags & S_VERSION))
|
|
iflags |= I_DIRTY_SYNC;
|
|
__mark_inode_dirty(inode, iflags);
|
|
return 0;
|
|
}
|
|
EXPORT_SYMBOL(generic_update_time);
|
|
|
|
/*
|
|
* This does the actual work of updating an inode's time or version. The caller
|
|
* must have called mnt_want_write() before calling this.
|
|
*/
|
|
static int update_time(struct inode *inode, struct timespec *time, int flags)
|
|
{
|
|
int (*update_time)(struct inode *, struct timespec *, int);
|
|
|
|
update_time = inode->i_op->update_time ? inode->i_op->update_time :
|
|
generic_update_time;
|
|
|
|
return update_time(inode, time, flags);
|
|
}
|
|
|
|
/**
|
|
* touch_atime - update the access time
|
|
* @path: the &struct path to update
|
|
* @inode: inode to update
|
|
*
|
|
* Update the accessed time on an inode and mark it for writeback.
|
|
* This function automatically handles read only file systems and media,
|
|
* as well as the "noatime" flag and inode specific "noatime" markers.
|
|
*/
|
|
bool atime_needs_update(const struct path *path, struct inode *inode)
|
|
{
|
|
struct vfsmount *mnt = path->mnt;
|
|
struct timespec now;
|
|
|
|
if (inode->i_flags & S_NOATIME)
|
|
return false;
|
|
if (IS_NOATIME(inode))
|
|
return false;
|
|
if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode))
|
|
return false;
|
|
|
|
if (mnt->mnt_flags & MNT_NOATIME)
|
|
return false;
|
|
if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
|
|
return false;
|
|
|
|
now = current_fs_time(inode->i_sb);
|
|
|
|
if (!relatime_need_update(mnt, inode, now))
|
|
return false;
|
|
|
|
if (timespec_equal(&inode->i_atime, &now))
|
|
return false;
|
|
|
|
return true;
|
|
}
|
|
|
|
void touch_atime(const struct path *path)
|
|
{
|
|
struct vfsmount *mnt = path->mnt;
|
|
struct inode *inode = d_inode(path->dentry);
|
|
struct timespec now;
|
|
|
|
if (!atime_needs_update(path, inode))
|
|
return;
|
|
|
|
if (!sb_start_write_trylock(inode->i_sb))
|
|
return;
|
|
|
|
if (__mnt_want_write(mnt) != 0)
|
|
goto skip_update;
|
|
/*
|
|
* File systems can error out when updating inodes if they need to
|
|
* allocate new space to modify an inode (such is the case for
|
|
* Btrfs), but since we touch atime while walking down the path we
|
|
* really don't care if we failed to update the atime of the file,
|
|
* so just ignore the return value.
|
|
* We may also fail on filesystems that have the ability to make parts
|
|
* of the fs read only, e.g. subvolumes in Btrfs.
|
|
*/
|
|
now = current_fs_time(inode->i_sb);
|
|
update_time(inode, &now, S_ATIME);
|
|
__mnt_drop_write(mnt);
|
|
skip_update:
|
|
sb_end_write(inode->i_sb);
|
|
}
|
|
EXPORT_SYMBOL(touch_atime);
|
|
|
|
/*
|
|
* The logic we want is
|
|
*
|
|
* if suid or (sgid and xgrp)
|
|
* remove privs
|
|
*/
|
|
int should_remove_suid(struct dentry *dentry)
|
|
{
|
|
umode_t mode = d_inode(dentry)->i_mode;
|
|
int kill = 0;
|
|
|
|
/* suid always must be killed */
|
|
if (unlikely(mode & S_ISUID))
|
|
kill = ATTR_KILL_SUID;
|
|
|
|
/*
|
|
* sgid without any exec bits is just a mandatory locking mark; leave
|
|
* it alone. If some exec bits are set, it's a real sgid; kill it.
|
|
*/
|
|
if (unlikely((mode & S_ISGID) && (mode & S_IXGRP)))
|
|
kill |= ATTR_KILL_SGID;
|
|
|
|
if (unlikely(kill && !capable(CAP_FSETID) && S_ISREG(mode)))
|
|
return kill;
|
|
|
|
return 0;
|
|
}
|
|
EXPORT_SYMBOL(should_remove_suid);
|
|
|
|
/*
|
|
* Return mask of changes for notify_change() that need to be done as a
|
|
* response to write or truncate. Return 0 if nothing has to be changed.
|
|
* Negative value on error (change should be denied).
|
|
*/
|
|
int dentry_needs_remove_privs(struct dentry *dentry)
|
|
{
|
|
struct inode *inode = d_inode(dentry);
|
|
int mask = 0;
|
|
int ret;
|
|
|
|
if (IS_NOSEC(inode))
|
|
return 0;
|
|
|
|
mask = should_remove_suid(dentry);
|
|
ret = security_inode_need_killpriv(dentry);
|
|
if (ret < 0)
|
|
return ret;
|
|
if (ret)
|
|
mask |= ATTR_KILL_PRIV;
|
|
return mask;
|
|
}
|
|
EXPORT_SYMBOL(dentry_needs_remove_privs);
|
|
|
|
static int __remove_privs(struct vfsmount *mnt, struct dentry *dentry, int kill)
|
|
{
|
|
struct iattr newattrs;
|
|
|
|
newattrs.ia_valid = ATTR_FORCE | kill;
|
|
/*
|
|
* Note we call this on write, so notify_change will not
|
|
* encounter any conflicting delegations:
|
|
*/
|
|
return notify_change2(mnt, dentry, &newattrs, NULL);
|
|
}
|
|
|
|
/*
|
|
* Remove special file privileges (suid, capabilities) when the file is written
|
|
* to or truncated.
|
|
*/
|
|
int file_remove_privs(struct file *file)
|
|
{
|
|
struct dentry *dentry = file_dentry(file);
|
|
struct inode *inode = file_inode(file);
|
|
int kill;
|
|
int error = 0;
|
|
|
|
/*
|
|
* Fast path for nothing security related.
|
|
* Likewise for non-regular files, e.g. blkdev inodes.
|
|
* For example, blkdev_write_iter() might get here
|
|
* trying to remove privs which it is not allowed to.
|
|
*/
|
|
if (IS_NOSEC(inode) || !S_ISREG(inode->i_mode))
|
|
return 0;
|
|
|
|
kill = dentry_needs_remove_privs(dentry);
|
|
if (kill < 0)
|
|
return kill;
|
|
if (kill)
|
|
error = __remove_privs(file->f_path.mnt, dentry, kill);
|
|
if (!error)
|
|
inode_has_no_xattr(inode);
|
|
|
|
return error;
|
|
}
|
|
EXPORT_SYMBOL(file_remove_privs);
|
|
|
|
/**
|
|
* file_update_time - update mtime and ctime
|
|
* @file: file accessed
|
|
*
|
|
* Update the mtime and ctime members of an inode and mark the inode
|
|
* for writeback. Note that this function is meant exclusively for
|
|
* usage in the file write path of filesystems, and filesystems may
|
|
* choose to explicitly ignore update via this function with the
|
|
* S_NOCMTIME inode flag, e.g. for network filesystems where these
|
|
* timestamps are handled by the server. This can return an error for
|
|
* file systems that need to allocate space in order to update an inode.
|
|
*/
|
|
|
|
int file_update_time(struct file *file)
|
|
{
|
|
struct inode *inode = file_inode(file);
|
|
struct timespec now;
|
|
int sync_it = 0;
|
|
int ret;
|
|
|
|
/* First try to exhaust all avenues to not sync */
|
|
if (IS_NOCMTIME(inode))
|
|
return 0;
|
|
|
|
now = current_fs_time(inode->i_sb);
|
|
if (!timespec_equal(&inode->i_mtime, &now))
|
|
sync_it = S_MTIME;
|
|
|
|
if (!timespec_equal(&inode->i_ctime, &now))
|
|
sync_it |= S_CTIME;
|
|
|
|
if (IS_I_VERSION(inode))
|
|
sync_it |= S_VERSION;
|
|
|
|
if (!sync_it)
|
|
return 0;
|
|
|
|
/* Finally allowed to write? Takes lock. */
|
|
if (__mnt_want_write_file(file))
|
|
return 0;
|
|
|
|
ret = update_time(inode, &now, sync_it);
|
|
__mnt_drop_write_file(file);
|
|
|
|
return ret;
|
|
}
|
|
EXPORT_SYMBOL(file_update_time);
|
|
|
|
int inode_needs_sync(struct inode *inode)
|
|
{
|
|
if (IS_SYNC(inode))
|
|
return 1;
|
|
if (S_ISDIR(inode->i_mode) && IS_DIRSYNC(inode))
|
|
return 1;
|
|
return 0;
|
|
}
|
|
EXPORT_SYMBOL(inode_needs_sync);
|
|
|
|
/*
|
|
* If we try to find an inode in the inode hash while it is being
|
|
* deleted, we have to wait until the filesystem completes its
|
|
* deletion before reporting that it isn't found. This function waits
|
|
* until the deletion _might_ have completed. Callers are responsible
|
|
* for rechecking the inode state.
|
|
*
|
|
* It doesn't matter if I_NEW is not set initially, a call to
|
|
* wake_up_bit(&inode->i_state, __I_NEW) after removing from the hash list
|
|
* will DTRT.
|
|
*/
|
|
static void __wait_on_freeing_inode(struct inode *inode)
|
|
{
|
|
wait_queue_head_t *wq;
|
|
DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
|
|
wq = bit_waitqueue(&inode->i_state, __I_NEW);
|
|
prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
|
|
spin_unlock(&inode->i_lock);
|
|
spin_unlock(&inode_hash_lock);
|
|
schedule();
|
|
finish_wait(wq, &wait.wait);
|
|
spin_lock(&inode_hash_lock);
|
|
}
|
|
|
|
static __initdata unsigned long ihash_entries;
|
|
static int __init set_ihash_entries(char *str)
|
|
{
|
|
if (!str)
|
|
return 0;
|
|
ihash_entries = simple_strtoul(str, &str, 0);
|
|
return 1;
|
|
}
|
|
__setup("ihash_entries=", set_ihash_entries);
|
|
|
|
/*
|
|
* Initialize the waitqueues and inode hash table.
|
|
*/
|
|
void __init inode_init_early(void)
|
|
{
|
|
unsigned int loop;
|
|
|
|
/* If hashes are distributed across NUMA nodes, defer
|
|
* hash allocation until vmalloc space is available.
|
|
*/
|
|
if (hashdist)
|
|
return;
|
|
|
|
inode_hashtable =
|
|
alloc_large_system_hash("Inode-cache",
|
|
sizeof(struct hlist_head),
|
|
ihash_entries,
|
|
14,
|
|
HASH_EARLY,
|
|
&i_hash_shift,
|
|
&i_hash_mask,
|
|
0,
|
|
0);
|
|
|
|
for (loop = 0; loop < (1U << i_hash_shift); loop++)
|
|
INIT_HLIST_HEAD(&inode_hashtable[loop]);
|
|
}
|
|
|
|
void __init inode_init(void)
|
|
{
|
|
unsigned int loop;
|
|
|
|
/* inode slab cache */
|
|
inode_cachep = kmem_cache_create("inode_cache",
|
|
sizeof(struct inode),
|
|
0,
|
|
(SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
|
|
SLAB_MEM_SPREAD),
|
|
init_once);
|
|
|
|
/* Hash may have been set up in inode_init_early */
|
|
if (!hashdist)
|
|
return;
|
|
|
|
inode_hashtable =
|
|
alloc_large_system_hash("Inode-cache",
|
|
sizeof(struct hlist_head),
|
|
ihash_entries,
|
|
14,
|
|
0,
|
|
&i_hash_shift,
|
|
&i_hash_mask,
|
|
0,
|
|
0);
|
|
|
|
for (loop = 0; loop < (1U << i_hash_shift); loop++)
|
|
INIT_HLIST_HEAD(&inode_hashtable[loop]);
|
|
}
|
|
|
|
void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
|
|
{
|
|
inode->i_mode = mode;
|
|
if (S_ISCHR(mode)) {
|
|
inode->i_fop = &def_chr_fops;
|
|
inode->i_rdev = rdev;
|
|
} else if (S_ISBLK(mode)) {
|
|
inode->i_fop = &def_blk_fops;
|
|
inode->i_rdev = rdev;
|
|
} else if (S_ISFIFO(mode))
|
|
inode->i_fop = &pipefifo_fops;
|
|
else if (S_ISSOCK(mode))
|
|
; /* leave it no_open_fops */
|
|
else
|
|
printk(KERN_DEBUG "init_special_inode: bogus i_mode (%o) for"
|
|
" inode %s:%lu\n", mode, inode->i_sb->s_id,
|
|
inode->i_ino);
|
|
}
|
|
EXPORT_SYMBOL(init_special_inode);
|
|
|
|
/**
|
|
* inode_init_owner - Init uid,gid,mode for new inode according to POSIX standards
|
|
* @inode: New inode
|
|
* @dir: Directory inode
|
|
* @mode: mode of the new inode
|
|
*/
|
|
void inode_init_owner(struct inode *inode, const struct inode *dir,
|
|
umode_t mode)
|
|
{
|
|
inode->i_uid = current_fsuid();
|
|
if (dir && dir->i_mode & S_ISGID) {
|
|
inode->i_gid = dir->i_gid;
|
|
|
|
/* Directories are special, and always inherit S_ISGID */
|
|
if (S_ISDIR(mode))
|
|
mode |= S_ISGID;
|
|
else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
|
|
!in_group_p(inode->i_gid) &&
|
|
!capable_wrt_inode_uidgid(dir, CAP_FSETID))
|
|
mode &= ~S_ISGID;
|
|
} else
|
|
inode->i_gid = current_fsgid();
|
|
inode->i_mode = mode;
|
|
}
|
|
EXPORT_SYMBOL(inode_init_owner);
|
|
|
|
/**
|
|
* inode_owner_or_capable - check current task permissions to inode
|
|
* @inode: inode being checked
|
|
*
|
|
* Return true if current either has CAP_FOWNER in a namespace with the
|
|
* inode owner uid mapped, or owns the file.
|
|
*/
|
|
bool inode_owner_or_capable(const struct inode *inode)
|
|
{
|
|
struct user_namespace *ns;
|
|
|
|
if (uid_eq(current_fsuid(), inode->i_uid))
|
|
return true;
|
|
|
|
ns = current_user_ns();
|
|
if (ns_capable(ns, CAP_FOWNER) && kuid_has_mapping(ns, inode->i_uid))
|
|
return true;
|
|
return false;
|
|
}
|
|
EXPORT_SYMBOL(inode_owner_or_capable);
|
|
|
|
/*
|
|
* Direct i/o helper functions
|
|
*/
|
|
static void __inode_dio_wait(struct inode *inode)
|
|
{
|
|
wait_queue_head_t *wq = bit_waitqueue(&inode->i_state, __I_DIO_WAKEUP);
|
|
DEFINE_WAIT_BIT(q, &inode->i_state, __I_DIO_WAKEUP);
|
|
|
|
do {
|
|
prepare_to_wait(wq, &q.wait, TASK_UNINTERRUPTIBLE);
|
|
if (atomic_read(&inode->i_dio_count))
|
|
schedule();
|
|
} while (atomic_read(&inode->i_dio_count));
|
|
finish_wait(wq, &q.wait);
|
|
}
|
|
|
|
/**
|
|
* inode_dio_wait - wait for outstanding DIO requests to finish
|
|
* @inode: inode to wait for
|
|
*
|
|
* Waits for all pending direct I/O requests to finish so that we can
|
|
* proceed with a truncate or equivalent operation.
|
|
*
|
|
* Must be called under a lock that serializes taking new references
|
|
* to i_dio_count, usually by inode->i_mutex.
|
|
*/
|
|
void inode_dio_wait(struct inode *inode)
|
|
{
|
|
if (atomic_read(&inode->i_dio_count))
|
|
__inode_dio_wait(inode);
|
|
}
|
|
EXPORT_SYMBOL(inode_dio_wait);
|
|
|
|
/*
|
|
* inode_set_flags - atomically set some inode flags
|
|
*
|
|
* Note: the caller should be holding i_mutex, or else be sure that
|
|
* they have exclusive access to the inode structure (i.e., while the
|
|
* inode is being instantiated). The reason for the cmpxchg() loop
|
|
* --- which wouldn't be necessary if all code paths which modify
|
|
* i_flags actually followed this rule --- is that there is at least one
|
|
* code path which doesn't today, so we use cmpxchg() out of an abundance
|
|
* of caution.
|
|
*
|
|
* In the long run, i_mutex is overkill, and we should probably look
|
|
* at using the i_lock spinlock to protect i_flags, and then make sure
|
|
* it is so documented in include/linux/fs.h and that all code follows
|
|
* the locking convention!!
|
|
*/
|
|
void inode_set_flags(struct inode *inode, unsigned int flags,
|
|
unsigned int mask)
|
|
{
|
|
unsigned int old_flags, new_flags;
|
|
|
|
WARN_ON_ONCE(flags & ~mask);
|
|
do {
|
|
old_flags = ACCESS_ONCE(inode->i_flags);
|
|
new_flags = (old_flags & ~mask) | flags;
|
|
} while (unlikely(cmpxchg(&inode->i_flags, old_flags,
|
|
new_flags) != old_flags));
|
|
}
|
|
EXPORT_SYMBOL(inode_set_flags);
|
|
|
|
void inode_nohighmem(struct inode *inode)
|
|
{
|
|
mapping_set_gfp_mask(inode->i_mapping, GFP_USER);
|
|
}
|
|
EXPORT_SYMBOL(inode_nohighmem);
|
|
|
|
/*
|
|
* Generic function to check FS_IOC_SETFLAGS values and reject any invalid
|
|
* configurations.
|
|
*
|
|
* Note: the caller should be holding i_mutex, or else be sure that they have
|
|
* exclusive access to the inode structure.
|
|
*/
|
|
int vfs_ioc_setflags_prepare(struct inode *inode, unsigned int oldflags,
|
|
unsigned int flags)
|
|
{
|
|
/*
|
|
* The IMMUTABLE and APPEND_ONLY flags can only be changed by
|
|
* the relevant capability.
|
|
*
|
|
* This test looks nicer. Thanks to Pauline Middelink
|
|
*/
|
|
if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL) &&
|
|
!capable(CAP_LINUX_IMMUTABLE))
|
|
return -EPERM;
|
|
|
|
return 0;
|
|
}
|
|
EXPORT_SYMBOL(vfs_ioc_setflags_prepare);
|