Expose the FUSE_PASSTHROUGH interface to user space and declare all the
basic data structures and functions as the skeleton on top of which the
FUSE passthrough functionality will be built.
As part of this, introduce the new FUSE passthrough ioctl, which allows
the FUSE daemon to specify a direct connection between a FUSE file and a
lower file system file. Such ioctl requires user space to pass the file
descriptor of one of its opened files through the fuse_passthrough_out
data structure introduced in this patch. This structure includes extra
fields for possible future extensions.
Also, add the passthrough functions for the set-up and tear-down of the
data structures and locks that will be used both when fuse_conns and
fuse_files are created/deleted.
Bug: 179164095
Link: https://lore.kernel.org/lkml/20210125153057.3623715-4-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: I732532581348adadda5b5048a9346c2b0868d539
Signed-off-by: Alessio Balsini <balsini@google.com>
With a 64-bit kernel build the FUSE device cannot handle ioctl requests
coming from 32-bit user space.
This is due to the ioctl command translation that generates different
command identifiers that thus cannot be used for direct comparisons
without proper manipulation.
Explicitly extract type and number from the ioctl command to enable
32-bit user space compatibility on 64-bit kernel builds.
Bug: 179164095
Link: https://lore.kernel.org/lkml/20210125153057.3623715-3-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: I595517c54d551be70e83c7fcb4b62397a3615004
Signed-off-by: Alessio Balsini <balsini@google.com>
Allows FUSE to report to inotify that it is acting as a layered filesystem.
The userspace component returns a string representing the location of the
underlying file. If the string cannot be resolved into a path, the top
level path is returned instead.
Bug: 23904372
Bug: 171780975
Test: Pixel 4.19 FileObserverLegacyPathTest
Change-Id: Iabdca0bbedfbff59e9c820c58636a68ef9683d9f
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
This reverts commit dd73e5efa8 as there is
no need for it now that sdcardfs is gone.
Bug: 157700134
Test: boot redfin
Signed-off-by: Will McVicker <willmcvicker@google.com>
Change-Id: I91f41d5baba0f550196ffcbc44b3d8c4325aec55
* refs/heads/tmp-aa07ecb:
Revert "x86_64: Allow breakpoints to emulate call instructions"
Linux 4.19.46
fbdev: sm712fb: fix memory frequency by avoiding a switch/case fallthrough
bpf, lru: avoid messing with eviction heuristics upon syscall lookup
bpf: add map_lookup_elem_sys_only for lookups from syscall side
bpf: relax inode permission check for retrieving bpf program
Revert "selftests/bpf: skip verifier tests for unsupported program types"
driver core: Postpone DMA tear-down until after devres release for probe failure
md/raid: raid5 preserve the writeback action after the parity check
Revert "Don't jump to compute_result state from check_result state"
perf/x86/intel: Fix race in intel_pmu_disable_event()
perf bench numa: Add define for RUSAGE_THREAD if not present
ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour
x86/mm/mem_encrypt: Disable all instrumentation for early SME setup
sched/cpufreq: Fix kobject memleak
iwlwifi: mvm: check for length correctness in iwl_mvm_create_skb()
qmi_wwan: new Wistron, ZTE and D-Link devices
bpf: Fix preempt_enable_no_resched() abuse
power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG
KVM: arm/arm64: Ensure vcpu target is unset on reset failure
net: ieee802154: fix missing checks for regmap_update_bits
mac80211: Fix kernel panic due to use of txq after free
x86: kvm: hyper-v: deal with buggy TLB flush requests from WS2012
PCI: Fix issue with "pci=disable_acs_redir" parameter being ignored
apparmorfs: fix use-after-free on symlink traversal
securityfs: fix use-after-free on symlink traversal
power: supply: cpcap-battery: Fix division by zero
clk: sunxi-ng: nkmp: Avoid GENMASK(-1, 0)
xfrm4: Fix uninitialized memory read in _decode_session4
xfrm: Honor original L3 slave device in xfrmi policy lookup
esp4: add length check for UDP encapsulation
xfrm: clean up xfrm protocol checks
vti4: ipip tunnel deregistration fixes.
xfrm6_tunnel: Fix potential panic when unloading xfrm6_tunnel module
xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink
fuse: Add FOPEN_STREAM to use stream_open()
dm mpath: always free attached_handler_name in parse_path()
dm integrity: correctly calculate the size of metadata area
dm delay: fix a crash when invalid device is specified
dm zoned: Fix zone report handling
dm cache metadata: Fix loading discard bitset
PCI: Work around Pericom PCIe-to-PCI bridge Retrain Link erratum
PCI: Factor out pcie_retrain_link() function
PCI: rcar: Add the initialization of PCIe link in resume_noirq()
PCI/AER: Change pci_aer_init() stub to return void
PCI: Init PCIe feature bits for managed host bridge alloc
PCI: Mark Atheros AR9462 to avoid bus reset
PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken
fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting
fbdev: sm712fb: use 1024x768 by default on non-MIPS, fix garbled display
fbdev: sm712fb: fix support for 1024x768-16 mode
fbdev: sm712fb: fix crashes during framebuffer writes by correctly mapping VRAM
fbdev: sm712fb: fix boot screen glitch when sm712fb replaces VGA
fbdev: sm712fb: fix white screen of death on reboot, don't set CR3B-CR3F
fbdev: sm712fb: fix VRAM detection, don't set SR70/71/74/75
fbdev: sm712fb: fix brightness control on reboot, don't set SR30
fbdev/efifb: Ignore framebuffer memmap entries that lack any memory types
objtool: Allow AR to be overridden with HOSTAR
MIPS: perf: Fix build with CONFIG_CPU_BMIPS5000 enabled
perf intel-pt: Fix sample timestamp wrt non-taken branches
perf intel-pt: Fix improved sample timestamp
perf intel-pt: Fix instructions sampling rate
memory: tegra: Fix integer overflow on tick value calculation
tracing: Fix partial reading of trace event's id file
ftrace/x86_64: Emulate call function while updating in breakpoint handler
x86_64: Allow breakpoints to emulate call instructions
x86_64: Add gap to int3 to allow for call emulation
ceph: flush dirty inodes before proceeding with remount
iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114
ovl: fix missing upper fs freeze protection on copy up for ioctl
fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
fuse: fix writepages on 32bit
udlfb: introduce a rendering mutex
udlfb: fix sleeping inside spinlock
udlfb: delete the unused parameter for dlfb_handle_damage
clk: rockchip: fix wrong clock definitions for rk3328
clk: mediatek: Disable tuner_en before change PLL rate
clk: tegra: Fix PLLM programming on Tegra124+ when PMC overrides divider
clk: hi3660: Mark clk_gate_ufs_subsys as critical
PNFS fallback to MDS if no deviceid found
NFS4: Fix v4.0 client state corruption when mount
media: imx: Clear fwnode link struct for each endpoint iteration
media: imx: csi: Allow unknown nearest upstream entities
media: ov6650: Fix sensor possibly not detected on probe
phy: ti-pipe3: fix missing bit-wise or operator when assigning val
cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level()
of: fix clang -Wunsequenced for be32_to_cpu()
p54: drop device reference count if fails to enable device
intel_th: msu: Fix single mode with IOMMU
dcache: sort the freeing-without-RCU-delay mess for good.
md: add mddev->pers to avoid potential NULL pointer dereference
md: batch flush requests.
Revert "MD: fix lock contention for flush bios"
proc: prevent changes to overridden credentials
brd: re-enable __GFP_HIGHMEM in brd_insert_page()
stm class: Fix channel bitmap on 32-bit systems
stm class: Fix channel free in stm output free path
parisc: Rename LEVEL to PA_ASM_LEVEL to avoid name clash with DRBD code
parisc: Use PA_ASM_LEVEL in boot code
parisc: Skip registering LED when running in QEMU
parisc: Export running_on_qemu symbol for modules
net/mlx5e: Fix ethtool rxfh commands when CONFIG_MLX5_EN_RXNFC is disabled
net/mlx5: Imply MLXFW in mlx5_core
vsock/virtio: Initialize core virtio vsock before registering the driver
tipc: fix modprobe tipc failed after switch order of device registration
vsock/virtio: free packets during the socket release
tipc: switch order of device registration to fix a crash
rtnetlink: always put IFLA_LINK for links with a link-netnsid
ppp: deflate: Fix possible crash in deflate_init
nfp: flower: add rcu locks when accessing netdev for tunnels
net: usb: qmi_wwan: add Telit 0x1260 and 0x1261 compositions
net: test nouarg before dereferencing zerocopy pointers
net/mlx4_core: Change the error print to info print
net: avoid weird emergency message
net: Always descend into dsa/
ipv6: prevent possible fib6 leaks
ipv6: fix src addr routing with the exception table
Enable CONFIG_ION_SYSTEM_HEAP
ANDROID: Enable LTO and CFI
Revert "ANDROID: cuttlefish 4.19: enable CONFIG_CRYPTO_AES_NI_INTEL=y"
Conflicts:
drivers/hwtracing/stm/core.c
Change-Id: I7da86200695082b7ed40c5e6290c5b3203e094dd
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
commit bbd84f33652f852ce5992d65db4d020aba21f882 upstream.
Starting from commit 9c225f2655 ("vfs: atomic f_pos accesses as per
POSIX") files opened even via nonseekable_open gate read and write via lock
and do not allow them to be run simultaneously. This can create read vs
write deadlock if a filesystem is trying to implement a socket-like file
which is intended to be simultaneously used for both read and write from
filesystem client. See commit 10dce8af3422 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") for details and e.g. commit 581d21a2d0 ("xenbus: fix deadlock
on writes to /proc/xen/xenbus") for a similar deadlock example on
/proc/xen/xenbus.
To avoid such deadlock it was tempting to adjust fuse_finish_open to use
stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags,
but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and write
handlers
https://codesearch.debian.net/search?q=-%3Enonseekable+%3Dhttps://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
so if we would do such a change it will break a real user.
Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the
opened handler is having stream-like semantics; does not use file position
and thus the kernel is free to issue simultaneous read and write request on
opened file handle.
This patch together with stream_open() should be added to stable kernels
starting from v3.14+. This will allow to patch OSSPD and other FUSE
filesystems that provide stream-like files to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all
kernel versions. This should work because fuse_finish_open ignores unknown
open flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel that
is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE
is sufficient to implement streams without read vs write deadlock.
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Allows FUSE to report to inotify that it is acting
as a layered filesystem. The userspace component
returns a string representing the location of the
underlying file. If the string cannot be resolved
into a path, the top level path is returned instead.
bug: 23904372
Change-Id: Iabdca0bbedfbff59e9c820c58636a68ef9683d9f
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Currently the userspace has no way of knowing whether the fuse
connection ended because of umount or abort via sysfs. It makes it hard
for filesystems to free the mountpoint after abort without worrying
about removing some new mount.
The patch fixes it by returning different errors when userspace reads
from /dev/fuse (-ENODEV for umount and -ECONNABORTED for abort).
Add a new capability flag FUSE_ABORT_ERROR. If set and the connection is
gone because of sysfs abort, reading from the device will return
-ECONNABORTED.
Signed-off-by: Szymon Lukasz <noh4hss@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Many user space API headers have licensing information, which is either
incomplete, badly formatted or just a shorthand for referring to the
license under which the file is supposed to be. This makes it hard for
compliance tools to determine the correct license.
Update these files with an SPDX license identifier. The identifier was
chosen based on the license information in the file.
GPL/LGPL licensed headers get the matching GPL/LGPL SPDX license
identifier with the added 'WITH Linux-syscall-note' exception, which is
the officially assigned exception identifier for the kernel syscall
exception:
NOTE! This copyright does *not* cover user programs that use kernel
services by normal system calls - this is merely considered normal use
of the kernel, and does *not* fall under the heading of "derived work".
This exception makes it possible to include GPL headers into non GPL
code, without confusing license compliance tools.
Headers which have either explicit dual licensing or are just licensed
under a non GPL license are updated with the corresponding SPDX
identifier and the GPLv2 with syscall exception identifier. The format
is:
((GPL-2.0 WITH Linux-syscall-note) OR SPDX-ID-OF-OTHER-LICENSE)
SPDX license identifiers are a legally binding shorthand, which can be
used instead of the full boiler plate text. The update does not remove
existing license information as this has to be done on a case by case
basis and the copyright holders might have to be consulted. This will
happen in a separate step.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne. See the previous patch in this series for the
methodology of how this patch was researched.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a new INIT flag, FUSE_POSIX_ACL, for negotiating ACL support with
userspace. When it is set in the INIT response, ACL support will be
enabled. ACL support also implies "default_permissions".
When ACL support is enabled, the kernel will cache and have responsibility
for enforcing ACLs. ACL xattrs will be passed to userspace, which is
responsible for updating the ACLs in the filesystem, keeping the file mode
in sync, and inheritance of default ACLs when new filesystem nodes are
created.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Only userspace filesystem can do the killing of suid/sgid without races.
So introduce an INIT flag and negotiate support for this.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Negotiate with userspace filesystems whether they support parallel readdir
and lookup. Disable parallelism by default for fear of breaking fuse
filesystems.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 9902af79c0 ("parallel lookups: actual switch to rwsem")
Fixes: d9b3dbdcfd ("fuse: switch to ->iterate_shared()")
Allow an open fuse device to be "cloned". Userspace can create a clone by:
newfd = open("/dev/fuse", O_RDWR)
ioctl(newfd, FUSE_DEV_IOC_CLONE, &oldfd);
At this point newfd will refer to the same fuse connection as oldfd.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
Here some additional changes to set a capability flag so that clients can
detect when it's appropriate to return -ENOSYS from open.
This amends the following commit introduced in 3.14:
7678ac5061 fuse: support clients that don't implement 'open'
However we can only add the flag to 3.15 and later since there was no
protocol version update in 3.14.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <stable@vger.kernel.org> # v3.15+
The patch extends fuse_setattr_in, and extends the flush procedure
(fuse_flush_times()) called on ->write_inode() to send the ctime as well as
mtime.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Allow userspace fs to specify time granularity.
This is needed because with writeback_cache mode the kernel is responsible
for generating mtime and ctime, but if the underlying filesystem doesn't
support nanosecond granularity then the cache will contain a different
value from the one stored on the filesystem resulting in a change of times
after a cache flush.
Make the default granularity 1s.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Introduce a bit kernel and userspace exchange between each-other on
the init stage and turn writeback on if the userspace want this and
mount option 'allow_wbcache' is present (controlled by fusermount).
Also add each writable file into per-inode write list and call the
generic_file_aio_write to make use of the Linux page cache engine.
Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Without async DIO write requests to a single file were always serialized.
With async DIO that's no longer the case.
So don't turn on async DIO by default for fear of breaking backward
compatibility.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Commit 7e98d53086 (Synchronize fuse header with
one used in library) added #ifdef __linux__ around defines if it is not set.
The kernel build is self-contained and can be built on non-Linux toolchains.
After the mentioned commit builds on non-Linux toolchains will try to include
stdint.h and fail due to -nostdinc, and then fail with a bunch of undefined type
errors.
Fix by checking for __KERNEL__ instead of __linux__ and using the standard int
types instead of the linux specific ones.
Reported-by: Arve Hjønnevåg <arve@android.com>
Reported-by: Colin Cross <ccross@android.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
For some filesystems (e.g. GlusterFS), the cost of performing a
normal readdir and readdirplus are identical. Since adaptively
using readdirplus has no benefit for those systems, give
users/filesystems the option to control adaptive readdirplus use.
v2 of this patch incorporates Miklos's suggestion to simplify the code,
as well as improving consistency of macro names and documentation.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
The library one has provisions for use in *BSD, add them to the kernel one too.
They don't hurt and ease maintenance.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
commit 626cf23660 "poll: add poll_requested_events()..." enabled us to send the
requested events to the filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Yeah, we have a capability flag for this as well, so this is not strictly
necessary, but it doesn't hurt either.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
This patch implements readdirplus support in FUSE, similar to NFS.
The payload returned in the readdirplus call contains
'fuse_entry_out' structure thereby providing all the necessary inputs
for 'faking' a lookup() operation on the spot.
If the dentry and inode already existed (for e.g. in a re-run of ls -l)
then just the inode attributes timeout and dentry timeout are refreshed.
With a simple client->network->server implementation of a FUSE based
filesystem, the following performance observations were made:
Test: Performing a filesystem crawl over 20,000 files with
sh# time ls -lR /mnt
Without readdirplus:
Run 1: 18.1s
Run 2: 16.0s
Run 3: 16.2s
With readdirplus:
Run 1: 4.1s
Run 2: 3.8s
Run 3: 3.8s
The performance improvement is significant as it avoided 20,000 upcalls
calls (lookup). Cache consistency is no worse than what already is.
Signed-off-by: Anand V. Avati <avati@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>