Commit Graph

19 Commits

Author SHA1 Message Date
Daniel Rosenberg
f5f4199c10 ANDROID: fuse-bpf v1.1
These patches extend FUSE to be able to act as a stacked filesystem. This
allows pure passthrough, where the fuse file system simply reflects the lower
filesystem, and also allows optional pre and post filtering in BPF and/or the
userspace daemon as needed. This can dramatically reduce or even eliminate
transitions to and from userspace.

See https://lwn.net/Articles/915717/

Note that this patch set has been extensively tested in
common-android13-5.10

This is a squash of these changes cherry-picked from common-android13-5.10

ANDROID: fuse-bpf: Make compile and pass test
ANDROID: fuse-bpf: set error_in to ENOENT in negative lookup
ANDROID: fuse-bpf: Add ability to run ranges of tests to fuse_test
ANDROID: fuse-bpf: Add test for lookup postfilter
ANDROID: fuse-bpf: readddir postfilter fixes
ANDROID: fix kernelci error in fs/fuse/dir.c
ANDROID: fuse-bpf: Fix RCU/reference issue
ANDROID: fuse-bpf: Always call revalidate for backing
ANDROID: fuse-bpf: Adjust backing handle funcs
ANDROID: fuse-bpf: Fix revalidate error path and backing handling
ANDROID: fuse-bpf: Fix use of get_fuse_inode
ANDROID: fuse: Don't use readdirplus w/ nodeid 0
ANDROID: fuse-bpf: Introduce readdirplus test case for fuse bpf
ANDROID: fuse-bpf: Make sure force_again flag is false by default
ANDROID: fuse-bpf: Make inodes with backing_fd reachable for regular FUSE fuse_iget
Revert "ANDROID: fuse-bpf: use target instead of parent inode to execute backing revalidate"
ANDROID: fuse-bpf: use target instead of parent inode to execute backing revalidate
ANDROID: fuse-bpf: Fix misuse of args.out_args
ANDROID: fuse-bpf: Fix non-fusebpf build
ANDROID: fuse-bpf: Use fuse_bpf_args in uapi
ANDROID: fuse-bpf: Fix read_iter
ANDROID: fuse-bpf: Use cache and refcount
ANDROID: fuse-bpf: Rename iocb_fuse to iocb_orig
ANDROID: fuse-bpf: Fix fixattr in rename
ANDROID: fuse-bpf: Fix readdir
ANDROID: fuse-bpf: Fix lseek return value for offset 0
ANDROID: fuse-bpf: fix read_iter and write_iter
ANDROID: fuse-bpf: fix special devices
ANDROID: fuse-bpf: support FUSE_LSEEK
ANDROID: fuse-bpf: Add support for FUSE_COPY_FILE_RANGE
ANDROID: fuse-bpf: Report errors to finalize
ANDROID: fuse-bpf: Avoid reusing uint64_t for file
ANDROID: fuse-bpf: Fix CONFIG_FUSE_BPF typo in FUSE_FSYNCDIR
ANDROID: fuse-bpf: Move fd operations to be synchronous
ANDROID: fuse-bpf: Invalidate if lower is unhashed
ANDROID: fuse-bpf: Move bpf earlier in fuse_permission
ANDROID: fuse-bpf: Update attributes on file write
ANDROID: fuse: allow mounting with no userspace daemon
ANDROID: fuse-bpf: Support FUSE_STATFS
ANDROID: fuse-bpf: Fix filldir
ANDROID: fuse-bpf: fix fuse_create_open_finalize
ANDROID: fuse: add bpf support for removexattr
ANDROID: fuse-bpf: Fix truncate
ANDROID: fuse-bpf: Support inotify
ANDROID: fuse-bpf: Make compile with CONFIG_FUSE but no CONFIG_FUSE_BPF
ANDROID: fuse-bpf: Fix perms on readdir
ANDROID: fuse: Fix umasking in backing
ANDROID: fs/fuse: Backing move returns EXDEV if TO not backed
ANDROID: bpf-fuse: Fix Setattr
ANDROID: fuse-bpf: Check if mkdir dentry setup
ANDROID: fuse-bpf: Close backing fds in fuse_dentry_revalidate
ANDROID: fuse-bpf: Close backing-fd on both paths
ANDROID: fuse-bpf: Partial fix for mmap'd files
ANDROID: fuse-bpf: Restore a missing const
ANDROID: Add fuse-bpf self tests
ANDROID: Add FUSE_BPF to gki_defconfig
ANDROID: fuse-bpf v1
ANDROID: fuse: Move functions in preparation for fuse-bpf

Bug: 202785178
Test: test_fuse passes on linux.
      On cuttlefish,
      atest android.scopedstorage.cts.host.ScopedStorageHostTest
      passes with fuse-bpf enabled and disabled
Change-Id: Idb099c281f9b39ff2c46fa3ebc63e508758416ee
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2023-02-17 22:10:50 +00:00
Lee Jones
7561514944 Merge commit e7c6e405e1 ("Fix misc new gcc warnings") into android-mainline
Steps on the way to 5.13-rc1

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Iff6fb6b3991943905d20a8b40e2b2dd87c0d792b
2021-04-29 10:20:06 +01:00
Miklos Szeredi
9ac29fd3f8 fuse: move ioctl to separate source file
Next patch will expand ioctl code and fuse/file.c is large enough as it is.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-04-12 15:04:30 +02:00
Alessio Balsini
abbb00d5ae FROMLIST: fuse: Definitions and ioctl for passthrough
Expose the FUSE_PASSTHROUGH interface to user space and declare all the
basic data structures and functions as the skeleton on top of which the
FUSE passthrough functionality will be built.

As part of this, introduce the new FUSE passthrough ioctl, which allows
the FUSE daemon to specify a direct connection between a FUSE file and a
lower file system file. Such ioctl requires user space to pass the file
descriptor of one of its opened files through the fuse_passthrough_out
data structure introduced in this patch. This structure includes extra
fields for possible future extensions.
Also, add the passthrough functions for the set-up and tear-down of the
data structures and locks that will be used both when fuse_conns and
fuse_files are created/deleted.

Bug: 168023149
Link: https://lore.kernel.org/lkml/20210125153057.3623715-4-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Change-Id: I732532581348adadda5b5048a9346c2b0868d539
Signed-off-by: Alessio Balsini <balsini@google.com>
2021-01-26 13:34:41 +00:00
Alessio Balsini
2cb79cc13a Revert "FROMLIST: fuse: Definitions and ioctl() for passthrough"
This reverts commit 314603f83d.

Change-Id: I0e6a6f7d990c2e442fd59ee30229858ad201cf63
Signed-off-by: Alessio Balsini <balsini@google.com>
2021-01-26 13:34:40 +00:00
Alessio Balsini
314603f83d FROMLIST: fuse: Definitions and ioctl() for passthrough
Expose the FUSE_PASSTHROUGH interface to userspace and declare all the
basic data structures and functions as the skeleton on top of which the
FUSE passthrough functionality will be built.

As part of this, introduce the new FUSE passthrough ioctl(), which
allows
the FUSE daemon to specify a direct connection between a FUSE file and a
lower file system file. Such ioctl() requires userspace to pass the file
descriptor of one of its opened files through the fuse_passthrough_out
data
structure introduced in this patch. This structure includes extra fields
for possible future extensions.
Also, add the passthrough functions for the set-up and tear-down of the
data structures and locks that will be used both when fuse_conns and
fuse_files are created/deleted.

Bug: 168023149
Link: https://lore.kernel.org/lkml/20201026125016.1905945-2-balsini@android.com/
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Alessio Balsini <balsini@google.com>
Change-Id: I6dd150b93607e10ed53f7e7975b35b6090080fa2
2020-11-02 19:15:35 +00:00
Vivek Goyal
1dd539577c virtiofs: add a mount option to enable dax
Add a mount option to allow using dax with virtio_fs.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2020-09-10 11:39:22 +02:00
Vivek Goyal
112e72373d virtio-fs: Change module name to virtiofs.ko
We have been calling it virtio_fs and even file name is virtio_fs.c. Module
name is virtio_fs.ko but when registering file system user is supposed to
specify filesystem type as "virtiofs".

Masayoshi Mizuma reported that he specified filesytem type as "virtio_fs"
and got this warning on console.

  ------------[ cut here ]------------
  request_module fs-virtio_fs succeeded, but still no fs?
  WARNING: CPU: 1 PID: 1234 at fs/filesystems.c:274 get_fs_type+0x12c/0x138
  Modules linked in: ... virtio_fs fuse virtio_net net_failover ...
  CPU: 1 PID: 1234 Comm: mount Not tainted 5.4.0-rc1 #1

So looks like kernel could find the module virtio_fs.ko but could not find
filesystem type after that.

It probably is better to rename module name to virtiofs.ko so that above
warning goes away in case user ends up specifying wrong fs name.

Reported-by: Masayoshi Mizuma <msys.mizuma@gmail.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-10-14 10:20:33 +02:00
Stefan Hajnoczi
a62a8ef9d9 virtio-fs: add virtiofs filesystem
Add a basic file system module for virtio-fs.  This does not yet contain
shared data support between host and guest or metadata coherency speedups.
However it is already significantly faster than virtio-9p.

Design Overview
===============

With the goal of designing something with better performance and local file
system semantics, a bunch of ideas were proposed.

 - Use fuse protocol (instead of 9p) for communication between guest and
   host.  Guest kernel will be fuse client and a fuse server will run on
   host to serve the requests.

 - For data access inside guest, mmap portion of file in QEMU address space
   and guest accesses this memory using dax.  That way guest page cache is
   bypassed and there is only one copy of data (on host).  This will also
   enable mmap(MAP_SHARED) between guests.

 - For metadata coherency, there is a shared memory region which contains
   version number associated with metadata and any guest changing metadata
   updates version number and other guests refresh metadata on next access.
   This is yet to be implemented.

How virtio-fs differs from existing approaches
==============================================

The unique idea behind virtio-fs is to take advantage of the co-location of
the virtual machine and hypervisor to avoid communication (vmexits).

DAX allows file contents to be accessed without communication with the
hypervisor.  The shared memory region for metadata avoids communication in
the common case where metadata is unchanged.

By replacing expensive communication with cheaper shared memory accesses,
we expect to achieve better performance than approaches based on network
file system protocols.  In addition, this also makes it easier to achieve
local file system semantics (coherency).

These techniques are not applicable to network file system protocols since
the communications channel is bypassed by taking advantage of shared memory
on a local machine.  This is why we decided to build virtio-fs rather than
focus on 9P or NFS.

Caching Modes
=============

Like virtio-9p, different caching modes are supported which determine the
coherency level as well.  The “cache=FOO” and “writeback” options control
the level of coherence between the guest and host filesystems.

 - cache=none
   metadata, data and pathname lookup are not cached in guest.  They are
   always fetched from host and any changes are immediately pushed to host.

 - cache=always
   metadata, data and pathname lookup are cached in guest and never expire.

 - cache=auto
   metadata and pathname lookup cache expires after a configured amount of
   time (default is 1 second).  Data is cached while the file is open
   (close to open consistency).

 - writeback/no_writeback
   These options control the writeback strategy.  If writeback is disabled,
   then normal writes will immediately be synchronized with the host fs.
   If writeback is enabled, then writes may be cached in the guest until
   the file is closed or an fsync(2) performed.  This option has no effect
   on mmap-ed writes or writes going through the DAX mechanism.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-18 20:17:50 +02:00
Thomas Gleixner
ec8f24b7fa treewide: Add SPDX license identifier - Makefile/Kconfig
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:46 +02:00
Miklos Szeredi
d123d8e183 fuse: split out readdir.c
Directory reading code is about to grow larger, so split it out from dir.c
into a new source file.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-09-28 16:43:23 +02:00
Seth Forshee
60bcc88ad1 fuse: Add posix ACL support
Add a new INIT flag, FUSE_POSIX_ACL, for negotiating ACL support with
userspace.  When it is set in the INIT response, ACL support will be
enabled.  ACL support also implies "default_permissions".

When ACL support is enabled, the kernel will cache and have responsibility
for enforcing ACLs.  ACL xattrs will be passed to userspace, which is
responsible for updating the ACLs in the filesystem, keeping the file mode
in sync, and inheritance of default ACLs when new filesystem nodes are
created.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01 07:32:32 +02:00
Seth Forshee
703c73629f fuse: Use generic xattr ops
In preparation for posix acl support, rework fuse to use xattr handlers and
the generic setxattr/getxattr/listxattr callbacks.  Split the xattr code
out into it's own file, and promote symbols to module-global scope as
needed.

Functionally these changes have no impact, as fuse still uses a single
handler for all xattrs which uses the old callbacks.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-01 07:32:32 +02:00
Tejun Heo
151060ac13 CUSE: implement CUSE - Character device in Userspace
CUSE enables implementing character devices in userspace.  With recent
additions of ioctl and poll support, FUSE already has most of what's
necessary to implement character devices.  All CUSE has to do is
bonding all those components - FUSE, chardev and the driver model -
nicely.

When client opens /dev/cuse, kernel starts conversation with
CUSE_INIT.  The client tells CUSE which device it wants to create.  As
the previous patch made fuse_file usable without associated
fuse_inode, CUSE doesn't create super block or inodes.  It attaches
fuse_file to cdev file->private_data during open and set ff->fi to
NULL.  The rest of the operation is almost identical to FUSE direct IO
case.

Each CUSE device has a corresponding directory /sys/class/cuse/DEVNAME
(which is symlink to /sys/devices/virtual/class/DEVNAME if
SYSFS_DEPRECATED is turned off) which hosts "waiting" and "abort"
among other things.  Those two files have the same meaning as the FUSE
control files.

The only notable lacking feature compared to in-kernel implementation
is mmap support.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2009-06-09 11:24:11 +02:00
Miklos Szeredi
bafa96541b [PATCH] fuse: add control filesystem
Add a control filesystem to fuse, replacing the attributes currently exported
through sysfs.  An empty directory '/sys/fs/fuse/connections' is still created
in sysfs, and mounting the control filesystem here provides backward
compatibility.

Advantages of the control filesystem over the previous solution:

  - allows the object directory and the attributes to be owned by the
    filesystem owner, hence letting unpriviled users abort the
    filesystem connection

  - does not suffer from module unload race

[akpm@osdl.org: fix this fs for recent dhowells depredations]
[akpm@osdl.org: fix 64-bit printk warnings]
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi
b6aeadeda2 [PATCH] FUSE - file operations
This patch adds the file operations of FUSE.

The following operations are added:

 o open
 o flush
 o release
 o fsync
 o readpage
 o commit_write

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:45 -07:00
Miklos Szeredi
e5e5558e92 [PATCH] FUSE - read-only operations
This patch adds the read-only filesystem operations of FUSE.

This contains the following files:

 o dir.c
    - directory, symlink and file-inode operations

The following operations are added:

 o lookup
 o getattr
 o readlink
 o follow_link
 o directory open
 o readdir
 o directory release
 o permission
 o dentry revalidate
 o statfs

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:45 -07:00
Miklos Szeredi
334f485df8 [PATCH] FUSE - device functions
This adds the FUSE device handling functions.

This contains the following files:

 o dev.c
    - fuse device operations (read, write, release, poll)
    - registers misc device
    - support for sending requests to userspace

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:44 -07:00
Miklos Szeredi
d8a5ba4545 [PATCH] FUSE - core
This patch adds FUSE core.

This contains the following files:

 o inode.c
    - superblock operations (alloc_inode, destroy_inode, read_inode,
      clear_inode, put_super, show_options)
    - registers FUSE filesystem

 o fuse_i.h
    - private header file

Requirements
============

 The most important difference between orinary filesystems and FUSE is
 the fact, that the filesystem data/metadata is provided by a userspace
 process run with the privileges of the mount "owner" instead of the
 kernel, or some remote entity usually running with elevated
 privileges.

 The security implication of this is that a non-privileged user must
 not be able to use this capability to compromise the system.  Obvious
 requirements arising from this are:

  - mount owner should not be able to get elevated privileges with the
    help of the mounted filesystem

  - mount owner should not be able to induce undesired behavior in
    other users' or the super user's processes

  - mount owner should not get illegitimate access to information from
    other users' and the super user's processes

 These are currently ensured with the following constraints:

  1) mount is only allowed to directory or file which the mount owner
    can modify without limitation (write access + no sticky bit for
    directories)

  2) nosuid,nodev mount options are forced

  3) any process running with fsuid different from the owner is denied
     all access to the filesystem

 1) and 2) are ensured by the "fusermount" mount utility which is a
    setuid root application doing the actual mount operation.

 3) is ensured by a check in the permission() method in kernel

 I started thinking about doing 3) in a different way because Christoph
 H. made a big deal out of it, saying that FUSE is unacceptable into
 mainline in this form.

 The suggested use of private namespaces would be OK, but in their
 current form have many limitations that make their use impractical (as
 discussed in this thread).

 Suggested improvements that would address these limitations:

   - implement shared subtrees

   - allow a process to join an existing namespace (make namespaces
     first-class objects)

   - implement the namespace creation/joining in a PAM module

 With all that in place the check of owner against current->fsuid may
 be removed from the FUSE kernel module, without compromising the
 security requirements.

 Suid programs still interesting questions, since they get access even
 to the private namespace causing some information leak (exact
 order/timing of filesystem operations performed), giving some
 ptrace-like capabilities to unprivileged users.  BTW this problem is
 not strictly limited to the namespace approach, since suid programs
 setting fsuid and accessing users' files will succeed with the current
 approach too.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:44 -07:00