Commit Graph

15 Commits

Author SHA1 Message Date
Kirill Smelkov
c9696a8f3e fuse: Add FOPEN_STREAM to use stream_open()
commit bbd84f33652f852ce5992d65db4d020aba21f882 upstream.

Starting from commit 9c225f2655 ("vfs: atomic f_pos accesses as per
POSIX") files opened even via nonseekable_open gate read and write via lock
and do not allow them to be run simultaneously. This can create read vs
write deadlock if a filesystem is trying to implement a socket-like file
which is intended to be simultaneously used for both read and write from
filesystem client.  See commit 10dce8af3422 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock
on writes to /proc/xen/xenbus") for a similar deadlock example on
/proc/xen/xenbus.

To avoid such deadlock it was tempting to adjust fuse_finish_open to use
stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags,
but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and write
handlers

	https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

so if we would do such a change it will break a real user.

Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the
opened handler is having stream-like semantics; does not use file position
and thus the kernel is free to issue simultaneous read and write request on
opened file handle.

This patch together with stream_open() should be added to stable kernels
starting from v3.14+. This will allow to patch OSSPD and other FUSE
filesystems that provide stream-like files to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all
kernel versions. This should work because fuse_finish_open ignores unknown
open flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel that
is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE
is sufficient to implement streams without read vs write deadlock.

Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:24:13 +02:00
Miklos Szeredi
00c570f4ba fuse: device fd clone
Allow an open fuse device to be "cloned".  Userspace can create a clone by:

      newfd = open("/dev/fuse", O_RDWR)
      ioctl(newfd, FUSE_DEV_IOC_CLONE, &oldfd);

At this point newfd will refer to the same fuse connection as oldfd.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: Ashish Samant <ashish.samant@oracle.com>
2015-07-01 16:26:08 +02:00
Andrew Gallagher
d7afaec0b5 fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT
Here some additional changes to set a capability flag so that clients can
detect when it's appropriate to return -ENOSYS from open.

This amends the following commit introduced in 3.14:

  7678ac5061  fuse: support clients that don't implement 'open'

However we can only add the flag to 3.15 and later since there was no
protocol version update in 3.14.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <stable@vger.kernel.org> # v3.15+
2014-07-22 16:37:43 +02:00
Miklos Szeredi
1560c974dc fuse: add renameat2 support
Support RENAME_EXCHANGE and RENAME_NOREPLACE flags on the userspace ABI.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28 16:43:44 +02:00
Maxim Patlasov
ab9e13f7c7 fuse: allow ctime flushing to userspace
The patch extends fuse_setattr_in, and extends the flush procedure
(fuse_flush_times()) called on ->write_inode() to send the ctime as well as
mtime.

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28 14:19:24 +02:00
Miklos Szeredi
e27c9d3877 fuse: fuse: add time_gran to INIT_OUT
Allow userspace fs to specify time granularity.

This is needed because with writeback_cache mode the kernel is responsible
for generating mtime and ctime, but if the underlying filesystem doesn't
support nanosecond granularity then the cache will contain a different
value from the one stored on the filesystem resulting in a change of times
after a cache flush.

Make the default granularity 1s.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-28 14:19:23 +02:00
Pavel Emelyanov
4d99ff8f12 fuse: Turn writeback cache on
Introduce a bit kernel and userspace exchange between each-other on
the init stage and turn writeback on if the userspace want this and
mount option 'allow_wbcache' is present (controlled by fusermount).

Also add each writable file into per-inode write list and call the
generic_file_aio_write to make use of the Linux page cache engine.

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-04-02 15:38:50 +02:00
Miklos Szeredi
60b9df7a54 fuse: add flag to turn on async direct IO
Without async DIO write requests to a single file were always serialized.
With async DIO that's no longer the case.

So don't turn on async DIO by default for fear of breaking backward
compatibility.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-05-01 14:37:21 +02:00
Miklos Szeredi
4c82456eeb fuse: fix type definitions in uapi header
Commit 7e98d53086 (Synchronize fuse header with
one used in library) added #ifdef __linux__ around defines if it is not set.
The kernel build is self-contained and can be built on non-Linux toolchains.
After the mentioned commit builds on non-Linux toolchains will try to include
stdint.h and fail due to -nostdinc, and then fail with a bunch of undefined type
errors.

Fix by checking for __KERNEL__ instead of __linux__ and using the standard int
types instead of the linux specific ones.

Reported-by: Arve Hjønnevåg <arve@android.com>
Reported-by: Colin Cross <ccross@android.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-04-17 12:30:40 +02:00
Eric Wong
634734b63a fuse: allow control of adaptive readdirplus use
For some filesystems (e.g. GlusterFS), the cost of performing a
normal readdir and readdirplus are identical.  Since adaptively
using readdirplus has no benefit for those systems, give
users/filesystems the option to control adaptive readdirplus use.

v2 of this patch incorporates Miklos's suggestion to simplify the code,
as well as improving consistency of macro names and documentation.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-02-07 14:25:44 +01:00
Miklos Szeredi
7e98d53086 Synchronize fuse header with one used in library
The library one has provisions for use in *BSD, add them to the kernel one too.
They don't hurt and ease maintenance.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-02-07 11:58:12 +01:00
Enke Chen
0415d29102 fuse: send poll events
commit 626cf23660 "poll: add poll_requested_events()..." enabled us to send the
requested events to the filesystem.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-02-04 16:14:32 +01:00
Miklos Szeredi
23c153e541 fuse: bump version for READDIRPLUS
Yeah, we have a capability flag for this as well, so this is not strictly
necessary, but it doesn't hurt either.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-31 17:08:11 +01:00
Anand V. Avati
0b05b18381 fuse: implement NFS-like readdirplus support
This patch implements readdirplus support in FUSE, similar to NFS.
The payload returned in the readdirplus call contains
'fuse_entry_out' structure thereby providing all the necessary inputs
for 'faking' a lookup() operation on the spot.

If the dentry and inode already existed (for e.g. in a re-run of ls -l)
then just the inode attributes timeout and dentry timeout are refreshed.

With a simple client->network->server implementation of a FUSE based
filesystem, the following performance observations were made:

Test: Performing a filesystem crawl over 20,000 files with

sh# time ls -lR /mnt

Without readdirplus:
Run 1: 18.1s
Run 2: 16.0s
Run 3: 16.2s

With readdirplus:
Run 1: 4.1s
Run 2: 3.8s
Run 3: 3.8s

The performance improvement is significant as it avoided 20,000 upcalls
calls (lookup). Cache consistency is no worse than what already is.

Signed-off-by: Anand V. Avati <avati@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2013-01-24 16:21:25 +01:00
David Howells
607ca46e97 UAPI: (Scripted) Disintegrate include/linux
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
2012-10-13 10:46:48 +01:00