Commit Graph

29 Commits

Author SHA1 Message Date
Jens Axboe
1378a2bcb6 UPSTREAM: io_uring: ensure that io_init_req() passes in the right issue_flags
We can't use 0 here, as io_init_req() is always invoked with the
ctx uring_lock held. Newer kernels have IO_URING_F_UNLOCKED for this,
but previously we used IO_URING_F_NONBLOCK to indicate this as well.

Fixes: cf7f9cd500 ("io_uring: add missing lock in io_get_file_fixed")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 937c15e27a)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I762eacf1b49ca8a38d8b77c44db4ca2bc49b2c4c
2023-03-03 14:52:14 +00:00
Bing-Jhong Billy Jheng
c658b0eec7 UPSTREAM: io_uring: add missing lock in io_get_file_fixed
io_get_file_fixed will access io_uring's context. Lock it if it is
invoked unlocked (eg via io-wq) to avoid a race condition with fixed
files getting unregistered.

No single upstream patch exists for this issue, it was fixed as part
of the file assignment changes that went into the 5.18 cycle.

Signed-off-by: Jheng, Bing-Jhong Billy <billy@starlabs.sg>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cf7f9cd500)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I01ec4283589acde1d17318eb76f87ce099ec3fa0
2023-03-03 12:12:06 +00:00
Jens Axboe
4c1ba10aef UPSTREAM: io_uring/rw: remove leftover debug statement
commit 5c61795ea97c170347c5c4af0c159bd877b8af71 upstream.

This debug statement was never meant to go into the upstream release,
kill it off before it ends up in a release. It was just part of the
testing for the initial version of the patch.

Fixes: 2ec33a6c3cca ("io_uring/rw: ensure kiocb_end_write() is always called")
Change-Id: Idb52f716de1829dbdaecf601686fe9ac4d658bb7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4b6f8263e9)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:37 +00:00
Jens Axboe
f771eab2e7 UPSTREAM: io_uring/rw: ensure kiocb_end_write() is always called
commit 2ec33a6c3cca9fe2465e82050c81f5ffdc508b36 upstream.

A previous commit moved the notifications and end-write handling, but
it is now missing a few spots where we also want to call both of those.
Without that, we can potentially be missing file notifications, and
more importantly, have an imbalance in the super_block writers sem
accounting.

Fixes: b000145e9907 ("io_uring/rw: defer fsnotify calls to task context")
Reported-by: Dave Chinner <david@fromorbit.com>
Link: https://lore.kernel.org/all/20221010050319.GC2703033@dread.disaster.area/
Change-Id: Ib7c95554f80358f59042719120d7843019137e62
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b10acfcd61)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:37 +00:00
Pavel Begunkov
97903e9be8 UPSTREAM: io_uring: fix double poll leak on repolling
commit c0737fa9a5a5cf5a053bcc983f72d58919b997c6 upstream.

We have re-polling for partial IO, so a request can be polled twice. If
it used two poll entries the first time then on the second
io_arm_poll_handler() it will find the old apoll entry and NULL
kmalloc()'ed second entry, i.e. apoll->double_poll, so leaking it.

Fixes: 10c873334feba ("io_uring: allow re-poll if we made progress")
Change-Id: Ibab2f08e9f8d1e4ae25f0666dfe70e982a29a0a5
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fee2452494222ecc7f1f88c8fb659baef971414a.1655852245.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 124fb13cc7)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:37 +00:00
Alviro Iskandar Setiawan
cc1fa2006e UPSTREAM: io_uring: Clean up a false-positive warning from GCC 9.3.0
commit 0d7c1153d9291197c1dc473cfaade77acb874b4b upstream.

In io_recv(), if import_single_range() fails, the @flags variable is
uninitialized, then it will goto out_free.

After the goto, the compiler doesn't know that (ret < min_ret) is
always true, so it thinks the "if ((flags & MSG_WAITALL) ..."  path
could be taken.

The complaint comes from gcc-9 (Debian 9.3.0-22) 9.3.0:
```
  fs/io_uring.c:5238 io_recvfrom() error: uninitialized symbol 'flags'
```
Fix this by bypassing the @ret and @flags check when
import_single_range() fails.

Reasons:
 1. import_single_range() only returns -EFAULT when it fails.
 2. At that point, @flags is uninitialized and shouldn't be read.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: "Chen, Rong A" <rong.a.chen@intel.com>
Link: https://lore.gnuweeb.org/timl/d33bb5a9-8173-f65b-f653-51fc0681c6d6@intel.com/
Cc: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Fixes: 7297ce3d59449de49d3c9e1f64ae25488750a1fc ("io_uring: improve send/recv error handling")
Change-Id: I8aaae49547d7194f21d45f0d3cbcd18650ba09f0
Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar@gmail.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Link: https://lore.kernel.org/r/20220207140533.565411-1-ammarfaizi2@gnuweeb.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e944f1e37b)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:37 +00:00
Stefan Metzmacher
5b9835e5ba UPSTREAM: io_uring/net: fix fast_iov assignment in io_setup_async_msg()
commit 3e4cb6ebbb2bad201c1186bc0b7e8cf41dd7f7e6 upstream.

I hit a very bad problem during my tests of SENDMSG_ZC.
BUG(); in first_iovec_segment() triggered very easily.
The problem was io_setup_async_msg() in the partial retry case,
which seems to happen more often with _ZC.

iov_iter_iovec_advance() may change i->iov in order to have i->iov_offset
being only relative to the first element.

Which means kmsg->msg.msg_iter.iov is no longer the
same as kmsg->fast_iov.

But this would rewind the copy to be the start of
async_msg->fast_iov, which means the internal
state of sync_msg->msg.msg_iter is inconsitent.

I tested with 5 vectors with length like this 4, 0, 64, 20, 8388608
and got a short writes with:
- ret=2675244 min_ret=8388692 => remaining 5713448 sr->done_io=2675244
- ret=-EAGAIN => io_uring_poll_arm
- ret=4911225 min_ret=5713448 => remaining 802223  sr->done_io=7586469
- ret=-EAGAIN => io_uring_poll_arm
- ret=802223  min_ret=802223  => res=8388692

While this was easily triggered with SENDMSG_ZC (queued for 6.1),
it was a potential problem starting with 7ba89d2af17aa879dda30f5d5d3f152e587fc551
in 5.18 for IORING_OP_RECVMSG.
And also with 4c3c09439c08b03d9503df0ca4c7619c5842892e in 5.19
for IORING_OP_SENDMSG.

However 257e84a537 introduced the critical
code into io_setup_async_msg() in 5.11.

Fixes: 7ba89d2af17aa ("io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly")
Fixes: 257e84a537 ("io_uring: refactor sendmsg/recvmsg iov managing")
Cc: stable@vger.kernel.org
Change-Id: Id5b0932aa591acadb00c119222e62895031020e0
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b2e7be246e2fb173520862b0c7098e55767567a2.1664436949.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2e4c95a404)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:37 +00:00
Jens Axboe
ec4363abfa UPSTREAM: io_uring: io_kiocb_update_pos() should not touch file for non -1 offset
commit 6f83ab22adcb77a5824d2c274dace0d99e21319f upstream.

-1 tells use to use the current position, but we check if the file is
a stream regardless of that. Fix up io_kiocb_update_pos() to only
dip into file if we need to. This is both more efficient and also drops
12 bytes of text on aarch64 and 64 bytes on x86-64.

Fixes: b4aec4001595 ("io_uring: do not recalculate ppos unnecessarily")
Change-Id: Ieb0201abe63273b68dc44aba0afa4ad8013bad6c
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 311b298a33)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:36 +00:00
Jens Axboe
18ffd755c4 UPSTREAM: io_uring/rw: defer fsnotify calls to task context
commit b000145e9907809406d8164c3b2b8861d95aecd1 upstream.

We can't call these off the kiocb completion as that might be off
soft/hard irq context. Defer the calls to when we process the
task_work for this request. That avoids valid complaints like:

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.0.0-rc6-syzkaller-00321-g105a36f3694e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_usage_bug kernel/locking/lockdep.c:3961 [inline]
 valid_state kernel/locking/lockdep.c:3973 [inline]
 mark_lock_irq kernel/locking/lockdep.c:4176 [inline]
 mark_lock.part.0.cold+0x18/0xd8 kernel/locking/lockdep.c:4632
 mark_lock kernel/locking/lockdep.c:4596 [inline]
 mark_usage kernel/locking/lockdep.c:4527 [inline]
 __lock_acquire+0x11d9/0x56d0 kernel/locking/lockdep.c:5007
 lock_acquire kernel/locking/lockdep.c:5666 [inline]
 lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5631
 __fs_reclaim_acquire mm/page_alloc.c:4674 [inline]
 fs_reclaim_acquire+0x115/0x160 mm/page_alloc.c:4688
 might_alloc include/linux/sched/mm.h:271 [inline]
 slab_pre_alloc_hook mm/slab.h:700 [inline]
 slab_alloc mm/slab.c:3278 [inline]
 __kmem_cache_alloc_lru mm/slab.c:3471 [inline]
 kmem_cache_alloc+0x39/0x520 mm/slab.c:3491
 fanotify_alloc_fid_event fs/notify/fanotify/fanotify.c:580 [inline]
 fanotify_alloc_event fs/notify/fanotify/fanotify.c:813 [inline]
 fanotify_handle_event+0x1130/0x3f40 fs/notify/fanotify/fanotify.c:948
 send_to_group fs/notify/fsnotify.c:360 [inline]
 fsnotify+0xafb/0x1680 fs/notify/fsnotify.c:570
 __fsnotify_parent+0x62f/0xa60 fs/notify/fsnotify.c:230
 fsnotify_parent include/linux/fsnotify.h:77 [inline]
 fsnotify_file include/linux/fsnotify.h:99 [inline]
 fsnotify_access include/linux/fsnotify.h:309 [inline]
 __io_complete_rw_common+0x485/0x720 io_uring/rw.c:195
 io_complete_rw+0x1a/0x1f0 io_uring/rw.c:228
 iomap_dio_complete_work fs/iomap/direct-io.c:144 [inline]
 iomap_dio_bio_end_io+0x438/0x5e0 fs/iomap/direct-io.c:178
 bio_endio+0x5f9/0x780 block/bio.c:1564
 req_bio_endio block/blk-mq.c:695 [inline]
 blk_update_request+0x3fc/0x1300 block/blk-mq.c:825
 scsi_end_request+0x7a/0x9a0 drivers/scsi/scsi_lib.c:541
 scsi_io_completion+0x173/0x1f70 drivers/scsi/scsi_lib.c:971
 scsi_complete+0x122/0x3b0 drivers/scsi/scsi_lib.c:1438
 blk_complete_reqs+0xad/0xe0 block/blk-mq.c:1022
 __do_softirq+0x1d3/0x9c6 kernel/softirq.c:571
 invoke_softirq kernel/softirq.c:445 [inline]
 __irq_exit_rcu+0x123/0x180 kernel/softirq.c:650
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:662
 common_interrupt+0xa9/0xc0 arch/x86/kernel/irq.c:240

Fixes: f63cf5192fe3 ("io_uring: ensure that fsnotify is always called")
Link: https://lore.kernel.org/all/20220929135627.ykivmdks2w5vzrwg@quack3/
Reported-by: syzbot+dfcc5f4da15868df7d4d@syzkaller.appspotmail.com
Reported-by: Jan Kara <jack@suse.cz>
Change-Id: Ie7ba8c0baa08d2d14d1480b385ccaf7a66a7e6eb
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Bug: 268174392
(cherry picked from commit 89a410dbd0)
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:36 +00:00
Dylan Yudaken
249755d501 UPSTREAM: io_uring: do not recalculate ppos unnecessarily
commit b4aec40015953b65f2f114641e7fd7714c8df8e6 upstream.

There is a slight optimisation to be had by calculating the correct pos
pointer inside io_kiocb_update_pos and then using that later.

It seems code size drops by a bit:
000000000000a1b0 0000000000000400 t io_read
000000000000a5b0 0000000000000319 t io_write

vs
000000000000a1b0 00000000000003f6 t io_read
000000000000a5b0 0000000000000310 t io_write

Change-Id: If1f3fa4a7d26be890e91f172775e095afc82a7c8
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 05d69b372b)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:36 +00:00
Dylan Yudaken
11a4d38a17 UPSTREAM: io_uring: update kiocb->ki_pos at execution time
commit d34e1e5b396a0dbaa4a29b7138df662cfb9d8e8e upstream.

Update kiocb->ki_pos at execution time rather than in io_prep_rw().
io_prep_rw() happens before the job is enqueued to a worker and so the
offset might be read multiple times before being executed once.

Ensures that the file position in a set of _linked_ SQEs will be only
obtained after earlier SQEs have completed, and so will include their
incremented file position.

Change-Id: I3a7cdb36d4cd8ca659ae5db146e55a3e614811d9
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ff8a070253)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:36 +00:00
Dylan Yudaken
b579765524 UPSTREAM: io_uring: remove duplicated calls to io_kiocb_ppos
commit af9c45ecebaf1b428306f41421f4bcffe439f735 upstream.

io_kiocb_ppos is called in both branches, and it seems that the compiler
does not fuse this. Fusing removes a few bytes from loop_rw_iter.

Before:
$ nm -S fs/io_uring.o | grep loop_rw_iter
0000000000002430 0000000000000124 t loop_rw_iter

After:
$ nm -S fs/io_uring.o | grep loop_rw_iter
0000000000002430 000000000000010d t loop_rw_iter

Change-Id: If105cc8a6388490c7860d820a7d5de4d28d9d7c0
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b7958caf41)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:36 +00:00
Jens Axboe
7854c42ce9 UPSTREAM: io_uring: ensure that cached task references are always put on exit
commit e775f93f2ab976a2cdb4a7b53063cbe890904f73 upstream.

io_uring caches task references to avoid doing atomics for each of them
per request. If a request is put from the same task that allocated it,
then we can maintain a per-ctx cache of them. This obviously relies
on io_uring always pruning caches in a reliable way, and there's
currently a case off io_uring fd release where we can miss that.

One example is a ring setup with IOPOLL, which relies on the task
polling for completions, which will free them. However, if such a task
submits a request and then exits or closes the ring without reaping
the completion, then ring release will reap and put. If release happens
from that very same task, the completed request task refs will get
put back into the cache pool. This is problematic, as we're now beyond
the point of pruning caches.

Manually drop these caches after doing an IOPOLL reap. This releases
references from the current task, which is enough. If another task
happens to be doing the release, then the caching will not be
triggered and there's no issue.

Cc: stable@vger.kernel.org
Fixes: e98e49b2bb ("io_uring: extend task put optimisations")
Reported-by: Homin Rhee <hominlab@gmail.com>
Change-Id: I9a91a1236799c3da96fc9d2c2d90c493e96b46e7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 86e2d6901a)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:36 +00:00
Dylan Yudaken
489777e5cb UPSTREAM: io_uring: fix async accept on O_NONBLOCK sockets
commit a73825ba70c93e1eb39a845bb3d9885a787f8ffe upstream.

Do not set REQ_F_NOWAIT if the socket is non blocking. When enabled this
causes the accept to immediately post a CQE with EAGAIN, which means you
cannot perform an accept SQE on a NONBLOCK socket asynchronously.

By removing the flag if there is no pending accept then poll is armed as
usual and when a connection comes in the CQE is posted.

Change-Id: I04b986502679d268665fff2048ff40e2e83a11e4
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220324143435.2875844-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 30b9068934)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:36 +00:00
Jens Axboe
eb6367e157 UPSTREAM: io_uring: allow re-poll if we made progress
commit 10c873334febaeea9aa0c25c10b5ac0951b77a5f upstream.

We currently check REQ_F_POLLED before arming async poll for a
notification to retry. If it's set, then we don't allow poll and will
punt to io-wq instead. This is done to prevent a situation where a buggy
driver will repeatedly return that there's space/data available yet we
get -EAGAIN.

However, if we already transferred data, then it should be safe to rely
on poll again. Gate the check on whether or not REQ_F_PARTIAL_IO is
also set.

Change-Id: I5e04e122cdfaa3a754af1fb896568af9ccb5e78b
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a79b13f249)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:36 +00:00
Jens Axboe
27d60b9121 UPSTREAM: io_uring: support MSG_WAITALL for IORING_OP_SEND(MSG)
commit 4c3c09439c08b03d9503df0ca4c7619c5842892e upstream.

Like commit 7ba89d2af17a for recv/recvmsg, support MSG_WAITALL for the
send side. If this flag is set and we do a short send, retry for a
stream of seqpacket socket.

Change-Id: I86b7a362c9d5e91c9a663635feade329c9d9e4fe
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 3c1a3d0269)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:36 +00:00
Jens Axboe
9e1c3c0759 UPSTREAM: io_uring: add flag for disabling provided buffer recycling
commit 8a3e8ee56417f5e0e66580d93941ed9d6f4c8274 upstream.

If we need to continue doing this IO, then we don't want a potentially
selected buffer recycled. Add a flag for that.

Set this for recv/recvmsg if they do partial IO.

Change-Id: I3581f97df99a882e091708302076128666c4239b
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 390b881631)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:35 +00:00
Jens Axboe
9317a8f489 UPSTREAM: io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly
commit 7ba89d2af17aa879dda30f5d5d3f152e587fc551 upstream.

We currently don't attempt to get the full asked for length even if
MSG_WAITALL is set, if we get a partial receive. If we do see a partial
receive, then just note how many bytes we did and return -EAGAIN to
get it retried.

The iov is advanced appropriately for the vector based case, and we
manually bump the buffer and remainder for the non-vector case.

Cc: stable@vger.kernel.org
Reported-by: Constantine Gavrilov <constantine.gavrilov@gmail.com>
Change-Id: Ic855be831471c36f579c6e9025698a4b81f4e4b2
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9b7b0f2116)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:35 +00:00
Pavel Begunkov
0b4b61c4c9 UPSTREAM: io_uring: improve send/recv error handling
commit 7297ce3d59449de49d3c9e1f64ae25488750a1fc upstream.

Hide all error handling under common if block, removes two extra ifs on
the success path and keeps the handling more condensed.

Change-Id: Ida4831a9f3d59f2f888950fb963d1aaa821cd6ae
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5761545158a12968f3caf30f747eea65ed75dfc1.1637524285.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit cdc68e714d)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:35 +00:00
Jens Axboe
024abf21ea UPSTREAM: io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups
[ Upstream commit 4464853277d0ccdb9914608dd1332f0fa2f9846f ]

Pass in EPOLL_URING_WAKE when signaling eventfd or doing poll related
wakups, so that we can check for a circular event dependency between
eventfd and epoll. If this flag is set when our wakeup handlers are
called, then we know we have a dependency that needs to terminate
multishot requests.

eventfd and epoll are the only such possible dependencies.

Cc: stable@vger.kernel.org # 6.0
Change-Id: Id3ec862ba6410d07553a064d8f93874ac8f3aebe
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ccf06b5a98)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:35 +00:00
Jens Axboe
c5ddd17efa UPSTREAM: io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL
commit 46a525e199e4037516f7e498c18f065b09df32ac upstream.

This isn't a reliable mechanism to tell if we have task_work pending, we
really should be looking at whether we have any items queued. This is
problematic if forward progress is gated on running said task_work. One
such example is reading from a pipe, where the write side has been closed
right before the read is started. The fput() of the file queues TWA_RESUME
task_work, and we need that task_work to be run before ->release() is
called for the pipe. If ->release() isn't called, then the read will sit
forever waiting on data that will never arise.

Fix this by io_run_task_work() so it checks if we have task_work pending
rather than rely on TIF_NOTIFY_SIGNAL for that. The latter obviously
doesn't work for task_work that is queued without TWA_SIGNAL.

Reported-by: Christiano Haesbaert <haesbaert@haesbaert.org>
Cc: stable@vger.kernel.org
Link: https://github.com/axboe/liburing/issues/665
Change-Id: Iafd72e8b4854e1b86adcd8b611d167f52fcbb970
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a9aa4aa7a5)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:35 +00:00
Jens Axboe
721ffd7377 UPSTREAM: io_uring/io-wq: only free worker if it was allocated for creation
commit e6db6f9398dadcbc06318a133d4c44a2d3844e61 upstream.

We have two types of task_work based creation, one is using an existing
worker to setup a new one (eg when going to sleep and we have no free
workers), and the other is allocating a new worker. Only the latter
should be freed when we cancel task_work creation for a new worker.

Fixes: af82425c6a2d ("io_uring/io-wq: free worker if task_work creation is canceled")
Reported-by: syzbot+d56ec896af3637bdb7e4@syzkaller.appspotmail.com
Change-Id: I7ce8f76fa93e2aa63c2463d2057d1c20693c67ed
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ba86db02d4)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:35 +00:00
Jens Axboe
14c0ef6643 UPSTREAM: io_uring/io-wq: free worker if task_work creation is canceled
commit af82425c6a2d2f347c79b63ce74fca6dc6be157f upstream.

If we cancel the task_work, the worker will never come into existance.
As this is the last reference to it, ensure that we get it freed
appropriately.

Cc: stable@vger.kernel.org
Reported-by: 진호 <wnwlsgh98@gmail.com>
Change-Id: I7da78f771d0845c98044a1181ea948c2784f1008
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bb135bcc94)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:35 +00:00
Pavel Begunkov
660437093d UPSTREAM: io_uring: lock overflowing for IOPOLL
commit 544d163d659d45a206d8929370d5a2984e546cb7 upstream.

syzbot reports an issue with overflow filling for IOPOLL:

WARNING: CPU: 0 PID: 28 at io_uring/io_uring.c:734 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734
CPU: 0 PID: 28 Comm: kworker/u4:1 Not tainted 6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0
Workqueue: events_unbound io_ring_exit_work
Call trace:
 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734
 io_req_cqe_overflow+0x5c/0x70 io_uring/io_uring.c:773
 io_fill_cqe_req io_uring/io_uring.h:168 [inline]
 io_do_iopoll+0x474/0x62c io_uring/rw.c:1065
 io_iopoll_try_reap_events+0x6c/0x108 io_uring/io_uring.c:1513
 io_uring_try_cancel_requests+0x13c/0x258 io_uring/io_uring.c:3056
 io_ring_exit_work+0xec/0x390 io_uring/io_uring.c:2869
 process_one_work+0x2d8/0x504 kernel/workqueue.c:2289
 worker_thread+0x340/0x610 kernel/workqueue.c:2436
 kthread+0x12c/0x158 kernel/kthread.c:376
 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:863

There is no real problem for normal IOPOLL as flush is also called with
uring_lock taken, but it's getting more complicated for IOPOLL|SQPOLL,
for which __io_cqring_overflow_flush() happens from the CQ waiting path.

Reported-and-tested-by: syzbot+6805087452d72929404e@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org # 5.10+
Change-Id: Ice08c393705787da15361d4ba0c4524fa01d8f29
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ed4629d1e9)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:34 +00:00
Harshit Mogalapalli
9639fa91a6 UPSTREAM: io_uring: Fix unsigned 'res' comparison with zero in io_fixup_rw_res()
Smatch warning: io_fixup_rw_res() warn:
	unsigned 'res' is never less than zero.

Change type of 'res' from unsigned to long.

Fixes: d6b7efc722 ("io_uring/rw: fix error'ed retry return values")
Change-Id: I128b9526bf7d4ab88398a41c26a4bf2c066dc7d2
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e326ee018a)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:34 +00:00
Pavel Begunkov
42af757376 UPSTREAM: io_uring: fix CQ waiting timeout handling
commit 12521a5d5cb7ff0ad43eadfc9c135d86e1131fa8 upstream.

Jiffy to ktime CQ waiting conversion broke how we treat timeouts, in
particular we rearm it anew every time we get into
io_cqring_wait_schedule() without adjusting the timeout. Waiting for 2
CQEs and getting a task_work in the middle may double the timeout value,
or even worse in some cases task may wait indefinitely.

Cc: stable@vger.kernel.org
Fixes: 228339662b398 ("io_uring: don't convert to jiffies for waiting on timeouts")
Change-Id: I1a2f159751c9cb5fcb8ae28f0349bcd0225fdcee
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f7bffddd71b08f28a877d44d37ac953ddb01590d.1672915663.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c3222fd282)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:34 +00:00
Jens Axboe
4a7015335c UPSTREAM: io_uring: check for valid register opcode earlier
[ Upstream commit 343190841a1f22b96996d9f8cfab902a4d1bfd0e ]

We only check the register opcode value inside the restricted ring
section, move it into the main io_uring_register() function instead
and check it up front.

Change-Id: I187b2979aab62484e2a169c882fb3c1ed853ba3e
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f9309dcaa9)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:34 +00:00
Harshit Mogalapalli
e7098e7f6a UPSTREAM: io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()
[ Upstream commit 998b30c3948e4d0b1097e639918c5cff332acac5 ]

Syzkaller reports a NULL deref bug as follows:

 BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3
 Read of size 4 at addr 0000000000000138 by task file1/1955

 CPU: 1 PID: 1955 Comm: file1 Not tainted 6.1.0-rc7-00103-gef4d3ea40565 #75
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0xcd/0x134
  ? io_tctx_exit_cb+0x53/0xd3
  kasan_report+0xbb/0x1f0
  ? io_tctx_exit_cb+0x53/0xd3
  kasan_check_range+0x140/0x190
  io_tctx_exit_cb+0x53/0xd3
  task_work_run+0x164/0x250
  ? task_work_cancel+0x30/0x30
  get_signal+0x1c3/0x2440
  ? lock_downgrade+0x6e0/0x6e0
  ? lock_downgrade+0x6e0/0x6e0
  ? exit_signals+0x8b0/0x8b0
  ? do_raw_read_unlock+0x3b/0x70
  ? do_raw_spin_unlock+0x50/0x230
  arch_do_signal_or_restart+0x82/0x2470
  ? kmem_cache_free+0x260/0x4b0
  ? putname+0xfe/0x140
  ? get_sigframe_size+0x10/0x10
  ? do_execveat_common.isra.0+0x226/0x710
  ? lockdep_hardirqs_on+0x79/0x100
  ? putname+0xfe/0x140
  ? do_execveat_common.isra.0+0x238/0x710
  exit_to_user_mode_prepare+0x15f/0x250
  syscall_exit_to_user_mode+0x19/0x50
  do_syscall_64+0x42/0xb0
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
 RIP: 0023:0x0
 Code: Unable to access opcode bytes at 0xffffffffffffffd6.
 RSP: 002b:00000000fffb7790 EFLAGS: 00000200 ORIG_RAX: 000000000000000b
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  </TASK>
 Kernel panic - not syncing: panic_on_warn set ...

This happens because the adding of task_work from io_ring_exit_work()
isn't synchronized with canceling all work items from eg exec. The
execution of the two are ordered in that they are both run by the task
itself, but if io_tctx_exit_cb() is queued while we're canceling all
work items off exec AND gets executed when the task exits to userspace
rather than in the main loop in io_uring_cancel_generic(), then we can
find current->io_uring == NULL and hit the above crash.

It's safe to add this NULL check here, because the execution of the two
paths are done by the task itself.

Cc: stable@vger.kernel.org
Fixes: d56d938b4b ("io_uring: do ctx initiated file note removal")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Change-Id: I043870b8749175dc0680aa2f70ec2b16966a015b
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20221206093833.3812138-1-harshit.m.mogalapalli@oracle.com
[axboe: add code comment and also put an explanation in the commit msg]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f895511de9)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:34 +00:00
Jens Axboe
9e02e95377 UPSTREAM: io_uring: move to separate directory
[ Upstream commit ed29b0b4fd835b058ddd151c49d021e28d631ee6 ]

In preparation for splitting io_uring up a bit, move it into its own
top level directory. It didn't really belong in fs/ anyway, as it's
not a file system only API.

This adds io_uring/ and moves the core files in there, and updates the
MAINTAINERS file for the new location.

Change-Id: I2df14f2467ce11b9a1613c73cad073c81bdd5f3f
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 998b30c3948e ("io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f435c66d23)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-03-01 12:14:34 +00:00