msm-5.15

Author	SHA1	Message	Date
Jens Axboe	1378a2bcb6	UPSTREAM: io_uring: ensure that io_init_req() passes in the right issue_flags We can't use 0 here, as io_init_req() is always invoked with the ctx uring_lock held. Newer kernels have IO_URING_F_UNLOCKED for this, but previously we used IO_URING_F_NONBLOCK to indicate this as well. Fixes: `cf7f9cd500` ("io_uring: add missing lock in io_get_file_fixed") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `937c15e27a`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I762eacf1b49ca8a38d8b77c44db4ca2bc49b2c4c	2023-03-03 14:52:14 +00:00
Bing-Jhong Billy Jheng	c658b0eec7	UPSTREAM: io_uring: add missing lock in io_get_file_fixed io_get_file_fixed will access io_uring's context. Lock it if it is invoked unlocked (eg via io-wq) to avoid a race condition with fixed files getting unregistered. No single upstream patch exists for this issue, it was fixed as part of the file assignment changes that went into the 5.18 cycle. Signed-off-by: Jheng, Bing-Jhong Billy <billy@starlabs.sg> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `cf7f9cd500`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I01ec4283589acde1d17318eb76f87ce099ec3fa0	2023-03-03 12:12:06 +00:00
Jens Axboe	4c1ba10aef	UPSTREAM: io_uring/rw: remove leftover debug statement commit 5c61795ea97c170347c5c4af0c159bd877b8af71 upstream. This debug statement was never meant to go into the upstream release, kill it off before it ends up in a release. It was just part of the testing for the initial version of the patch. Fixes: 2ec33a6c3cca ("io_uring/rw: ensure kiocb_end_write() is always called") Change-Id: Idb52f716de1829dbdaecf601686fe9ac4d658bb7 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `4b6f8263e9`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:37 +00:00
Jens Axboe	f771eab2e7	UPSTREAM: io_uring/rw: ensure kiocb_end_write() is always called commit 2ec33a6c3cca9fe2465e82050c81f5ffdc508b36 upstream. A previous commit moved the notifications and end-write handling, but it is now missing a few spots where we also want to call both of those. Without that, we can potentially be missing file notifications, and more importantly, have an imbalance in the super_block writers sem accounting. Fixes: b000145e9907 ("io_uring/rw: defer fsnotify calls to task context") Reported-by: Dave Chinner <david@fromorbit.com> Link: https://lore.kernel.org/all/20221010050319.GC2703033@dread.disaster.area/ Change-Id: Ib7c95554f80358f59042719120d7843019137e62 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `b10acfcd61`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:37 +00:00
Pavel Begunkov	97903e9be8	UPSTREAM: io_uring: fix double poll leak on repolling commit c0737fa9a5a5cf5a053bcc983f72d58919b997c6 upstream. We have re-polling for partial IO, so a request can be polled twice. If it used two poll entries the first time then on the second io_arm_poll_handler() it will find the old apoll entry and NULL kmalloc()'ed second entry, i.e. apoll->double_poll, so leaking it. Fixes: 10c873334feba ("io_uring: allow re-poll if we made progress") Change-Id: Ibab2f08e9f8d1e4ae25f0666dfe70e982a29a0a5 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fee2452494222ecc7f1f88c8fb659baef971414a.1655852245.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `124fb13cc7`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:37 +00:00
Alviro Iskandar Setiawan	cc1fa2006e	UPSTREAM: io_uring: Clean up a false-positive warning from GCC 9.3.0 commit 0d7c1153d9291197c1dc473cfaade77acb874b4b upstream. In io_recv(), if import_single_range() fails, the @flags variable is uninitialized, then it will goto out_free. After the goto, the compiler doesn't know that (ret < min_ret) is always true, so it thinks the "if ((flags & MSG_WAITALL) ..." path could be taken. The complaint comes from gcc-9 (Debian 9.3.0-22) 9.3.0: ``` fs/io_uring.c:5238 io_recvfrom() error: uninitialized symbol 'flags' ``` Fix this by bypassing the @ret and @flags check when import_single_range() fails. Reasons: 1. import_single_range() only returns -EFAULT when it fails. 2. At that point, @flags is uninitialized and shouldn't be read. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: "Chen, Rong A" <rong.a.chen@intel.com> Link: https://lore.gnuweeb.org/timl/d33bb5a9-8173-f65b-f653-51fc0681c6d6@intel.com/ Cc: Pavel Begunkov <asml.silence@gmail.com> Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Fixes: 7297ce3d59449de49d3c9e1f64ae25488750a1fc ("io_uring: improve send/recv error handling") Change-Id: I8aaae49547d7194f21d45f0d3cbcd18650ba09f0 Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar@gmail.com> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Link: https://lore.kernel.org/r/20220207140533.565411-1-ammarfaizi2@gnuweeb.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `e944f1e37b`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:37 +00:00
Stefan Metzmacher	5b9835e5ba	UPSTREAM: io_uring/net: fix fast_iov assignment in io_setup_async_msg() commit 3e4cb6ebbb2bad201c1186bc0b7e8cf41dd7f7e6 upstream. I hit a very bad problem during my tests of SENDMSG_ZC. BUG(); in first_iovec_segment() triggered very easily. The problem was io_setup_async_msg() in the partial retry case, which seems to happen more often with _ZC. iov_iter_iovec_advance() may change i->iov in order to have i->iov_offset being only relative to the first element. Which means kmsg->msg.msg_iter.iov is no longer the same as kmsg->fast_iov. But this would rewind the copy to be the start of async_msg->fast_iov, which means the internal state of sync_msg->msg.msg_iter is inconsitent. I tested with 5 vectors with length like this 4, 0, 64, 20, 8388608 and got a short writes with: - ret=2675244 min_ret=8388692 => remaining 5713448 sr->done_io=2675244 - ret=-EAGAIN => io_uring_poll_arm - ret=4911225 min_ret=5713448 => remaining 802223 sr->done_io=7586469 - ret=-EAGAIN => io_uring_poll_arm - ret=802223 min_ret=802223 => res=8388692 While this was easily triggered with SENDMSG_ZC (queued for 6.1), it was a potential problem starting with 7ba89d2af17aa879dda30f5d5d3f152e587fc551 in 5.18 for IORING_OP_RECVMSG. And also with 4c3c09439c08b03d9503df0ca4c7619c5842892e in 5.19 for IORING_OP_SENDMSG. However `257e84a537` introduced the critical code into io_setup_async_msg() in 5.11. Fixes: 7ba89d2af17aa ("io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly") Fixes: `257e84a537` ("io_uring: refactor sendmsg/recvmsg iov managing") Cc: stable@vger.kernel.org Change-Id: Id5b0932aa591acadb00c119222e62895031020e0 Signed-off-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b2e7be246e2fb173520862b0c7098e55767567a2.1664436949.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `2e4c95a404`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:37 +00:00
Jens Axboe	ec4363abfa	UPSTREAM: io_uring: io_kiocb_update_pos() should not touch file for non -1 offset commit 6f83ab22adcb77a5824d2c274dace0d99e21319f upstream. -1 tells use to use the current position, but we check if the file is a stream regardless of that. Fix up io_kiocb_update_pos() to only dip into file if we need to. This is both more efficient and also drops 12 bytes of text on aarch64 and 64 bytes on x86-64. Fixes: b4aec4001595 ("io_uring: do not recalculate ppos unnecessarily") Change-Id: Ieb0201abe63273b68dc44aba0afa4ad8013bad6c Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `311b298a33`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:36 +00:00
Jens Axboe	18ffd755c4	UPSTREAM: io_uring/rw: defer fsnotify calls to task context commit b000145e9907809406d8164c3b2b8861d95aecd1 upstream. We can't call these off the kiocb completion as that might be off soft/hard irq context. Defer the calls to when we process the task_work for this request. That avoids valid complaints like: stack backtrace: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.0.0-rc6-syzkaller-00321-g105a36f3694e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_usage_bug kernel/locking/lockdep.c:3961 [inline] valid_state kernel/locking/lockdep.c:3973 [inline] mark_lock_irq kernel/locking/lockdep.c:4176 [inline] mark_lock.part.0.cold+0x18/0xd8 kernel/locking/lockdep.c:4632 mark_lock kernel/locking/lockdep.c:4596 [inline] mark_usage kernel/locking/lockdep.c:4527 [inline] __lock_acquire+0x11d9/0x56d0 kernel/locking/lockdep.c:5007 lock_acquire kernel/locking/lockdep.c:5666 [inline] lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5631 __fs_reclaim_acquire mm/page_alloc.c:4674 [inline] fs_reclaim_acquire+0x115/0x160 mm/page_alloc.c:4688 might_alloc include/linux/sched/mm.h:271 [inline] slab_pre_alloc_hook mm/slab.h:700 [inline] slab_alloc mm/slab.c:3278 [inline] __kmem_cache_alloc_lru mm/slab.c:3471 [inline] kmem_cache_alloc+0x39/0x520 mm/slab.c:3491 fanotify_alloc_fid_event fs/notify/fanotify/fanotify.c:580 [inline] fanotify_alloc_event fs/notify/fanotify/fanotify.c:813 [inline] fanotify_handle_event+0x1130/0x3f40 fs/notify/fanotify/fanotify.c:948 send_to_group fs/notify/fsnotify.c:360 [inline] fsnotify+0xafb/0x1680 fs/notify/fsnotify.c:570 __fsnotify_parent+0x62f/0xa60 fs/notify/fsnotify.c:230 fsnotify_parent include/linux/fsnotify.h:77 [inline] fsnotify_file include/linux/fsnotify.h:99 [inline] fsnotify_access include/linux/fsnotify.h:309 [inline] __io_complete_rw_common+0x485/0x720 io_uring/rw.c:195 io_complete_rw+0x1a/0x1f0 io_uring/rw.c:228 iomap_dio_complete_work fs/iomap/direct-io.c:144 [inline] iomap_dio_bio_end_io+0x438/0x5e0 fs/iomap/direct-io.c:178 bio_endio+0x5f9/0x780 block/bio.c:1564 req_bio_endio block/blk-mq.c:695 [inline] blk_update_request+0x3fc/0x1300 block/blk-mq.c:825 scsi_end_request+0x7a/0x9a0 drivers/scsi/scsi_lib.c:541 scsi_io_completion+0x173/0x1f70 drivers/scsi/scsi_lib.c:971 scsi_complete+0x122/0x3b0 drivers/scsi/scsi_lib.c:1438 blk_complete_reqs+0xad/0xe0 block/blk-mq.c:1022 __do_softirq+0x1d3/0x9c6 kernel/softirq.c:571 invoke_softirq kernel/softirq.c:445 [inline] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:650 irq_exit_rcu+0x5/0x20 kernel/softirq.c:662 common_interrupt+0xa9/0xc0 arch/x86/kernel/irq.c:240 Fixes: f63cf5192fe3 ("io_uring: ensure that fsnotify is always called") Link: https://lore.kernel.org/all/20220929135627.ykivmdks2w5vzrwg@quack3/ Reported-by: syzbot+dfcc5f4da15868df7d4d@syzkaller.appspotmail.com Reported-by: Jan Kara <jack@suse.cz> Change-Id: Ie7ba8c0baa08d2d14d1480b385ccaf7a66a7e6eb Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Bug: 268174392 (cherry picked from commit `89a410dbd0`) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:36 +00:00
Dylan Yudaken	249755d501	UPSTREAM: io_uring: do not recalculate ppos unnecessarily commit b4aec40015953b65f2f114641e7fd7714c8df8e6 upstream. There is a slight optimisation to be had by calculating the correct pos pointer inside io_kiocb_update_pos and then using that later. It seems code size drops by a bit: 000000000000a1b0 0000000000000400 t io_read 000000000000a5b0 0000000000000319 t io_write vs 000000000000a1b0 00000000000003f6 t io_read 000000000000a5b0 0000000000000310 t io_write Change-Id: If1f3fa4a7d26be890e91f172775e095afc82a7c8 Signed-off-by: Dylan Yudaken <dylany@fb.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `05d69b372b`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:36 +00:00
Dylan Yudaken	11a4d38a17	UPSTREAM: io_uring: update kiocb->ki_pos at execution time commit d34e1e5b396a0dbaa4a29b7138df662cfb9d8e8e upstream. Update kiocb->ki_pos at execution time rather than in io_prep_rw(). io_prep_rw() happens before the job is enqueued to a worker and so the offset might be read multiple times before being executed once. Ensures that the file position in a set of _linked_ SQEs will be only obtained after earlier SQEs have completed, and so will include their incremented file position. Change-Id: I3a7cdb36d4cd8ca659ae5db146e55a3e614811d9 Signed-off-by: Dylan Yudaken <dylany@fb.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `ff8a070253`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:36 +00:00
Dylan Yudaken	b579765524	UPSTREAM: io_uring: remove duplicated calls to io_kiocb_ppos commit af9c45ecebaf1b428306f41421f4bcffe439f735 upstream. io_kiocb_ppos is called in both branches, and it seems that the compiler does not fuse this. Fusing removes a few bytes from loop_rw_iter. Before: $ nm -S fs/io_uring.o \| grep loop_rw_iter 0000000000002430 0000000000000124 t loop_rw_iter After: $ nm -S fs/io_uring.o \| grep loop_rw_iter 0000000000002430 000000000000010d t loop_rw_iter Change-Id: If105cc8a6388490c7860d820a7d5de4d28d9d7c0 Signed-off-by: Dylan Yudaken <dylany@fb.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `b7958caf41`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:36 +00:00
Jens Axboe	7854c42ce9	UPSTREAM: io_uring: ensure that cached task references are always put on exit commit e775f93f2ab976a2cdb4a7b53063cbe890904f73 upstream. io_uring caches task references to avoid doing atomics for each of them per request. If a request is put from the same task that allocated it, then we can maintain a per-ctx cache of them. This obviously relies on io_uring always pruning caches in a reliable way, and there's currently a case off io_uring fd release where we can miss that. One example is a ring setup with IOPOLL, which relies on the task polling for completions, which will free them. However, if such a task submits a request and then exits or closes the ring without reaping the completion, then ring release will reap and put. If release happens from that very same task, the completed request task refs will get put back into the cache pool. This is problematic, as we're now beyond the point of pruning caches. Manually drop these caches after doing an IOPOLL reap. This releases references from the current task, which is enough. If another task happens to be doing the release, then the caching will not be triggered and there's no issue. Cc: stable@vger.kernel.org Fixes: `e98e49b2bb` ("io_uring: extend task put optimisations") Reported-by: Homin Rhee <hominlab@gmail.com> Change-Id: I9a91a1236799c3da96fc9d2c2d90c493e96b46e7 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `86e2d6901a`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:36 +00:00
Dylan Yudaken	489777e5cb	UPSTREAM: io_uring: fix async accept on O_NONBLOCK sockets commit a73825ba70c93e1eb39a845bb3d9885a787f8ffe upstream. Do not set REQ_F_NOWAIT if the socket is non blocking. When enabled this causes the accept to immediately post a CQE with EAGAIN, which means you cannot perform an accept SQE on a NONBLOCK socket asynchronously. By removing the flag if there is no pending accept then poll is armed as usual and when a connection comes in the CQE is posted. Change-Id: I04b986502679d268665fff2048ff40e2e83a11e4 Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220324143435.2875844-1-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `30b9068934`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:36 +00:00
Jens Axboe	eb6367e157	UPSTREAM: io_uring: allow re-poll if we made progress commit 10c873334febaeea9aa0c25c10b5ac0951b77a5f upstream. We currently check REQ_F_POLLED before arming async poll for a notification to retry. If it's set, then we don't allow poll and will punt to io-wq instead. This is done to prevent a situation where a buggy driver will repeatedly return that there's space/data available yet we get -EAGAIN. However, if we already transferred data, then it should be safe to rely on poll again. Gate the check on whether or not REQ_F_PARTIAL_IO is also set. Change-Id: I5e04e122cdfaa3a754af1fb896568af9ccb5e78b Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `a79b13f249`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:36 +00:00
Jens Axboe	27d60b9121	UPSTREAM: io_uring: support MSG_WAITALL for IORING_OP_SEND(MSG) commit 4c3c09439c08b03d9503df0ca4c7619c5842892e upstream. Like commit 7ba89d2af17a for recv/recvmsg, support MSG_WAITALL for the send side. If this flag is set and we do a short send, retry for a stream of seqpacket socket. Change-Id: I86b7a362c9d5e91c9a663635feade329c9d9e4fe Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `3c1a3d0269`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:36 +00:00
Jens Axboe	9e1c3c0759	UPSTREAM: io_uring: add flag for disabling provided buffer recycling commit 8a3e8ee56417f5e0e66580d93941ed9d6f4c8274 upstream. If we need to continue doing this IO, then we don't want a potentially selected buffer recycled. Add a flag for that. Set this for recv/recvmsg if they do partial IO. Change-Id: I3581f97df99a882e091708302076128666c4239b Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `390b881631`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:35 +00:00
Jens Axboe	9317a8f489	UPSTREAM: io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly commit 7ba89d2af17aa879dda30f5d5d3f152e587fc551 upstream. We currently don't attempt to get the full asked for length even if MSG_WAITALL is set, if we get a partial receive. If we do see a partial receive, then just note how many bytes we did and return -EAGAIN to get it retried. The iov is advanced appropriately for the vector based case, and we manually bump the buffer and remainder for the non-vector case. Cc: stable@vger.kernel.org Reported-by: Constantine Gavrilov <constantine.gavrilov@gmail.com> Change-Id: Ic855be831471c36f579c6e9025698a4b81f4e4b2 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `9b7b0f2116`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:35 +00:00
Pavel Begunkov	0b4b61c4c9	UPSTREAM: io_uring: improve send/recv error handling commit 7297ce3d59449de49d3c9e1f64ae25488750a1fc upstream. Hide all error handling under common if block, removes two extra ifs on the success path and keeps the handling more condensed. Change-Id: Ida4831a9f3d59f2f888950fb963d1aaa821cd6ae Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5761545158a12968f3caf30f747eea65ed75dfc1.1637524285.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `cdc68e714d`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:35 +00:00
Jens Axboe	024abf21ea	UPSTREAM: io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups [ Upstream commit 4464853277d0ccdb9914608dd1332f0fa2f9846f ] Pass in EPOLL_URING_WAKE when signaling eventfd or doing poll related wakups, so that we can check for a circular event dependency between eventfd and epoll. If this flag is set when our wakeup handlers are called, then we know we have a dependency that needs to terminate multishot requests. eventfd and epoll are the only such possible dependencies. Cc: stable@vger.kernel.org # 6.0 Change-Id: Id3ec862ba6410d07553a064d8f93874ac8f3aebe Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `ccf06b5a98`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:35 +00:00
Jens Axboe	c5ddd17efa	UPSTREAM: io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL commit 46a525e199e4037516f7e498c18f065b09df32ac upstream. This isn't a reliable mechanism to tell if we have task_work pending, we really should be looking at whether we have any items queued. This is problematic if forward progress is gated on running said task_work. One such example is reading from a pipe, where the write side has been closed right before the read is started. The fput() of the file queues TWA_RESUME task_work, and we need that task_work to be run before ->release() is called for the pipe. If ->release() isn't called, then the read will sit forever waiting on data that will never arise. Fix this by io_run_task_work() so it checks if we have task_work pending rather than rely on TIF_NOTIFY_SIGNAL for that. The latter obviously doesn't work for task_work that is queued without TWA_SIGNAL. Reported-by: Christiano Haesbaert <haesbaert@haesbaert.org> Cc: stable@vger.kernel.org Link: https://github.com/axboe/liburing/issues/665 Change-Id: Iafd72e8b4854e1b86adcd8b611d167f52fcbb970 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `a9aa4aa7a5`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:35 +00:00
Jens Axboe	721ffd7377	UPSTREAM: io_uring/io-wq: only free worker if it was allocated for creation commit e6db6f9398dadcbc06318a133d4c44a2d3844e61 upstream. We have two types of task_work based creation, one is using an existing worker to setup a new one (eg when going to sleep and we have no free workers), and the other is allocating a new worker. Only the latter should be freed when we cancel task_work creation for a new worker. Fixes: af82425c6a2d ("io_uring/io-wq: free worker if task_work creation is canceled") Reported-by: syzbot+d56ec896af3637bdb7e4@syzkaller.appspotmail.com Change-Id: I7ce8f76fa93e2aa63c2463d2057d1c20693c67ed Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `ba86db02d4`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:35 +00:00
Jens Axboe	14c0ef6643	UPSTREAM: io_uring/io-wq: free worker if task_work creation is canceled commit af82425c6a2d2f347c79b63ce74fca6dc6be157f upstream. If we cancel the task_work, the worker will never come into existance. As this is the last reference to it, ensure that we get it freed appropriately. Cc: stable@vger.kernel.org Reported-by: 진호 <wnwlsgh98@gmail.com> Change-Id: I7da78f771d0845c98044a1181ea948c2784f1008 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `bb135bcc94`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:35 +00:00
Pavel Begunkov	660437093d	UPSTREAM: io_uring: lock overflowing for IOPOLL commit 544d163d659d45a206d8929370d5a2984e546cb7 upstream. syzbot reports an issue with overflow filling for IOPOLL: WARNING: CPU: 0 PID: 28 at io_uring/io_uring.c:734 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734 CPU: 0 PID: 28 Comm: kworker/u4:1 Not tainted 6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0 Workqueue: events_unbound io_ring_exit_work Call trace: io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734 io_req_cqe_overflow+0x5c/0x70 io_uring/io_uring.c:773 io_fill_cqe_req io_uring/io_uring.h:168 [inline] io_do_iopoll+0x474/0x62c io_uring/rw.c:1065 io_iopoll_try_reap_events+0x6c/0x108 io_uring/io_uring.c:1513 io_uring_try_cancel_requests+0x13c/0x258 io_uring/io_uring.c:3056 io_ring_exit_work+0xec/0x390 io_uring/io_uring.c:2869 process_one_work+0x2d8/0x504 kernel/workqueue.c:2289 worker_thread+0x340/0x610 kernel/workqueue.c:2436 kthread+0x12c/0x158 kernel/kthread.c:376 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:863 There is no real problem for normal IOPOLL as flush is also called with uring_lock taken, but it's getting more complicated for IOPOLL\|SQPOLL, for which __io_cqring_overflow_flush() happens from the CQ waiting path. Reported-and-tested-by: syzbot+6805087452d72929404e@syzkaller.appspotmail.com Cc: stable@vger.kernel.org # 5.10+ Change-Id: Ice08c393705787da15361d4ba0c4524fa01d8f29 Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `ed4629d1e9`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:34 +00:00
Harshit Mogalapalli	9639fa91a6	UPSTREAM: io_uring: Fix unsigned 'res' comparison with zero in io_fixup_rw_res() Smatch warning: io_fixup_rw_res() warn: unsigned 'res' is never less than zero. Change type of 'res' from unsigned to long. Fixes: `d6b7efc722` ("io_uring/rw: fix error'ed retry return values") Change-Id: I128b9526bf7d4ab88398a41c26a4bf2c066dc7d2 Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `e326ee018a`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:34 +00:00
Pavel Begunkov	42af757376	UPSTREAM: io_uring: fix CQ waiting timeout handling commit 12521a5d5cb7ff0ad43eadfc9c135d86e1131fa8 upstream. Jiffy to ktime CQ waiting conversion broke how we treat timeouts, in particular we rearm it anew every time we get into io_cqring_wait_schedule() without adjusting the timeout. Waiting for 2 CQEs and getting a task_work in the middle may double the timeout value, or even worse in some cases task may wait indefinitely. Cc: stable@vger.kernel.org Fixes: 228339662b398 ("io_uring: don't convert to jiffies for waiting on timeouts") Change-Id: I1a2f159751c9cb5fcb8ae28f0349bcd0225fdcee Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f7bffddd71b08f28a877d44d37ac953ddb01590d.1672915663.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `c3222fd282`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:34 +00:00
Jens Axboe	4a7015335c	UPSTREAM: io_uring: check for valid register opcode earlier [ Upstream commit 343190841a1f22b96996d9f8cfab902a4d1bfd0e ] We only check the register opcode value inside the restricted ring section, move it into the main io_uring_register() function instead and check it up front. Change-Id: I187b2979aab62484e2a169c882fb3c1ed853ba3e Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `f9309dcaa9`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:34 +00:00
Harshit Mogalapalli	e7098e7f6a	UPSTREAM: io_uring: Fix a null-ptr-deref in io_tctx_exit_cb() [ Upstream commit 998b30c3948e4d0b1097e639918c5cff332acac5 ] Syzkaller reports a NULL deref bug as follows: BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3 Read of size 4 at addr 0000000000000138 by task file1/1955 CPU: 1 PID: 1955 Comm: file1 Not tainted 6.1.0-rc7-00103-gef4d3ea40565 #75 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xcd/0x134 ? io_tctx_exit_cb+0x53/0xd3 kasan_report+0xbb/0x1f0 ? io_tctx_exit_cb+0x53/0xd3 kasan_check_range+0x140/0x190 io_tctx_exit_cb+0x53/0xd3 task_work_run+0x164/0x250 ? task_work_cancel+0x30/0x30 get_signal+0x1c3/0x2440 ? lock_downgrade+0x6e0/0x6e0 ? lock_downgrade+0x6e0/0x6e0 ? exit_signals+0x8b0/0x8b0 ? do_raw_read_unlock+0x3b/0x70 ? do_raw_spin_unlock+0x50/0x230 arch_do_signal_or_restart+0x82/0x2470 ? kmem_cache_free+0x260/0x4b0 ? putname+0xfe/0x140 ? get_sigframe_size+0x10/0x10 ? do_execveat_common.isra.0+0x226/0x710 ? lockdep_hardirqs_on+0x79/0x100 ? putname+0xfe/0x140 ? do_execveat_common.isra.0+0x238/0x710 exit_to_user_mode_prepare+0x15f/0x250 syscall_exit_to_user_mode+0x19/0x50 do_syscall_64+0x42/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0023:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 002b:00000000fffb7790 EFLAGS: 00000200 ORIG_RAX: 000000000000000b RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Kernel panic - not syncing: panic_on_warn set ... This happens because the adding of task_work from io_ring_exit_work() isn't synchronized with canceling all work items from eg exec. The execution of the two are ordered in that they are both run by the task itself, but if io_tctx_exit_cb() is queued while we're canceling all work items off exec AND gets executed when the task exits to userspace rather than in the main loop in io_uring_cancel_generic(), then we can find current->io_uring == NULL and hit the above crash. It's safe to add this NULL check here, because the execution of the two paths are done by the task itself. Cc: stable@vger.kernel.org Fixes: `d56d938b4b` ("io_uring: do ctx initiated file note removal") Reported-by: syzkaller <syzkaller@googlegroups.com> Change-Id: I043870b8749175dc0680aa2f70ec2b16966a015b Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Link: https://lore.kernel.org/r/20221206093833.3812138-1-harshit.m.mogalapalli@oracle.com [axboe: add code comment and also put an explanation in the commit msg] Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `f895511de9`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:34 +00:00
Jens Axboe	9e02e95377	UPSTREAM: io_uring: move to separate directory [ Upstream commit ed29b0b4fd835b058ddd151c49d021e28d631ee6 ] In preparation for splitting io_uring up a bit, move it into its own top level directory. It didn't really belong in fs/ anyway, as it's not a file system only API. This adds io_uring/ and moves the core files in there, and updates the MAINTAINERS file for the new location. Change-Id: I2df14f2467ce11b9a1613c73cad073c81bdd5f3f Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 998b30c3948e ("io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()") Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit `f435c66d23`) Bug: 268174392 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2023-03-01 12:14:34 +00:00

29 Commits