We can't use 0 here, as io_init_req() is always invoked with the
ctx uring_lock held. Newer kernels have IO_URING_F_UNLOCKED for this,
but previously we used IO_URING_F_NONBLOCK to indicate this as well.
Fixes: cf7f9cd500 ("io_uring: add missing lock in io_get_file_fixed")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 937c15e27a)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I762eacf1b49ca8a38d8b77c44db4ca2bc49b2c4c
io_get_file_fixed will access io_uring's context. Lock it if it is
invoked unlocked (eg via io-wq) to avoid a race condition with fixed
files getting unregistered.
No single upstream patch exists for this issue, it was fixed as part
of the file assignment changes that went into the 5.18 cycle.
Signed-off-by: Jheng, Bing-Jhong Billy <billy@starlabs.sg>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cf7f9cd500)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I01ec4283589acde1d17318eb76f87ce099ec3fa0
commit 5c61795ea97c170347c5c4af0c159bd877b8af71 upstream.
This debug statement was never meant to go into the upstream release,
kill it off before it ends up in a release. It was just part of the
testing for the initial version of the patch.
Fixes: 2ec33a6c3cca ("io_uring/rw: ensure kiocb_end_write() is always called")
Change-Id: Idb52f716de1829dbdaecf601686fe9ac4d658bb7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4b6f8263e9)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 2ec33a6c3cca9fe2465e82050c81f5ffdc508b36 upstream.
A previous commit moved the notifications and end-write handling, but
it is now missing a few spots where we also want to call both of those.
Without that, we can potentially be missing file notifications, and
more importantly, have an imbalance in the super_block writers sem
accounting.
Fixes: b000145e9907 ("io_uring/rw: defer fsnotify calls to task context")
Reported-by: Dave Chinner <david@fromorbit.com>
Link: https://lore.kernel.org/all/20221010050319.GC2703033@dread.disaster.area/
Change-Id: Ib7c95554f80358f59042719120d7843019137e62
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b10acfcd61)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit c0737fa9a5a5cf5a053bcc983f72d58919b997c6 upstream.
We have re-polling for partial IO, so a request can be polled twice. If
it used two poll entries the first time then on the second
io_arm_poll_handler() it will find the old apoll entry and NULL
kmalloc()'ed second entry, i.e. apoll->double_poll, so leaking it.
Fixes: 10c873334feba ("io_uring: allow re-poll if we made progress")
Change-Id: Ibab2f08e9f8d1e4ae25f0666dfe70e982a29a0a5
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fee2452494222ecc7f1f88c8fb659baef971414a.1655852245.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 124fb13cc7)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 0d7c1153d9291197c1dc473cfaade77acb874b4b upstream.
In io_recv(), if import_single_range() fails, the @flags variable is
uninitialized, then it will goto out_free.
After the goto, the compiler doesn't know that (ret < min_ret) is
always true, so it thinks the "if ((flags & MSG_WAITALL) ..." path
could be taken.
The complaint comes from gcc-9 (Debian 9.3.0-22) 9.3.0:
```
fs/io_uring.c:5238 io_recvfrom() error: uninitialized symbol 'flags'
```
Fix this by bypassing the @ret and @flags check when
import_single_range() fails.
Reasons:
1. import_single_range() only returns -EFAULT when it fails.
2. At that point, @flags is uninitialized and shouldn't be read.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: "Chen, Rong A" <rong.a.chen@intel.com>
Link: https://lore.gnuweeb.org/timl/d33bb5a9-8173-f65b-f653-51fc0681c6d6@intel.com/
Cc: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Fixes: 7297ce3d59449de49d3c9e1f64ae25488750a1fc ("io_uring: improve send/recv error handling")
Change-Id: I8aaae49547d7194f21d45f0d3cbcd18650ba09f0
Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar@gmail.com>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Link: https://lore.kernel.org/r/20220207140533.565411-1-ammarfaizi2@gnuweeb.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e944f1e37b)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 3e4cb6ebbb2bad201c1186bc0b7e8cf41dd7f7e6 upstream.
I hit a very bad problem during my tests of SENDMSG_ZC.
BUG(); in first_iovec_segment() triggered very easily.
The problem was io_setup_async_msg() in the partial retry case,
which seems to happen more often with _ZC.
iov_iter_iovec_advance() may change i->iov in order to have i->iov_offset
being only relative to the first element.
Which means kmsg->msg.msg_iter.iov is no longer the
same as kmsg->fast_iov.
But this would rewind the copy to be the start of
async_msg->fast_iov, which means the internal
state of sync_msg->msg.msg_iter is inconsitent.
I tested with 5 vectors with length like this 4, 0, 64, 20, 8388608
and got a short writes with:
- ret=2675244 min_ret=8388692 => remaining 5713448 sr->done_io=2675244
- ret=-EAGAIN => io_uring_poll_arm
- ret=4911225 min_ret=5713448 => remaining 802223 sr->done_io=7586469
- ret=-EAGAIN => io_uring_poll_arm
- ret=802223 min_ret=802223 => res=8388692
While this was easily triggered with SENDMSG_ZC (queued for 6.1),
it was a potential problem starting with 7ba89d2af17aa879dda30f5d5d3f152e587fc551
in 5.18 for IORING_OP_RECVMSG.
And also with 4c3c09439c08b03d9503df0ca4c7619c5842892e in 5.19
for IORING_OP_SENDMSG.
However 257e84a537 introduced the critical
code into io_setup_async_msg() in 5.11.
Fixes: 7ba89d2af17aa ("io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly")
Fixes: 257e84a537 ("io_uring: refactor sendmsg/recvmsg iov managing")
Cc: stable@vger.kernel.org
Change-Id: Id5b0932aa591acadb00c119222e62895031020e0
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b2e7be246e2fb173520862b0c7098e55767567a2.1664436949.git.metze@samba.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2e4c95a404)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 6f83ab22adcb77a5824d2c274dace0d99e21319f upstream.
-1 tells use to use the current position, but we check if the file is
a stream regardless of that. Fix up io_kiocb_update_pos() to only
dip into file if we need to. This is both more efficient and also drops
12 bytes of text on aarch64 and 64 bytes on x86-64.
Fixes: b4aec4001595 ("io_uring: do not recalculate ppos unnecessarily")
Change-Id: Ieb0201abe63273b68dc44aba0afa4ad8013bad6c
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 311b298a33)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit b4aec40015953b65f2f114641e7fd7714c8df8e6 upstream.
There is a slight optimisation to be had by calculating the correct pos
pointer inside io_kiocb_update_pos and then using that later.
It seems code size drops by a bit:
000000000000a1b0 0000000000000400 t io_read
000000000000a5b0 0000000000000319 t io_write
vs
000000000000a1b0 00000000000003f6 t io_read
000000000000a5b0 0000000000000310 t io_write
Change-Id: If1f3fa4a7d26be890e91f172775e095afc82a7c8
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 05d69b372b)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit d34e1e5b396a0dbaa4a29b7138df662cfb9d8e8e upstream.
Update kiocb->ki_pos at execution time rather than in io_prep_rw().
io_prep_rw() happens before the job is enqueued to a worker and so the
offset might be read multiple times before being executed once.
Ensures that the file position in a set of _linked_ SQEs will be only
obtained after earlier SQEs have completed, and so will include their
incremented file position.
Change-Id: I3a7cdb36d4cd8ca659ae5db146e55a3e614811d9
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ff8a070253)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit af9c45ecebaf1b428306f41421f4bcffe439f735 upstream.
io_kiocb_ppos is called in both branches, and it seems that the compiler
does not fuse this. Fusing removes a few bytes from loop_rw_iter.
Before:
$ nm -S fs/io_uring.o | grep loop_rw_iter
0000000000002430 0000000000000124 t loop_rw_iter
After:
$ nm -S fs/io_uring.o | grep loop_rw_iter
0000000000002430 000000000000010d t loop_rw_iter
Change-Id: If105cc8a6388490c7860d820a7d5de4d28d9d7c0
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b7958caf41)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit e775f93f2ab976a2cdb4a7b53063cbe890904f73 upstream.
io_uring caches task references to avoid doing atomics for each of them
per request. If a request is put from the same task that allocated it,
then we can maintain a per-ctx cache of them. This obviously relies
on io_uring always pruning caches in a reliable way, and there's
currently a case off io_uring fd release where we can miss that.
One example is a ring setup with IOPOLL, which relies on the task
polling for completions, which will free them. However, if such a task
submits a request and then exits or closes the ring without reaping
the completion, then ring release will reap and put. If release happens
from that very same task, the completed request task refs will get
put back into the cache pool. This is problematic, as we're now beyond
the point of pruning caches.
Manually drop these caches after doing an IOPOLL reap. This releases
references from the current task, which is enough. If another task
happens to be doing the release, then the caching will not be
triggered and there's no issue.
Cc: stable@vger.kernel.org
Fixes: e98e49b2bb ("io_uring: extend task put optimisations")
Reported-by: Homin Rhee <hominlab@gmail.com>
Change-Id: I9a91a1236799c3da96fc9d2c2d90c493e96b46e7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 86e2d6901a)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit a73825ba70c93e1eb39a845bb3d9885a787f8ffe upstream.
Do not set REQ_F_NOWAIT if the socket is non blocking. When enabled this
causes the accept to immediately post a CQE with EAGAIN, which means you
cannot perform an accept SQE on a NONBLOCK socket asynchronously.
By removing the flag if there is no pending accept then poll is armed as
usual and when a connection comes in the CQE is posted.
Change-Id: I04b986502679d268665fff2048ff40e2e83a11e4
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220324143435.2875844-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 30b9068934)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 10c873334febaeea9aa0c25c10b5ac0951b77a5f upstream.
We currently check REQ_F_POLLED before arming async poll for a
notification to retry. If it's set, then we don't allow poll and will
punt to io-wq instead. This is done to prevent a situation where a buggy
driver will repeatedly return that there's space/data available yet we
get -EAGAIN.
However, if we already transferred data, then it should be safe to rely
on poll again. Gate the check on whether or not REQ_F_PARTIAL_IO is
also set.
Change-Id: I5e04e122cdfaa3a754af1fb896568af9ccb5e78b
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a79b13f249)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 4c3c09439c08b03d9503df0ca4c7619c5842892e upstream.
Like commit 7ba89d2af17a for recv/recvmsg, support MSG_WAITALL for the
send side. If this flag is set and we do a short send, retry for a
stream of seqpacket socket.
Change-Id: I86b7a362c9d5e91c9a663635feade329c9d9e4fe
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 3c1a3d0269)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 8a3e8ee56417f5e0e66580d93941ed9d6f4c8274 upstream.
If we need to continue doing this IO, then we don't want a potentially
selected buffer recycled. Add a flag for that.
Set this for recv/recvmsg if they do partial IO.
Change-Id: I3581f97df99a882e091708302076128666c4239b
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 390b881631)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 7ba89d2af17aa879dda30f5d5d3f152e587fc551 upstream.
We currently don't attempt to get the full asked for length even if
MSG_WAITALL is set, if we get a partial receive. If we do see a partial
receive, then just note how many bytes we did and return -EAGAIN to
get it retried.
The iov is advanced appropriately for the vector based case, and we
manually bump the buffer and remainder for the non-vector case.
Cc: stable@vger.kernel.org
Reported-by: Constantine Gavrilov <constantine.gavrilov@gmail.com>
Change-Id: Ic855be831471c36f579c6e9025698a4b81f4e4b2
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9b7b0f2116)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 4464853277d0ccdb9914608dd1332f0fa2f9846f ]
Pass in EPOLL_URING_WAKE when signaling eventfd or doing poll related
wakups, so that we can check for a circular event dependency between
eventfd and epoll. If this flag is set when our wakeup handlers are
called, then we know we have a dependency that needs to terminate
multishot requests.
eventfd and epoll are the only such possible dependencies.
Cc: stable@vger.kernel.org # 6.0
Change-Id: Id3ec862ba6410d07553a064d8f93874ac8f3aebe
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ccf06b5a98)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 46a525e199e4037516f7e498c18f065b09df32ac upstream.
This isn't a reliable mechanism to tell if we have task_work pending, we
really should be looking at whether we have any items queued. This is
problematic if forward progress is gated on running said task_work. One
such example is reading from a pipe, where the write side has been closed
right before the read is started. The fput() of the file queues TWA_RESUME
task_work, and we need that task_work to be run before ->release() is
called for the pipe. If ->release() isn't called, then the read will sit
forever waiting on data that will never arise.
Fix this by io_run_task_work() so it checks if we have task_work pending
rather than rely on TIF_NOTIFY_SIGNAL for that. The latter obviously
doesn't work for task_work that is queued without TWA_SIGNAL.
Reported-by: Christiano Haesbaert <haesbaert@haesbaert.org>
Cc: stable@vger.kernel.org
Link: https://github.com/axboe/liburing/issues/665
Change-Id: Iafd72e8b4854e1b86adcd8b611d167f52fcbb970
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a9aa4aa7a5)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit e6db6f9398dadcbc06318a133d4c44a2d3844e61 upstream.
We have two types of task_work based creation, one is using an existing
worker to setup a new one (eg when going to sleep and we have no free
workers), and the other is allocating a new worker. Only the latter
should be freed when we cancel task_work creation for a new worker.
Fixes: af82425c6a2d ("io_uring/io-wq: free worker if task_work creation is canceled")
Reported-by: syzbot+d56ec896af3637bdb7e4@syzkaller.appspotmail.com
Change-Id: I7ce8f76fa93e2aa63c2463d2057d1c20693c67ed
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ba86db02d4)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit af82425c6a2d2f347c79b63ce74fca6dc6be157f upstream.
If we cancel the task_work, the worker will never come into existance.
As this is the last reference to it, ensure that we get it freed
appropriately.
Cc: stable@vger.kernel.org
Reported-by: 진호 <wnwlsgh98@gmail.com>
Change-Id: I7da78f771d0845c98044a1181ea948c2784f1008
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bb135bcc94)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 544d163d659d45a206d8929370d5a2984e546cb7 upstream.
syzbot reports an issue with overflow filling for IOPOLL:
WARNING: CPU: 0 PID: 28 at io_uring/io_uring.c:734 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734
CPU: 0 PID: 28 Comm: kworker/u4:1 Not tainted 6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0
Workqueue: events_unbound io_ring_exit_work
Call trace:
io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734
io_req_cqe_overflow+0x5c/0x70 io_uring/io_uring.c:773
io_fill_cqe_req io_uring/io_uring.h:168 [inline]
io_do_iopoll+0x474/0x62c io_uring/rw.c:1065
io_iopoll_try_reap_events+0x6c/0x108 io_uring/io_uring.c:1513
io_uring_try_cancel_requests+0x13c/0x258 io_uring/io_uring.c:3056
io_ring_exit_work+0xec/0x390 io_uring/io_uring.c:2869
process_one_work+0x2d8/0x504 kernel/workqueue.c:2289
worker_thread+0x340/0x610 kernel/workqueue.c:2436
kthread+0x12c/0x158 kernel/kthread.c:376
ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:863
There is no real problem for normal IOPOLL as flush is also called with
uring_lock taken, but it's getting more complicated for IOPOLL|SQPOLL,
for which __io_cqring_overflow_flush() happens from the CQ waiting path.
Reported-and-tested-by: syzbot+6805087452d72929404e@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org # 5.10+
Change-Id: Ice08c393705787da15361d4ba0c4524fa01d8f29
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ed4629d1e9)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 12521a5d5cb7ff0ad43eadfc9c135d86e1131fa8 upstream.
Jiffy to ktime CQ waiting conversion broke how we treat timeouts, in
particular we rearm it anew every time we get into
io_cqring_wait_schedule() without adjusting the timeout. Waiting for 2
CQEs and getting a task_work in the middle may double the timeout value,
or even worse in some cases task may wait indefinitely.
Cc: stable@vger.kernel.org
Fixes: 228339662b398 ("io_uring: don't convert to jiffies for waiting on timeouts")
Change-Id: I1a2f159751c9cb5fcb8ae28f0349bcd0225fdcee
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f7bffddd71b08f28a877d44d37ac953ddb01590d.1672915663.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c3222fd282)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 343190841a1f22b96996d9f8cfab902a4d1bfd0e ]
We only check the register opcode value inside the restricted ring
section, move it into the main io_uring_register() function instead
and check it up front.
Change-Id: I187b2979aab62484e2a169c882fb3c1ed853ba3e
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f9309dcaa9)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 998b30c3948e4d0b1097e639918c5cff332acac5 ]
Syzkaller reports a NULL deref bug as follows:
BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3
Read of size 4 at addr 0000000000000138 by task file1/1955
CPU: 1 PID: 1955 Comm: file1 Not tainted 6.1.0-rc7-00103-gef4d3ea40565 #75
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0xcd/0x134
? io_tctx_exit_cb+0x53/0xd3
kasan_report+0xbb/0x1f0
? io_tctx_exit_cb+0x53/0xd3
kasan_check_range+0x140/0x190
io_tctx_exit_cb+0x53/0xd3
task_work_run+0x164/0x250
? task_work_cancel+0x30/0x30
get_signal+0x1c3/0x2440
? lock_downgrade+0x6e0/0x6e0
? lock_downgrade+0x6e0/0x6e0
? exit_signals+0x8b0/0x8b0
? do_raw_read_unlock+0x3b/0x70
? do_raw_spin_unlock+0x50/0x230
arch_do_signal_or_restart+0x82/0x2470
? kmem_cache_free+0x260/0x4b0
? putname+0xfe/0x140
? get_sigframe_size+0x10/0x10
? do_execveat_common.isra.0+0x226/0x710
? lockdep_hardirqs_on+0x79/0x100
? putname+0xfe/0x140
? do_execveat_common.isra.0+0x238/0x710
exit_to_user_mode_prepare+0x15f/0x250
syscall_exit_to_user_mode+0x19/0x50
do_syscall_64+0x42/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0023:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 002b:00000000fffb7790 EFLAGS: 00000200 ORIG_RAX: 000000000000000b
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Kernel panic - not syncing: panic_on_warn set ...
This happens because the adding of task_work from io_ring_exit_work()
isn't synchronized with canceling all work items from eg exec. The
execution of the two are ordered in that they are both run by the task
itself, but if io_tctx_exit_cb() is queued while we're canceling all
work items off exec AND gets executed when the task exits to userspace
rather than in the main loop in io_uring_cancel_generic(), then we can
find current->io_uring == NULL and hit the above crash.
It's safe to add this NULL check here, because the execution of the two
paths are done by the task itself.
Cc: stable@vger.kernel.org
Fixes: d56d938b4b ("io_uring: do ctx initiated file note removal")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Change-Id: I043870b8749175dc0680aa2f70ec2b16966a015b
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20221206093833.3812138-1-harshit.m.mogalapalli@oracle.com
[axboe: add code comment and also put an explanation in the commit msg]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f895511de9)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit ed29b0b4fd835b058ddd151c49d021e28d631ee6 ]
In preparation for splitting io_uring up a bit, move it into its own
top level directory. It didn't really belong in fs/ anyway, as it's
not a file system only API.
This adds io_uring/ and moves the core files in there, and updates the
MAINTAINERS file for the new location.
Change-Id: I2df14f2467ce11b9a1613c73cad073c81bdd5f3f
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 998b30c3948e ("io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f435c66d23)
Bug: 268174392
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>