46720 Commits

Author SHA1 Message Date
Ross Zwisler
a5f728e5c7 ext4: introduce jbd2_inode dirty range scoping and use it on ext4
jbd2: introduce jbd2_inode dirty range scoping:
Currently both journal_submit_inode_data_buffers() and
journal_finish_inode_data_buffers() operate on the entire address space
of each of the inodes associated with a given journal entry.  The
consequence of this is that if we have an inode where we are constantly
appending dirty pages we can end up waiting for an indefinite amount of
time in journal_finish_inode_data_buffers() while we wait for all the
pages under writeback to be written out.

The easiest way to cause this type of workload is do just dd from
/dev/zero to a file until it fills the entire filesystem.  This can
cause journal_finish_inode_data_buffers() to wait for the duration of
the entire dd operation.

We can improve this situation by scoping each of the inode dirty ranges
associated with a given transaction.  We do this via the jbd2_inode
structure so that the scoping is contained within jbd2 and so that it
follows the lifetime and locking rules for that structure.

This allows us to limit the writeback & wait in
journal_submit_inode_data_buffers() and
journal_finish_inode_data_buffers() respectively to the dirty range for
a given struct jdb2_inode, keeping us from waiting forever if the inode
in question is still being appended to.

ext4: use jbd2_inode dirty range scoping:
Use the newly introduced jbd2_inode dirty range scoping to prevent us
from waiting forever when trying to complete a journal transaction.

jbd2: Introduce jbd2_inode next dirty range scoping:
Distinguish between the current dirty range and the next dirty range.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Change-Id: Idd339a5f4edbcd16e16fe4a861eb28c3ac7a08bd
2024-01-07 14:05:43 +00:00
stic-server-open
084504d01b ext4: Stop trim mechanism after receiving SIGUSR1 signal
* Same idea as xiaomi f2fs

Change-Id: I48ae59f6cd6548d491bfc81898bb19e795138255
2024-01-07 14:05:42 +00:00
Yumi Yukimura
1a775072f6 fs: proc: Add PROC_CMDLINE_APPEND_ANDROID_FORCE_NORMAL_BOOT
For Android 9 launch A/B devices migrating to Android 10 style
system-as-root, `androidboot.force_normal_boot=1` must be passed in
cmdline when booting into normal or charger mode. However, it is not
always possible for one to modify the bootloader to adhere to these
changes. As a workaround, one can use the presence of the
`skip_initramfs` flag in cmdline to to decide whether to append the new
flag to cmdline on the kernel side.

Co-authored-by: jabashque <jabashque@gmail.com>
Change-Id: Ia00ea2c54e2a7d2275e552837039033adb98d0ff
2023-11-27 14:06:01 +08:00
Sebastiano Barezzi
b6ce64b2b3 init: Add CONFIG_INITRAMFS_IGNORE_SKIP_FLAG
* Ignoring an ignore flag, yikes
* Also replace skip_initramf with want_initramf (omitting last letter for Magisk since it binary patches that out of kernel, I'm not even sure why we're supporting that mess)

Co-authored-by: Erfan Abdi <erfangplus@gmail.com>
Change-Id: Ifdf726f128bc66bf860bbb71024f94f56879710f
2023-11-27 14:05:53 +08:00
Miklos Szeredi
04fc80eccc libfs: support RENAME_NOREPLACE in simple_rename()
This is trivial to do:

 - add flags argument to simple_rename()
 - check if flags doesn't have any other than RENAME_NOREPLACE
 - assign simple_rename() to .rename2 instead of .rename

Filesystems converted:

hugetlbfs, ramfs, bpf.

Debugfs uses simple_rename() to implement debugfs_rename(), which is for
debugfs instances to rename files internally, not for userspace filesystem
access.  For this case pass zero flags to simple_rename().

Change-Id: I1a46ece3b40b05c9f18fd13b98062d2a959b76a0
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
2023-11-16 20:18:22 +08:00
Dongliang Mu
d9ca709886 UPSTREAM: f2fs: fix UAF in f2fs_available_free_memory
if2fs_fill_super
-> f2fs_build_segment_manager
   -> create_discard_cmd_control
      -> f2fs_start_discard_thread

It invokes kthread_run to create a thread and run issue_discard_thread.

However, if f2fs_build_node_manager fails, the control flow goes to
free_nm and calls f2fs_destroy_node_manager. This function will free
sbi->nm_info. However, if issue_discard_thread accesses sbi->nm_info
after the deallocation, but before the f2fs_stop_discard_thread, it will
cause UAF(Use-after-free).

-> f2fs_destroy_segment_manager
   -> destroy_discard_cmd_control
      -> f2fs_stop_discard_thread

Fix this by stopping discard thread before f2fs_destroy_node_manager.

Note that, the commit d6d2b491a82e1 introduces the call of
f2fs_available_free_memory into issue_discard_thread.

Cc: stable@vger.kernel.org
Fixes: d6d2b491a82e ("f2fs: allow to change discard policy based on cached discard cmds")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 5429c9dbc9025f9a166f64e22e3a69c94fd5b29b)
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: If121b453455b11b2aded8ba8a3899faad431dbd3
2023-03-07 20:13:50 +08:00
Sahitya Tummala
7365248712 f2fs: allow to change discard policy based on cached discard cmds
With the default DPOLICY_BG discard thread is ioaware, which prevents
the discard thread from issuing the discard commands. On low RAM setups,
it is observed that these discard commands in the cache are consuming
high memory. This patch aims to relax the memory pressure on the system
due to f2fs pending discard cmds by changing the policy to DPOLICY_FORCE
based on the nm_i->ram_thresh configured.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-07 20:13:50 +08:00
Miklos Szeredi
865302291c fuse: fix pipe buffer lifetime for direct_io
commit 0c4bcfdecb1ac0967619ee7ff44871d93c08c909 upstream.

In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
imports the write buffer with fuse_get_user_pages(), which uses
iov_iter_get_pages() to grab references to userspace pages instead of
actually copying memory.

On the filesystem device side, these pages can then either be read to
userspace (via fuse_dev_read()), or splice()d over into a pipe using
fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.

This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
the userspace filesystem can mark the request as completed, causing write()
to return. At that point, the userspace filesystem should no longer have
access to the pipe buffer.

Fix by copying pages coming from the user address space to new pipe
buffers.

Reported-by: Jann Horn <jannh@google.com>
Fixes: c3021629a0 ("fuse: support splice() reading from fuse device")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-07 20:13:12 +08:00
Eric W. Biederman
8a616a83eb userns: Don't fail follow_automount based on s_user_ns
[ Upstream commit bbc3e471011417598e598707486f5d8814ec9c01 ]

When vfs_submount was added the test to limit automounts from
filesystems that with s_user_ns != &init_user_ns accidentially left
in follow_automount.  The test was never about any security concerns
and was always about how do we implement this for filesystems whose
s_user_ns != &init_user_ns.

At the moment this check makes no difference as there are no
filesystems that both set FS_USERNS_MOUNT and implement d_automount.

Remove this check now while I am thinking about it so there will not
be odd booby traps for someone who does want to make this combination
work.

vfs_submount still needs improvements to allow this combination to work,
and vfs_submount contains a check that presents a warning.

The autofs4 filesystem could be modified to set FS_USERNS_MOUNT and it would
need not work on this code path, as userspace performs the mounts.

Fixes: 93faccbbfa95 ("fs: Better permission checking for submounts")
Fixes: aeaa4a79ff6a ("fs: Call d_automount with the filesystems creds")
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I1707ab45c9b3b23ba9c06bfb4738fc85b8f9e166
2022-11-15 21:35:32 +01:00
Eric W. Biederman
62b004c983 fs: Better permission checking for submounts
commit 93faccbbfa958a9668d3ab4e30f38dd205cee8d8 upstream.

To support unprivileged users mounting filesystems two permission
checks have to be performed: a test to see if the user allowed to
create a mount in the mount namespace, and a test to see if
the user is allowed to access the specified filesystem.

The automount case is special in that mounting the original filesystem
grants permission to mount the sub-filesystems, to any user who
happens to stumble across the their mountpoint and satisfies the
ordinary filesystem permission checks.

Attempting to handle the automount case by using override_creds
almost works.  It preserves the idea that permission to mount
the original filesystem is permission to mount the sub-filesystem.
Unfortunately using override_creds messes up the filesystems
ordinary permission checks.

Solve this by being explicit that a mount is a submount by introducing
vfs_submount, and using it where appropriate.

vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let
sget and friends know that a mount is a submount so they can take appropriate
action.

sget and sget_userns are modified to not perform any permission checks
on submounts.

follow_automount is modified to stop using override_creds as that
has proven problemantic.

do_mount is modified to always remove the new MS_SUBMOUNT flag so
that we know userspace will never by able to specify it.

autofs4 is modified to stop using current_real_cred that was put in
there to handle the previous version of submount permission checking.

cifs is modified to pass the mountpoint all of the way down to vfs_submount.

debugfs is modified to pass the mountpoint all of the way down to
trace_automount by adding a new parameter.  To make this change easier
a new typedef debugfs_automount_t is introduced to capture the type of
the debugfs automount function.

Fixes: 069d5ac9ae0d ("autofs:  Fix automounts by using current_real_cred()->uid")
Fixes: aeaa4a79ff6a ("fs: Call d_automount with the filesystems creds")
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I09cb1f35368fb8dc4a64b5ac5a35c9d2843ef95b
2022-11-15 21:35:32 +01:00
Eric W. Biederman
9a6beaf980 mnt: Move the FS_USERNS_MOUNT check into sget_userns
Allowing a filesystem to be mounted by other than root in the initial
user namespace is a filesystem property not a mount namespace property
and as such should be checked in filesystem specific code.  Move the
FS_USERNS_MOUNT test into super.c:sget_userns().

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Change-Id: I5da9f5ce3e7b85379a771617e3238817b777eab4
2022-11-15 21:35:32 +01:00
Jeff Layton
967a853746 locks: sprinkle some tracepoints around the file locking code
Add some tracepoints around the POSIX locking code. These were useful
when tracking down problems when handling the race between setlk and
close.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Change-Id: I270eda634890d21399ccf939ad6d03b7d201a148
2022-11-15 21:35:32 +01:00
Jeff Layton
d75f807e40 locks: rename __posix_lock_file to posix_lock_inode
...a more descriptive name and we can drop the double underscore prefix.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Change-Id: Iafb3bd86e5791d9c36bff3be7a876fa8aeb98afa
2022-11-15 21:35:32 +01:00
Eric W. Biederman
3686b47158 autofs: Fix automounts by using current_real_cred()->uid
Seth Forshee reports that in 4.8-rcN some automounts are failing
because the requesting the automount changed.

The relevant call path is:
follow_automount()
    ->d_automount
    autofs4_d_automount
       autofs4_mount_wait
           autofs4_wait

In autofs4_wait wq_uid and wq_gid are set to current_uid() and
current_gid respectively.  With follow_automount now overriding creds
uid that we export to userspace changes and that breaks existing
setups.

To remove the regression set wq_uid and wq_gid from
current_real_cred()->uid and current_real_cred()->gid respectively.
This restores the current behavior as current->real_cred is identical
to current->cred except when override creds are used.

Cc: stable@vger.kernel.org
Fixes: aeaa4a79ff6a ("fs: Call d_automount with the filesystems creds")
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Change-Id: I3ec133334218ec9bd108b18c92fd852104f56926
2022-11-15 21:35:32 +01:00
Eric W. Biederman
8658a41bc9 fs: Call d_automount with the filesystems creds
Seth Forshee reported a mount regression in nfs autmounts
with "fs: Add user namespace member to struct super_block".

It turns out that the assumption that current->cred is something
reasonable during mount while necessary to improve support of
unprivileged mounts is wrong in the automount path.

To fix the existing filesystems override current->cred with the
init_cred before calling d_automount and restore current->cred after
d_automount completes.

To support unprivileged mounts would require a more nuanced cred
selection, so fail on unprivileged mounts for the time being.  As none
of the filesystems that currently set FS_USERNS_MOUNT implement
d_automount this check is only good for preventing future problems.

Fixes: 6e4eab577a0c ("fs: Add user namespace member to struct super_block")
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Change-Id: I972485e9da3f2883e4ec9b38da3374e0993b1af6
2022-11-15 21:35:32 +01:00
Vaibhav Jain
6f47e535e7 UPSTREAM: kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file()
Recently started seeing a kernel oops when a module tries removing a
memory mapped sysfs bin_attribute. On closer investigation the root
cause seems to be kernfs_release_file() trying to call
kernfs_op.release() callback that's NULL for such sysfs
bin_attributes. The oops occurs when kernfs_release_file() is called from
kernfs_drain_open_files() to cleanup any open handles with active
memory mappings.

The patch fixes this by checking for flag KERNFS_HAS_RELEASE before
calling kernfs_release_file() in function kernfs_drain_open_files().

On ppc64-le arch with cxl module the oops back-trace is of the
form below:
[  861.381126] Unable to handle kernel paging request for instruction fetch
[  861.381360] Faulting instruction address: 0x00000000
[  861.381428] Oops: Kernel access of bad area, sig: 11 [#1]
....
[  861.382481] NIP: 0000000000000000 LR: c000000000362c60 CTR:
0000000000000000
....
Call Trace:
[c000000f1680b750] [c000000000362c34] kernfs_drain_open_files+0x104/0x1d0 (unreliable)
[c000000f1680b790] [c00000000035fa00] __kernfs_remove+0x260/0x2c0
[c000000f1680b820] [c000000000360da0] kernfs_remove_by_name_ns+0x60/0xe0
[c000000f1680b8b0] [c0000000003638f4] sysfs_remove_bin_file+0x24/0x40
[c000000f1680b8d0] [c00000000062a164] device_remove_bin_file+0x24/0x40
[c000000f1680b8f0] [d000000009b7b22c] cxl_sysfs_afu_remove+0x144/0x170 [cxl]
[c000000f1680b940] [d000000009b7c7e4] cxl_remove+0x6c/0x1a0 [cxl]
[c000000f1680b990] [c00000000052f694] pci_device_remove+0x64/0x110
[c000000f1680b9d0] [c0000000006321d4] device_release_driver_internal+0x1f4/0x2b0
[c000000f1680ba20] [c000000000525cb0] pci_stop_bus_device+0xa0/0xd0
[c000000f1680ba60] [c000000000525e80] pci_stop_and_remove_bus_device+0x20/0x40
[c000000f1680ba90] [c00000000004a6c4] pci_hp_remove_devices+0x84/0xc0
[c000000f1680bad0] [c00000000004a688] pci_hp_remove_devices+0x48/0xc0
[c000000f1680bb10] [c0000000009dfda4] eeh_reset_device+0xb0/0x290
[c000000f1680bbb0] [c000000000032b4c] eeh_handle_normal_event+0x47c/0x530
[c000000f1680bc60] [c000000000032e64] eeh_handle_event+0x174/0x350
[c000000f1680bd10] [c000000000033228] eeh_event_handler+0x1e8/0x1f0
[c000000f1680bdc0] [c0000000000d384c] kthread+0x14c/0x190
[c000000f1680be30] [c00000000000b5a0] ret_from_kernel_thread+0x5c/0xbc

Fixes: f83f3c515654 ("kernfs: fix locking around kernfs_ops->release() callback")
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

(cherry picked from commit 966fa72a716ceafc69de901a31f7cc1f52b35f81)

Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test

Change-Id: I9ca5cbacd1e204a742e5616e6e101339d8719cdf
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2022-11-15 21:35:32 +01:00
Tejun Heo
d730ee7a52 UPSTREAM: kernfs: fix locking around kernfs_ops->release() callback
The release callback may be called from two places - file release
operation and kernfs open file draining.  kernfs_open_file->mutex is
used to synchronize the two callsites.  This unfortunately leads to
possible circular locking because of->mutex is used to protect the
usual kernfs operations which may use locking constructs which are
held while removing and thus draining kernfs files.

@of->mutex is for synchronizing concurrent kernfs access operations
and all we need here is synchronization between the releaes and drain
paths.  As the drain path has to grab kernfs_open_file_mutex anyway,
let's use the mutex to synchronize the release operation instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Fixes: 0e67db2f9fe9 ("kernfs: add kernfs_ops->open/release() callbacks")
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

(cherry picked from commit f83f3c515654474e19c7fc86e3b06564bb5cb4d4)

Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test

Change-Id: I75253c2aa8924987e9342d94e8bae445d6c8f5be
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2022-11-15 21:35:32 +01:00
Serge E. Hallyn
2255e907e8 kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call
Our caller expects 0 on success, not >0.

This fixes a bug in the patch

	cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces

where /sys does not show up in mountinfo, breaking criu.

Thanks for catching this, Andrei.

Reported-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Change-Id: I3cf5886bf7a77a943a6540c4b224dd0ca805dca6
2022-11-15 21:35:32 +01:00
Sami Tolvanen
506c56dd29 BACKPORT: ANDROID: fs: logfs: fix filler function type
Bug: 67506682
Change-Id: If2659b91e250cbd9f1a4a028ff43caf71b8306dd
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
(cherry picked from commit 2be0847f6f9f4effe639f2caeb88bb6f16838332)
Signed-off-by: Dan Aloni <daloni@magicleap.com>
Signed-off-by: Davide Garberi <dade.garberi@gmail.com>
2022-11-15 21:35:31 +01:00
Sami Tolvanen
7f6144bfa3 BACKPORT: ANDROID: fs: gfs2: fix filler function type
Bug: 67506682
Change-Id: I50a3f85965de6e041d0f40e7bf9c2ced15ccfd49
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
(cherry picked from commit 920c7fd62c25da3acd3e16f3808d324a08e4a453)
Signed-off-by: Dan Aloni <daloni@magicleap.com>
Signed-off-by: Davide Garberi <dade.garberi@gmail.com>
2022-11-15 21:35:31 +01:00
Sami Tolvanen
a2fa993d6f BACKPORT: ANDROID: fs: exofs: fix filler function type
Bug: 67506682
Change-Id: I42f297bfe07a1b7916790415f35ad4f2574ceec7
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
(cherry picked from commit 39cdeb137340baed325e695c2ded3ec8d0abda2b)
Signed-off-by: Dan Aloni <daloni@magicleap.com>
Signed-off-by: Davide Garberi <dade.garberi@gmail.com>
2022-11-15 21:35:30 +01:00
Sami Tolvanen
17132f8d28 BACKPORT: ANDROID: fs: afs: fix filler function type
Bug: 67506682
Change-Id: I76d208c8606ee5af144891d14bd309912d4d788d
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
(cherry picked from commit 53f4adf6788d71d322f810efb271a5658f44d193)
Signed-off-by: Dan Aloni <daloni@magicleap.com>
Signed-off-by: Davide Garberi <dade.garberi@gmail.com>
2022-11-15 21:35:30 +01:00
Sami Tolvanen
0b86f818e9 BACKPORT: fs: nfs: fix filler function type
Bug: 67506682
Change-Id: I04d4b1b9ab0720a4f342d6617dd132de8654b94c
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
(cherry picked from commit b73b94a7df67bc8f113f5e9619f9dfa4061d4f8a)
Signed-off-by: Dan Aloni <daloni@magicleap.com>
Signed-off-by: Davide Garberi <dade.garberi@gmail.com>
2022-11-15 21:35:30 +01:00
Sami Tolvanen
076965b279 BACKPORT: mm: fix filler function type mismatch
Bug: 67506682
Change-Id: I6f615164ccd86b407540ada9bbcb39d910395db9
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
(cherry picked from commit 4fd840d1743308b2ef470534523009dd99b3ce2b)
Signed-off-by: Dan Aloni <daloni@magicleap.com>
Signed-off-by: Davide Garberi <dade.garberi@gmail.com>
2022-11-15 21:35:30 +01:00
Miklos Szeredi
adcfa14c25 BACKPORT: vfs: pass type instead of fn to do_{loop,iter}_readv_writev()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Bug: 67506682
Change-Id: I919a90715ed71d6caf02b1333dbfec5e7e3ad52b
(cherry picked from commit 0f78d06ac1e9b470cbd8f913ee1688c8b2c8feb3)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
(cherry picked from commit 04676269a0b4e1fd20a7ceaaef878fa3131517ea)
Signed-off-by: Dan Aloni <daloni@magicleap.com>
Signed-off-by: Davide Garberi <dade.garberi@gmail.com>
2022-11-15 21:35:30 +01:00
Kees Cook
55b75ea79b BACKPORT: treewide: Fix function prototypes for module_param_call()
Several function prototypes for the set/get functions defined by
module_param_call() have a slightly wrong argument types. This fixes
those in an effort to clean up the calls when running under type-enforced
compiler instrumentation for CFI. This is the result of running the
following semantic patch:

@match_module_param_call_function@
declarer name module_param_call;
identifier _name, _set_func, _get_func;
expression _arg, _mode;
@@

 module_param_call(_name, _set_func, _get_func, _arg, _mode);

@fix_set_prototype
 depends on match_module_param_call_function@
identifier match_module_param_call_function._set_func;
identifier _val, _param;
type _val_type, _param_type;
@@

 int _set_func(
-_val_type _val
+const char * _val
 ,
-_param_type _param
+const struct kernel_param * _param
 ) { ... }

@fix_get_prototype
 depends on match_module_param_call_function@
identifier match_module_param_call_function._get_func;
identifier _val, _param;
type _val_type, _param_type;
@@

 int _get_func(
-_val_type _val
+char * _val
 ,
-_param_type _param
+const struct kernel_param * _param
 ) { ... }

Two additional by-hand changes are included for places where the above
Coccinelle script didn't notice them:

	drivers/platform/x86/thinkpad_acpi.c
	fs/lockd/svc.c

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>

Bug: 67506682
Change-Id: I2c9c0ee8ed28065e63270a52c155e5e7d2791295
(cherry picked from commit e4dca7b7aa08b22893c45485d222b5807c1375ae)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
(cherry picked from commit 24da2c84bd7dcdf2b56fa8d3b2f833656ee60a01)
Signed-off-by: Dan Aloni <daloni@magicleap.com>
Signed-off-by: Davide Garberi <dade.garberi@gmail.com>
2022-11-15 21:35:30 +01:00
Michal Marek
9f11a2683c BACKPORT: kbuild: Allow to specify composite modules with modname-m
This allows to write

  drm-$(CONFIG_AGP) += drm_agpsupport.o

without having to handle CONFIG_AGP=y vs. CONFIG_AGP=m. Only support
this syntax for modules, since built-in code depending on something
modular cannot work and init/Makefile actually relies on the current
semantics. There are a few drivers which adapted to the current
semantics out of necessity; these are fixed to also work when the
respective subsystem is modular.

Acked-by: Peter Chen <peter.chen@freescale.com> [chipidea]
Signed-off-by: Michal Marek <mmarek@suse.com>
Change-Id: Ibd0f7006c9d3b87f2b77e59bdc51c06cb361e9a0
(cherry picked from commit cf4f21938e13ea1533ebdcb21c46f1d998a44ee8)
Signed-off-by: Dan Aloni <daloni@magicleap.com>
Signed-off-by: Davide Garberi <dade.garberi@gmail.com>
2022-11-15 21:35:30 +01:00
David Howells
0271a9e9bd UPSTREAM: Make anon_inodes unconditional
Make the anon_inodes facility unconditional so that it can be used by core
VFS code.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

(cherry picked from commit dadd2299ab61fc2b55b95b7b3a8f674cdd3b69c9)

Bug: 135608568
Test: test program using syscall(__NR_sys_pidfd_open,..) and poll()
Change-Id: I2f97bda4f360d8d05bbb603de839717b3d8067ae
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2022-11-15 21:35:29 +01:00
Christian Brauner
a6dfa182c4 BACKPORT: signal: add pidfd_send_signal() syscall
The kill() syscall operates on process identifiers (pid). After a process
has exited its pid can be reused by another process. If a caller sends a
signal to a reused pid it will end up signaling the wrong process. This
issue has often surfaced and there has been a push to address this problem [1].

This patch uses file descriptors (fd) from proc/<pid> as stable handles on
struct pid. Even if a pid is recycled the handle will not change. The fd
can be used to send signals to the process it refers to.
Thus, the new syscall pidfd_send_signal() is introduced to solve this
problem. Instead of pids it operates on process fds (pidfd).

/* prototype and argument /*
long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags);

/* syscall number 424 */
The syscall number was chosen to be 424 to align with Arnd's rework in his
y2038 to minimize merge conflicts (cf. [25]).

In addition to the pidfd and signal argument it takes an additional
siginfo_t and flags argument. If the siginfo_t argument is NULL then
pidfd_send_signal() is equivalent to kill(<positive-pid>, <signal>). If it
is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo().
The flags argument is added to allow for future extensions of this syscall.
It currently needs to be passed as 0. Failing to do so will cause EINVAL.

/* pidfd_send_signal() replaces multiple pid-based syscalls */
The pidfd_send_signal() syscall currently takes on the job of
rt_sigqueueinfo(2) and parts of the functionality of kill(2), Namely, when a
positive pid is passed to kill(2). It will however be possible to also
replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended.

/* sending signals to threads (tid) and process groups (pgid) */
Specifically, the pidfd_send_signal() syscall does currently not operate on
process groups or threads. This is left for future extensions.
In order to extend the syscall to allow sending signal to threads and
process groups appropriately named flags (e.g. PIDFD_TYPE_PGID, and
PIDFD_TYPE_TID) should be added. This implies that the flags argument will
determine what is signaled and not the file descriptor itself. Put in other
words, grouping in this api is a property of the flags argument not a
property of the file descriptor (cf. [13]). Clarification for this has been
requested by Eric (cf. [19]).
When appropriate extensions through the flags argument are added then
pidfd_send_signal() can additionally replace the part of kill(2) which
operates on process groups as well as the tgkill(2) and
rt_tgsigqueueinfo(2) syscalls.
How such an extension could be implemented has been very roughly sketched
in [14], [15], and [16]. However, this should not be taken as a commitment
to a particular implementation. There might be better ways to do it.
Right now this is intentionally left out to keep this patchset as simple as
possible (cf. [4]).

/* naming */
The syscall had various names throughout iterations of this patchset:
- procfd_signal()
- procfd_send_signal()
- taskfd_send_signal()
In the last round of reviews it was pointed out that given that if the
flags argument decides the scope of the signal instead of different types
of fds it might make sense to either settle for "procfd_" or "pidfd_" as
prefix. The community was willing to accept either (cf. [17] and [18]).
Given that one developer expressed strong preference for the "pidfd_"
prefix (cf. [13]) and with other developers less opinionated about the name
we should settle for "pidfd_" to avoid further bikeshedding.

The  "_send_signal" suffix was chosen to reflect the fact that the syscall
takes on the job of multiple syscalls. It is therefore intentional that the
name is not reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the
fomer because it might imply that pidfd_send_signal() is a replacement for
kill(2), and not the latter because it is a hassle to remember the correct
spelling - especially for non-native speakers - and because it is not
descriptive enough of what the syscall actually does. The name
"pidfd_send_signal" makes it very clear that its job is to send signals.

/* zombies */
Zombies can be signaled just as any other process. No special error will be
reported since a zombie state is an unreliable state (cf. [3]). However,
this can be added as an extension through the @flags argument if the need
ever arises.

/* cross-namespace signals */
The patch currently enforces that the signaler and signalee either are in
the same pid namespace or that the signaler's pid namespace is an ancestor
of the signalee's pid namespace. This is done for the sake of simplicity
and because it is unclear to what values certain members of struct
siginfo_t would need to be set to (cf. [5], [6]).

/* compat syscalls */
It became clear that we would like to avoid adding compat syscalls
(cf. [7]).  The compat syscall handling is now done in kernel/signal.c
itself by adding __copy_siginfo_from_user_generic() which lets us avoid
compat syscalls (cf. [8]). It should be noted that the addition of
__copy_siginfo_from_user_any() is caused by a bug in the original
implementation of rt_sigqueueinfo(2) (cf. 12).
With upcoming rework for syscall handling things might improve
significantly (cf. [11]) and __copy_siginfo_from_user_any() will not gain
any additional callers.

/* testing */
This patch was tested on x64 and x86.

/* userspace usage */
An asciinema recording for the basic functionality can be found under [9].
With this patch a process can be killed via:

 #define _GNU_SOURCE
 #include <errno.h>
 #include <fcntl.h>
 #include <signal.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <sys/stat.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
 #include <unistd.h>

 static inline int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                         unsigned int flags)
 {
 #ifdef __NR_pidfd_send_signal
         return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
 #else
         return -ENOSYS;
 #endif
 }

 int main(int argc, char *argv[])
 {
         int fd, ret, saved_errno, sig;

         if (argc < 3)
                 exit(EXIT_FAILURE);

         fd = open(argv[1], O_DIRECTORY | O_CLOEXEC);
         if (fd < 0) {
                 printf("%s - Failed to open \"%s\"\n", strerror(errno), argv[1]);
                 exit(EXIT_FAILURE);
         }

         sig = atoi(argv[2]);

         printf("Sending signal %d to process %s\n", sig, argv[1]);
         ret = do_pidfd_send_signal(fd, sig, NULL, 0);

         saved_errno = errno;
         close(fd);
         errno = saved_errno;

         if (ret < 0) {
                 printf("%s - Failed to send signal %d to process %s\n",
                        strerror(errno), sig, argv[1]);
                 exit(EXIT_FAILURE);
         }

         exit(EXIT_SUCCESS);
 }

/* Q&A
 * Given that it seems the same questions get asked again by people who are
 * late to the party it makes sense to add a Q&A section to the commit
 * message so it's hopefully easier to avoid duplicate threads.
 *
 * For the sake of progress please consider these arguments settled unless
 * there is a new point that desperately needs to be addressed. Please make
 * sure to check the links to the threads in this commit message whether
 * this has not already been covered.
 */
Q-01: (Florian Weimer [20], Andrew Morton [21])
      What happens when the target process has exited?
A-01: Sending the signal will fail with ESRCH (cf. [22]).

Q-02:  (Andrew Morton [21])
       Is the task_struct pinned by the fd?
A-02:  No. A reference to struct pid is kept. struct pid - as far as I
       understand - was created exactly for the reason to not require to
       pin struct task_struct (cf. [22]).

Q-03: (Andrew Morton [21])
      Does the entire procfs directory remain visible? Just one entry
      within it?
A-03: The same thing that happens right now when you hold a file descriptor
      to /proc/<pid> open (cf. [22]).

Q-04: (Andrew Morton [21])
      Does the pid remain reserved?
A-04: No. This patchset guarantees a stable handle not that pids are not
      recycled (cf. [22]).

Q-05: (Andrew Morton [21])
      Do attempts to signal that fd return errors?
A-05: See {Q,A}-01.

Q-06: (Andrew Morton [22])
      Is there a cleaner way of obtaining the fd? Another syscall perhaps.
A-06: Userspace can already trivially retrieve file descriptors from procfs
      so this is something that we will need to support anyway. Hence,
      there's no immediate need to add another syscalls just to make
      pidfd_send_signal() not dependent on the presence of procfs. However,
      adding a syscalls to get such file descriptors is planned for a
      future patchset (cf. [22]).

Q-07: (Andrew Morton [21] and others)
      This fd-for-a-process sounds like a handy thing and people may well
      think up other uses for it in the future, probably unrelated to
      signals. Are the code and the interface designed to permit such
      future applications?
A-07: Yes (cf. [22]).

Q-08: (Andrew Morton [21] and others)
      Now I think about it, why a new syscall? This thing is looking
      rather like an ioctl?
A-08: This has been extensively discussed. It was agreed that a syscall is
      preferred for a variety or reasons. Here are just a few taken from
      prior threads. Syscalls are safer than ioctl()s especially when
      signaling to fds. Processes are a core kernel concept so a syscall
      seems more appropriate. The layout of the syscall with its four
      arguments would require the addition of a custom struct for the
      ioctl() thereby causing at least the same amount or even more
      complexity for userspace than a simple syscall. The new syscall will
      replace multiple other pid-based syscalls (see description above).
      The file-descriptors-for-processes concept introduced with this
      syscall will be extended with other syscalls in the future. See also
      [22], [23] and various other threads already linked in here.

Q-09: (Florian Weimer [24])
      What happens if you use the new interface with an O_PATH descriptor?
A-09:
      pidfds opened as O_PATH fds cannot be used to send signals to a
      process (cf. [2]). Signaling processes through pidfds is the
      equivalent of writing to a file. Thus, this is not an operation that
      operates "purely at the file descriptor level" as required by the
      open(2) manpage. See also [4].

/* References */
[1]:  https://lore.kernel.org/lkml/20181029221037.87724-1-dancol@google.com/
[2]:  https://lore.kernel.org/lkml/874lbtjvtd.fsf@oldenburg2.str.redhat.com/
[3]:  https://lore.kernel.org/lkml/20181204132604.aspfupwjgjx6fhva@brauner.io/
[4]:  https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/
[5]:  https://lore.kernel.org/lkml/20181121213946.GA10795@mail.hallyn.com/
[6]:  https://lore.kernel.org/lkml/20181120103111.etlqp7zop34v6nv4@brauner.io/
[7]:  https://lore.kernel.org/lkml/36323361-90BD-41AF-AB5B-EE0D7BA02C21@amacapital.net/
[8]:  https://lore.kernel.org/lkml/87tvjxp8pc.fsf@xmission.com/
[9]:  https://asciinema.org/a/IQjuCHew6bnq1cr78yuMv16cy
[11]: https://lore.kernel.org/lkml/F53D6D38-3521-4C20-9034-5AF447DF62FF@amacapital.net/
[12]: https://lore.kernel.org/lkml/87zhtjn8ck.fsf@xmission.com/
[13]: https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/
[14]: https://lore.kernel.org/lkml/20181206231742.xxi4ghn24z4h2qki@brauner.io/
[15]: https://lore.kernel.org/lkml/20181207003124.GA11160@mail.hallyn.com/
[16]: https://lore.kernel.org/lkml/20181207015423.4miorx43l3qhppfz@brauner.io/
[17]: https://lore.kernel.org/lkml/CAGXu5jL8PciZAXvOvCeCU3wKUEB_dU-O3q0tDw4uB_ojMvDEew@mail.gmail.com/
[18]: https://lore.kernel.org/lkml/20181206222746.GB9224@mail.hallyn.com/
[19]: https://lore.kernel.org/lkml/20181208054059.19813-1-christian@brauner.io/
[20]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
[21]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/
[22]: https://lore.kernel.org/lkml/20181228233725.722tdfgijxcssg76@brauner.io/
[23]: https://lwn.net/Articles/773459/
[24]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
[25]: https://lore.kernel.org/lkml/CAK8P3a0ej9NcJM8wXNPbcGUyOUZYX+VLoDFdbenW3s3114oQZw@mail.gmail.com/

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Aleksa Sarai <cyphar@cyphar.com>

(cherry picked from commit 3eb39f47934f9d5a3027fe00d906a45fe3a15fad)

Conflicts:
        arch/x86/entry/syscalls/syscall_32.tbl - trivial manual merge
        arch/x86/entry/syscalls/syscall_64.tbl - trivial manual merge
        include/linux/proc_fs.h - trivial manual merge
        include/linux/syscalls.h - trivial manual merge
        include/uapi/asm-generic/unistd.h - trivial manual merge
        kernel/signal.c - struct kernel_siginfo does not exist in 4.14
        kernel/sys_ni.c - cond_syscall is used instead of COND_SYSCALL
        arch/x86/entry/syscalls/syscall_32.tbl
        arch/x86/entry/syscalls/syscall_64.tbl

(1. manual merges because of 4.14 differences
 2. change prepare_kill_siginfo() to use struct siginfo instead of
kernel_siginfo
 3. use copy_from_user() instead of copy_siginfo_from_user() in copy_siginfo_from_user_any()
 4. replaced COND_SYSCALL with cond_syscall
 5. Removed __ia32_sys_pidfd_send_signal in arch/x86/entry/syscalls/syscall_32.tbl.
 6. Replaced __x64_sys_pidfd_send_signal with sys_pidfd_send_signal in arch/x86/entry/syscalls/syscall_64.tbl.)

Bug: 135608568
Test: test program using syscall(__NR_pidfd_send_signal,..) to send SIGKILL
Change-Id: I34da11c63ac8cafb0353d9af24c820cef519ec27
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: electimon <electimon@gmail.com>
2022-11-15 21:35:29 +01:00
Jan Kara
ea1f75de62 ext4: enable quota enforcement based on mount options
When quota information is stored in quota files, we enable only quota
accounting on mount and enforcement is enabled only in response to
Q_QUOTAON quotactl. To make ext4 behavior consistent with XFS, we add a
possibility to enable quota enforcement on mount by specifying
corresponding quota mount option (usrquota, grpquota, prjquota).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Change-Id: Ibf840723e020e4826eb09c7abae47df58f98e3a5
2022-10-23 23:26:01 -04:00
Li Xi
fbea7a0312 ext4: adds project ID support
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Change-Id: Iadddd9c7dbb2b2889270f1a5b8da3a408d22dd62
2022-10-23 23:25:59 -04:00
Li Xi
24d40233a7 ext4: add project quota support
This patch adds mount options for enabling/disabling project quota
accounting and enforcement. A new specific inode is also used for
project quota accounting.

[ Includes fix from Dan Carpenter to crrect error checking from dqget(). ]

Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Change-Id: Ic62742dc96a357f9dcd974b524d448697ba3e64a
2022-10-23 23:25:57 -04:00
Andreas Gruenbacher
4b7c149859 gfs2: Invalid security labels of inodes when they go invalid
When gfs2 releases the glock of an inode, it must invalidate all
information cached for that inode, including the page cache and acls.
Use the new security_inode_invalidate_secctx hook to also invalidate
security labels in that case.  These items will be reread from disk
when needed after reacquiring the glock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
[PM: fixed spelling errors and description line lengths]
Signed-off-by: Paul Moore <pmoore@redhat.com>
2022-03-04 20:19:53 +01:00
Aditya Kali
a680922ccf kernfs: define kernfs_node_dentry
Add a new kernfs api is added to lookup the dentry for a particular
kernfs path.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
2022-03-04 20:16:57 +01:00
Eric W. Biederman
88fe1ffae2 fs: Add user namespace member to struct super_block
Start marking filesystems with a user namespace owner, s_user_ns.  In
this change this is only used for permission checks of who may mount a
filesystem.  Ultimately s_user_ns will be used for translating ids and
checking capabilities for filesystems mounted from user namespaces.

The default policy for setting s_user_ns is implemented in sget(),
which arranges for s_user_ns to be set to current_user_ns() and to
ensure that the mounter of the filesystem has CAP_SYS_ADMIN in that
user_ns.

The guts of sget are split out into another function sget_userns().
The function sget_userns calls alloc_super with the specified user
namespace or it verifies the existing superblock that was found
has the expected user namespace, and fails with EBUSY when it is not.
This failing prevents users with the wrong privileges mounting a
filesystem.

The reason for the split of sget_userns from sget is that in some
cases such as mount_ns and kernfs_mount_ns a different policy for
permission checking of mounts and setting s_user_ns is necessary, and
the existence of sget_userns() allows those policies to be
implemented.

The helper mount_ns is expected to be used for filesystems such as
proc and mqueuefs which present per namespace information.  The
function mount_ns is modified to call sget_userns instead of sget to
ensure the user namespace owner of the namespace whose information is
presented by the filesystem is used on the superblock.

For sysfs and cgroup the appropriate permission checks are already in
place, and kernfs_mount_ns is modified to call sget_userns so that
the init_user_ns is the only user namespace used.

For the cgroup filesystem cgroup namespace mounts are bind mounts of a
subset of the full cgroup filesystem and as such s_user_ns must be the
same for all of them as there is only a single superblock.

Mounts of sysfs that vary based on the network namespace could in principle
change s_user_ns but it keeps the analysis and implementation of kernfs
simpler if that is not supported, and at present there appear to be no
benefits from supporting a different s_user_ns on any sysfs mount.

Getting the details of setting s_user_ns correct has been
a long process.  Thanks to Pavel Tikhorirorv who spotted a leak
in sget_userns.  Thanks to Seth Forshee who has kept the work alive.

Thanks-to: Seth Forshee <seth.forshee@canonical.com>
Thanks-to: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
2022-03-04 20:16:56 +01:00
Eric W. Biederman
cd02cea58b vfs: Pass data, ns, and ns->userns to mount_ns
Today what is normally called data (the mount options) is not passed
to fill_super through mount_ns.

Pass the mount options and the namespace separately to mount_ns so
that filesystems such as proc that have mount options, can use
mount_ns.

Pass the user namespace to mount_ns so that the standard permission
check that verifies the mounter has permissions over the namespace can
be performed in mount_ns instead of in each filesystems .mount method.
Thus removing the duplication between mqueuefs and proc in terms of
permission checks.  The extra permission check does not currently
affect the rpc_pipefs filesystem and the nfsd filesystem as those
filesystems do not currently allow unprivileged mounts.  Without
unpvileged mounts it is guaranteed that the caller has already passed
capable(CAP_SYS_ADMIN) which guarantees extra permission check will
pass.

Update rpc_pipefs and the nfsd filesystem to ensure that the network
namespace reference is always taken in fill_super and always put in kill_sb
so that the logic is simpler and so that errors originating inside of
fill_super do not cause a network namespace leak.

Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
2022-03-04 20:16:56 +01:00
Tejun Heo
d29c1b5be3 kernfs: make kernfs_path*() behave in the style of strlcpy()
kernfs_path*() functions always return the length of the full path but
the path content is undefined if the length is larger than the
provided buffer.  This makes its behavior different from strlcpy() and
requires error handling in all its users even when they don't care
about truncation.  In addition, the implementation can actully be
simplified by making it behave properly in strlcpy() style.

* Update kernfs_path_from_node_locked() to always fill up the buffer
  with path.  If the buffer is not large enough, the output is
  truncated and terminated.

* kernfs_path() no longer needs error handling.  Make it a simple
  inline wrapper around kernfs_path_from_node().

* sysfs_warn_dup()'s use of kernfs_path() doesn't need error handling.
  Updated accordingly.

* cgroup_path()'s use of kernfs_path() updated to retain the old
  behavior.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
2022-03-04 20:16:56 +01:00
Serge Hallyn
ae9e4b19c4 kernfs_path_from_node_locked: don't overwrite nlen
We've calculated @len to be the bytes we need for '/..' entries from
@kn_from to the common ancestor, and calculated @nlen to be the extra
bytes we need to get from the common ancestor to @kn_to.  We use them
as such at the end.  But in the loop copying the actual entries, we
overwrite @nlen.  Use a temporary variable for that instead.

Without this, the return length, when the buffer is large enough, is
wrong.  (When the buffer is NULL or too small, the returned value is
correct. The buffer contents are also correct.)

Interestingly, no callers of this function are affected by this as of
yet.  However the upcoming cgroup_show_path() will be.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
2022-03-04 20:16:56 +01:00
Tejun Heo
2beb4520c2 kernfs: implement kernfs_walk_and_get()
Implement kernfs_walk_and_get() which is similar to
kernfs_find_and_get() but can walk a path instead of just a name.

v2: Use strlcpy() instead of strlen() + memcpy() as suggested by
    David.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
2022-03-04 20:16:49 +01:00
Johannes Weiner
2a1d14a66d BACKPORT: fs: kernfs: add poll file operation
Patch series "psi: pressure stall monitors", v3.

Android is adopting psi to detect and remedy memory pressure that results
in stuttering and decreased responsiveness on mobile devices.

Psi gives us the stall information, but because we're dealing with
latencies in the millisecond range, periodically reading the pressure
files to detect stalls in a timely fashion is not feasible.  Psi also
doesn't aggregate its averages at a high enough frequency right now.

This patch series extends the psi interface such that users can configure
sensitive latency thresholds and use poll() and friends to be notified
when these are breached.

As high-frequency aggregation is costly, it implements an aggregation
method that is optimized for fast, short-interval averaging, and makes the
aggregation frequency adaptive, such that high-frequency updates only
happen while monitored stall events are actively occurring.

With these patches applied, Android can monitor for, and ward off,
mounting memory shortages before they cause problems for the user.  For
example, using memory stall monitors in userspace low memory killer daemon
(lmkd) we can detect mounting pressure and kill less important processes
before device becomes visibly sluggish.  In our memory stress testing psi
memory monitors produce roughly 10x less false positives compared to
vmpressure signals.  Having ability to specify multiple triggers for the
same psi metric allows other parts of Android framework to monitor memory
state of the device and act accordingly.

The new interface is straightforward.  The user opens one of the pressure
files for writing and writes a trigger description into the file
descriptor that defines the stall state - some or full, and the maximum
stall time over a given window of time.  E.g.:

        /* Signal when stall time exceeds 100ms of a 1s window */
        char trigger[] = "full 100000 1000000";
        fd = open("/proc/pressure/memory");
        write(fd, trigger, sizeof(trigger));
        while (poll() >= 0) {
                ...
        }
        close(fd);

When the monitored stall state is entered, psi adapts its aggregation
frequency according to what the configured time window requires in order
to emit event signals in a timely fashion.  Once the stalling subsides,
aggregation reverts back to normal.

The trigger is associated with the open file descriptor.  To stop
monitoring, the user only needs to close the file descriptor and the
trigger is discarded.

Patches 1-4 prepare the psi code for polling support.  Patch 5 implements
the adaptive polling logic, the pressure growth detection optimized for
short intervals, and hooks up write() and poll() on the pressure files.

The patches were developed in collaboration with Johannes Weiner.

This patch (of 5):

Kernfs has a standardized poll/notification mechanism for waking all
pollers on all fds when a filesystem node changes.  To allow polling for
custom events, add a .poll callback that can override the default.

This is in preparation for pollable cgroup pressure files which have
per-fd trigger configurations.

Link: http://lkml.kernel.org/r/20190124211518.244221-2-surenb@google.com
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

(cherry picked from commit: 147e1a97c4a0bdd43f55a582a9416bb9092563a9)

Conflicts:
        fs/kernfs/file.c
        include/linux/kernfs.h

1. replaced __poll_t with unsigned int.
2. replaced kernfs_dentry_node() with dentry->d_fsdata
3. replaced EPOLLERR/EPOLLPRI with POLLERR/POLLPRI (values are the same)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: Ic2bed334d05aec62f4e695f263893c3057921c55
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
2022-03-04 20:16:49 +01:00
Tejun Heo
509c0be293 UPSTREAM: kernfs: add kernfs_ops->open/release() callbacks
Add ->open/release() methods to kernfs_ops.  ->open() is called when
the file is opened and ->release() when the file is either released or
severed.  These callbacks can be used, for example, to manage
persistent caching objects over multiple seq_file iterations.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Acked-by: Zefan Li <lizefan@huawei.com>

(cherry picked from commit 0e67db2f9fe91937e798e3d7d22c50a8438187e1)

Bug: 111308141
Test: modified lmkd to use PSI and tested using lmkd_unit_test

Change-Id: Id06e9d5c6da1280bcdd4dc86309dcfaf52b8f9a4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
2022-03-04 20:16:49 +01:00
Aditya Kali
8ecfcb612c kernfs: Add API to generate relative kernfs path
The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
2022-03-04 20:16:49 +01:00
Andrey Vagin
293b50a101 kernel: add a helper to get an owning user namespace for a namespace
Return -EPERM if an owning user namespace is outside of a process
current user namespace.

v2: In a first version ns_get_owner returned ENOENT for init_user_ns.
    This special cases was removed from this version. There is nothing
    outside of init_user_ns, so we can return EPERM.
v3: rename ns->get_owner() to ns->owner(). get_* usually means that it
grabs a reference.

Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
2022-03-04 20:16:49 +01:00
Serge E. Hallyn
215cbf575a cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces
Patch summary:

When showing a cgroupfs entry in mountinfo, show the path of the mount
root dentry relative to the reader's cgroup namespace root.

Short explanation (courtesy of mkerrisk):

If we create a new cgroup namespace, then we want both /proc/self/cgroup
and /proc/self/mountinfo to show cgroup paths that are correctly
virtualized with respect to the cgroup mount point.  Previous to this
patch, /proc/self/cgroup shows the right info, but /proc/self/mountinfo
does not.

Long version:

When a uid 0 task which is in freezer cgroup /a/b, unshares a new cgroup
namespace, and then mounts a new instance of the freezer cgroup, the new
mount will be rooted at /a/b.  The root dentry field of the mountinfo
entry will show '/a/b'.

 cat > /tmp/do1 << EOF
 mount -t cgroup -o freezer freezer /mnt
 grep freezer /proc/self/mountinfo
 EOF

 unshare -Gm  bash /tmp/do1
 > 330 160 0:34 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer
 > 355 133 0:34 /a/b /mnt rw,relatime - cgroup freezer rw,freezer

The task's freezer cgroup entry in /proc/self/cgroup will simply show
'/':

 grep freezer /proc/self/cgroup
 9:freezer:/

If instead the same task simply bind mounts the /a/b cgroup directory,
the resulting mountinfo entry will again show /a/b for the dentry root.
However in this case the task will find its own cgroup at /mnt/a/b,
not at /mnt:

 mount --bind /sys/fs/cgroup/freezer/a/b /mnt
 130 25 0:34 /a/b /mnt rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,freezer

In other words, there is no way for the task to know, based on what is
in mountinfo, which cgroup directory is its own.

Example (by mkerrisk):

First, a little script to save some typing and verbiage:

echo -e "\t/proc/self/cgroup:\t$(cat /proc/self/cgroup | grep freezer)"
cat /proc/self/mountinfo | grep freezer |
        awk '{print "\tmountinfo:\t\t" $4 "\t" $5}'

Create cgroup, place this shell into the cgroup, and look at the state
of the /proc files:

2653
2653                         # Our shell
14254                        # cat(1)
        /proc/self/cgroup:      10:freezer:/a/b
        mountinfo:              /       /sys/fs/cgroup/freezer

Create a shell in new cgroup and mount namespaces. The act of creating
a new cgroup namespace causes the process's current cgroups directories
to become its cgroup root directories. (Here, I'm using my own version
of the "unshare" utility, which takes the same options as the util-linux
version):

Look at the state of the /proc files:

        /proc/self/cgroup:      10:freezer:/
        mountinfo:              /       /sys/fs/cgroup/freezer

The third entry in /proc/self/cgroup (the pathname of the cgroup inside
the hierarchy) is correctly virtualized w.r.t. the cgroup namespace, which
is rooted at /a/b in the outer namespace.

However, the info in /proc/self/mountinfo is not for this cgroup
namespace, since we are seeing a duplicate of the mount from the
old mount namespace, and the info there does not correspond to the
new cgroup namespace. However, trying to create a new mount still
doesn't show us the right information in mountinfo:

                                      # propagating to other mountns
        /proc/self/cgroup:      7:freezer:/
        mountinfo:              /a/b    /mnt/freezer

The act of creating a new cgroup namespace caused the process's
current freezer directory, "/a/b", to become its cgroup freezer root
directory. In other words, the pathname directory of the directory
within the newly mounted cgroup filesystem should be "/",
but mountinfo wrongly shows us "/a/b". The consequence of this is
that the process in the cgroup namespace cannot correctly construct
the pathname of its cgroup root directory from the information in
/proc/PID/mountinfo.

With this patch, the dentry root field in mountinfo is shown relative
to the reader's cgroup namespace.  So the same steps as above:

        /proc/self/cgroup:      10:freezer:/a/b
        mountinfo:              /       /sys/fs/cgroup/freezer
        /proc/self/cgroup:      10:freezer:/
        mountinfo:              /../..  /sys/fs/cgroup/freezer
        /proc/self/cgroup:      10:freezer:/
        mountinfo:              /       /mnt/freezer

cgroup.clone_children  freezer.parent_freezing  freezer.state      tasks
cgroup.procs           freezer.self_freezing    notify_on_release
3164
2653                   # First shell that placed in this cgroup
3164                   # Shell started by 'unshare'
14197                  # cat(1)

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
2022-03-04 20:16:47 +01:00
Aditya Kali
66bbf8312d cgroup: introduce cgroup namespaces
Introduce the ability to create new cgroup namespace. The newly created
cgroup namespace remembers the cgroup of the process at the point
of creation of the cgroup namespace (referred as cgroupns-root).
The main purpose of cgroup namespace is to virtualize the contents
of /proc/self/cgroup file. Processes inside a cgroup namespace
are only able to see paths relative to their namespace root
(unless they are moved outside of their cgroupns-root, at which point
 they will see a relative path from their cgroupns-root).
For a correctly setup container this enables container-tools
(like libcontainer, lxc, lmctfy, etc.) to create completely virtualized
containers without leaking system level cgroup hierarchy to the task.
This patch only implements the 'unshare' part of the cgroupns.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chatur27 <jasonbright2709@gmail.com>
Change-Id: Ifd2df9f562baa90b0fe7c986f86967602657c640
2022-03-04 20:16:44 +01:00
Michael Bestas
438071e031 Merge remote-tracking branch 'common/android-4.4-p' into android-msm-wahoo-4.4
* common/android-4.4-p:
  Linux 4.4.302
  Input: i8042 - Fix misplaced backport of "add ASUS Zenbook Flip to noselftest list"
  KVM: x86: Fix misplaced backport of "work around leak of uninitialized stack contents"
  Revert "tc358743: fix register i2c_rd/wr function fix"
  Revert "drm/radeon/ci: disable mclk switching for high refresh rates (v2)"
  Bluetooth: MGMT: Fix misplaced BT_HS check
  ipv4: tcp: send zero IPID in SYNACK messages
  ipv4: raw: lock the socket in raw_bind()
  hwmon: (lm90) Reduce maximum conversion rate for G781
  drm/msm: Fix wrong size calculation
  net-procfs: show net devices bound packet types
  ipv4: avoid using shared IP generator for connected sockets
  net: fix information leakage in /proc/net/ptype
  ipv6_tunnel: Rate limit warning messages
  scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
  USB: core: Fix hang in usb_kill_urb by adding memory barriers
  usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
  tty: Add support for Brainboxes UC cards.
  tty: n_gsm: fix SW flow control encoding/handling
  serial: stm32: fix software flow control transfer
  PM: wakeup: simplify the output logic of pm_show_wakelocks()
  udf: Fix NULL ptr deref when converting from inline format
  udf: Restore i_lenAlloc when inode expansion fails
  scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
  s390/hypfs: include z/VM guests with access control group set
  Bluetooth: refactor malicious adv data check
  can: bcm: fix UAF of bcm op
  Linux 4.4.301
  drm/i915: Flush TLBs before releasing backing store
  Linux 4.4.300
  lib82596: Fix IRQ check in sni_82596_probe
  bcmgenet: add WOL IRQ check
  net_sched: restore "mpu xxx" handling
  dmaengine: at_xdmac: Fix at_xdmac_lld struct definition
  dmaengine: at_xdmac: Fix lld view setting
  dmaengine: at_xdmac: Print debug message after realeasing the lock
  dmaengine: at_xdmac: Don't start transactions at tx_submit level
  netns: add schedule point in ops_exit_list()
  net: axienet: fix number of TX ring slots for available check
  net: axienet: Wait for PhyRstCmplt after core reset
  af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress
  parisc: pdc_stable: Fix memory leak in pdcs_register_pathentries
  net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module
  powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
  ext4: don't use the orphan list when migrating an inode
  ext4: Fix BUG_ON in ext4_bread when write quota data
  ext4: set csum seed in tmp inode while migrating to extents
  ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers
  power: bq25890: Enable continuous conversion for ADC at charging
  scsi: sr: Don't use GFP_DMA
  MIPS: Octeon: Fix build errors using clang
  i2c: designware-pci: Fix to change data types of hcnt and lcnt parameters
  ALSA: seq: Set upper limit of processed events
  w1: Misuse of get_user()/put_user() reported by sparse
  i2c: mpc: Correct I2C reset procedure
  powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING
  i2c: i801: Don't silently correct invalid transfer size
  powerpc/btext: add missing of_node_put
  powerpc/cell: add missing of_node_put
  powerpc/powernv: add missing of_node_put
  powerpc/6xx: add missing of_node_put
  parisc: Avoid calling faulthandler_disabled() twice
  serial: core: Keep mctrl register state and cached copy in sync
  serial: pl010: Drop CR register reset on set_termios
  dm space map common: add bounds check to sm_ll_lookup_bitmap()
  dm btree: add a defensive bounds check to insert_at()
  net: mdio: Demote probed message to debug print
  btrfs: remove BUG_ON(!eie) in find_parent_nodes
  btrfs: remove BUG_ON() in find_parent_nodes()
  ACPICA: Executer: Fix the REFCLASS_REFOF case in acpi_ex_opcode_1A_0T_1R()
  ACPICA: Utilities: Avoid deleting the same object twice in a row
  um: registers: Rename function names to avoid conflicts and build problems
  ath9k: Fix out-of-bound memcpy in ath9k_hif_usb_rx_stream
  usb: hub: Add delay for SuperSpeed hub resume to let links transit to U0
  media: saa7146: hexium_gemini: Fix a NULL pointer dereference in hexium_attach()
  media: igorplugusb: receiver overflow should be reported
  net: bonding: debug: avoid printing debug logs when bond is not notifying peers
  iwlwifi: mvm: synchronize with FW after multicast commands
  media: m920x: don't use stack on USB reads
  media: saa7146: hexium_orion: Fix a NULL pointer dereference in hexium_attach()
  floppy: Add max size check for user space request
  mwifiex: Fix skb_over_panic in mwifiex_usb_recv()
  HSI: core: Fix return freed object in hsi_new_client
  media: b2c2: Add missing check in flexcop_pci_isr:
  usb: gadget: f_fs: Use stream_open() for endpoint files
  ar5523: Fix null-ptr-deref with unexpected WDCMSG_TARGET_START reply
  fs: dlm: filter user dlm messages for kernel locks
  Bluetooth: Fix debugfs entry leak in hci_register_dev()
  RDMA/cxgb4: Set queue pair state when being queried
  mips: bcm63xx: add support for clk_set_parent()
  mips: lantiq: add support for clk_set_parent()
  misc: lattice-ecp3-config: Fix task hung when firmware load failed
  ASoC: samsung: idma: Check of ioremap return value
  dmaengine: pxa/mmp: stop referencing config->slave_id
  RDMA/core: Let ib_find_gid() continue search even after empty entry
  char/mwave: Adjust io port register size
  ALSA: oss: fix compile error when OSS_DEBUG is enabled
  powerpc/prom_init: Fix improper check of prom_getprop()
  ALSA: hda: Add missing rwsem around snd_ctl_remove() calls
  ALSA: PCM: Add missing rwsem around snd_ctl_remove() calls
  ALSA: jack: Add missing rwsem around snd_ctl_remove() calls
  ext4: avoid trim error on fs with small groups
  net: mcs7830: handle usb read errors properly
  pcmcia: fix setting of kthread task states
  can: xilinx_can: xcan_probe(): check for error irq
  can: softing: softing_startstop(): fix set but not used variable warning
  spi: spi-meson-spifc: Add missing pm_runtime_disable() in meson_spifc_probe
  ppp: ensure minimum packet size in ppp_write()
  pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in nonstatic_find_mem_region()
  pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in __nonstatic_find_io_region()
  usb: ftdi-elan: fix memory leak on device disconnect
  media: msi001: fix possible null-ptr-deref in msi001_probe()
  media: saa7146: mxb: Fix a NULL pointer dereference in mxb_attach()
  media: dib8000: Fix a memleak in dib8000_init()
  floppy: Fix hang in watchdog when disk is ejected
  serial: amba-pl011: do not request memory region twice
  drm/amdgpu: Fix a NULL pointer dereference in amdgpu_connector_lcd_native_mode()
  arm64: dts: qcom: msm8916: fix MMC controller aliases
  netfilter: bridge: add support for pppoe filtering
  tty: serial: atmel: Call dma_async_issue_pending()
  tty: serial: atmel: Check return code of dmaengine_submit()
  crypto: qce - fix uaf on qce_ahash_register_one
  Bluetooth: stop proccessing malicious adv data
  Bluetooth: cmtp: fix possible panic when cmtp_init_sockets() fails
  PCI: Add function 1 DMA alias quirk for Marvell 88SE9125 SATA controller
  can: softing_cs: softingcs_probe(): fix memleak on registration failure
  media: stk1160: fix control-message timeouts
  media: pvrusb2: fix control-message timeouts
  media: dib0700: fix undefined behavior in tuner shutdown
  media: em28xx: fix control-message timeouts
  media: mceusb: fix control-message timeouts
  rtc: cmos: take rtc_lock while reading from CMOS
  nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
  HID: uhid: Fix worker destroying device without any protection
  rtlwifi: rtl8192cu: Fix WARNING when calling local_irq_restore() with interrupts enabled
  media: uvcvideo: fix division by zero at stream start
  drm/i915: Avoid bitwise vs logical OR warning in snb_wm_latency_quirk()
  can: gs_usb: gs_can_start_xmit(): zero-initialize hf->{flags,reserved}
  can: gs_usb: fix use of uninitialized variable, detach device on reception of invalid USB data
  mfd: intel-lpss: Fix too early PM enablement in the ACPI ->probe()
  USB: Fix "slab-out-of-bounds Write" bug in usb_hcd_poll_rh_status
  USB: core: Fix bug in resuming hub's handling of wakeup requests
  Bluetooth: bfusb: fix division by zero in send path
  Linux 4.4.299
  power: reset: ltc2952: Fix use of floating point literals
  mISDN: change function names to avoid conflicts
  net: udp: fix alignment problem in udp4_seq_show()
  ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate
  scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()
  phonet: refcount leak in pep_sock_accep
  rndis_host: support Hytera digital radios
  xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate
  sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc
  i40e: Fix incorrect netdev's real number of RX/TX queues
  mac80211: initialize variable have_higher_than_11mbit
  ieee802154: atusb: fix uninit value in atusb_set_extended_addr
  Bluetooth: btusb: Apply QCA Rome patches for some ATH3012 models
  bpf, test: fix ld_abs + vlan push/pop stress test
  Linux 4.4.298
  net: fix use-after-free in tw_timer_handler
  Input: spaceball - fix parsing of movement data packets
  Input: appletouch - initialize work before device registration
  scsi: vmw_pvscsi: Set residual data length conditionally
  usb: gadget: f_fs: Clear ffs_eventfd in ffs_data_clear.
  xhci: Fresco FL1100 controller should not have BROKEN_MSI quirk set.
  uapi: fix linux/nfc.h userspace compilation errors
  nfc: uapi: use kernel size_t to fix user-space builds
  selinux: initialize proto variable in selinux_ip_postroute_compat()
  recordmcount.pl: fix typo in s390 mcount regex
  platform/x86: apple-gmux: use resource_size() with res
  Linux 4.4.297
  phonet/pep: refuse to enable an unbound pipe
  hamradio: improve the incomplete fix to avoid NPD
  hamradio: defer ax25 kfree after unregister_netdev
  ax25: NPD bug when detaching AX25 device
  xen/blkfront: fix bug in backported patch
  ARM: 9169/1: entry: fix Thumb2 bug in iWMMXt exception handling
  ALSA: drivers: opl3: Fix incorrect use of vp->state
  ALSA: jack: Check the return value of kstrdup()
  hwmon: (lm90) Fix usage of CONFIG2 register in detect function
  drivers: net: smc911x: Check for error irq
  bonding: fix ad_actor_system option setting to default
  qlcnic: potential dereference null pointer of rx_queue->page_ring
  IB/qib: Fix memory leak in qib_user_sdma_queue_pkts()
  HID: holtek: fix mouse probing
  can: kvaser_usb: get CAN clock frequency from device
  net: usb: lan78xx: add Allied Telesis AT29M2-AF

 Conflicts:
	drivers/usb/gadget/function/f_fs.c

Change-Id: I54140777477cbab1b4c6b7d77558e92ca2b30e96
2022-02-09 19:41:39 +02:00
Greg Kroah-Hartman
875c0cc811 Merge 4.4.302 into android-4.4-p
Changes in 4.4.302
	can: bcm: fix UAF of bcm op
	Bluetooth: refactor malicious adv data check
	s390/hypfs: include z/VM guests with access control group set
	scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
	udf: Restore i_lenAlloc when inode expansion fails
	udf: Fix NULL ptr deref when converting from inline format
	PM: wakeup: simplify the output logic of pm_show_wakelocks()
	serial: stm32: fix software flow control transfer
	tty: n_gsm: fix SW flow control encoding/handling
	tty: Add support for Brainboxes UC cards.
	usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
	USB: core: Fix hang in usb_kill_urb by adding memory barriers
	scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
	ipv6_tunnel: Rate limit warning messages
	net: fix information leakage in /proc/net/ptype
	ipv4: avoid using shared IP generator for connected sockets
	net-procfs: show net devices bound packet types
	drm/msm: Fix wrong size calculation
	hwmon: (lm90) Reduce maximum conversion rate for G781
	ipv4: raw: lock the socket in raw_bind()
	ipv4: tcp: send zero IPID in SYNACK messages
	Bluetooth: MGMT: Fix misplaced BT_HS check
	Revert "drm/radeon/ci: disable mclk switching for high refresh rates (v2)"
	Revert "tc358743: fix register i2c_rd/wr function fix"
	KVM: x86: Fix misplaced backport of "work around leak of uninitialized stack contents"
	Input: i8042 - Fix misplaced backport of "add ASUS Zenbook Flip to noselftest list"
	Linux 4.4.302

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5191d3cb4df0fa8de60170d2fedf4a3c51380fdf
2022-02-03 10:00:04 +01:00
Jan Kara
0f28e1a57b udf: Fix NULL ptr deref when converting from inline format
commit 7fc3b7c2981bbd1047916ade327beccb90994eee upstream.

udf_expand_file_adinicb() calls directly ->writepage to write data
expanded into a page. This however misses to setup inode for writeback
properly and so we can crash on inode->i_wb dereference when submitting
page for IO like:

  BUG: kernel NULL pointer dereference, address: 0000000000000158
  #PF: supervisor read access in kernel mode
...
  <TASK>
  __folio_start_writeback+0x2ac/0x350
  __block_write_full_page+0x37d/0x490
  udf_expand_file_adinicb+0x255/0x400 [udf]
  udf_file_write_iter+0xbe/0x1b0 [udf]
  new_sync_write+0x125/0x1c0
  vfs_write+0x28e/0x400

Fix the problem by marking the page dirty and going through the standard
writeback path to write the page. Strictly speaking we would not even
have to write the page but we want to catch e.g. ENOSPC errors early.

Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
CC: stable@vger.kernel.org
Fixes: 52ebea749a ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-03 09:27:52 +01:00
Jan Kara
f25e032aa6 udf: Restore i_lenAlloc when inode expansion fails
commit ea8569194b43f0f01f0a84c689388542c7254a1f upstream.

When we fail to expand inode from inline format to a normal format, we
restore inode to contain the original inline formatting but we forgot to
set i_lenAlloc back. The mismatch between i_lenAlloc and i_size was then
causing further problems such as warnings and lost data down the line.

Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
CC: stable@vger.kernel.org
Fixes: 7e49b6f248 ("udf: Convert UDF to new truncate calling sequence")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-03 09:27:52 +01:00
Greg Kroah-Hartman
22784fcab2 Merge 4.4.300 into android-4.4-p
Changes in 4.4.300
	Bluetooth: bfusb: fix division by zero in send path
	USB: core: Fix bug in resuming hub's handling of wakeup requests
	USB: Fix "slab-out-of-bounds Write" bug in usb_hcd_poll_rh_status
	mfd: intel-lpss: Fix too early PM enablement in the ACPI ->probe()
	can: gs_usb: fix use of uninitialized variable, detach device on reception of invalid USB data
	can: gs_usb: gs_can_start_xmit(): zero-initialize hf->{flags,reserved}
	drm/i915: Avoid bitwise vs logical OR warning in snb_wm_latency_quirk()
	media: uvcvideo: fix division by zero at stream start
	rtlwifi: rtl8192cu: Fix WARNING when calling local_irq_restore() with interrupts enabled
	HID: uhid: Fix worker destroying device without any protection
	nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
	rtc: cmos: take rtc_lock while reading from CMOS
	media: mceusb: fix control-message timeouts
	media: em28xx: fix control-message timeouts
	media: dib0700: fix undefined behavior in tuner shutdown
	media: pvrusb2: fix control-message timeouts
	media: stk1160: fix control-message timeouts
	can: softing_cs: softingcs_probe(): fix memleak on registration failure
	PCI: Add function 1 DMA alias quirk for Marvell 88SE9125 SATA controller
	Bluetooth: cmtp: fix possible panic when cmtp_init_sockets() fails
	Bluetooth: stop proccessing malicious adv data
	crypto: qce - fix uaf on qce_ahash_register_one
	tty: serial: atmel: Check return code of dmaengine_submit()
	tty: serial: atmel: Call dma_async_issue_pending()
	netfilter: bridge: add support for pppoe filtering
	arm64: dts: qcom: msm8916: fix MMC controller aliases
	drm/amdgpu: Fix a NULL pointer dereference in amdgpu_connector_lcd_native_mode()
	serial: amba-pl011: do not request memory region twice
	floppy: Fix hang in watchdog when disk is ejected
	media: dib8000: Fix a memleak in dib8000_init()
	media: saa7146: mxb: Fix a NULL pointer dereference in mxb_attach()
	media: msi001: fix possible null-ptr-deref in msi001_probe()
	usb: ftdi-elan: fix memory leak on device disconnect
	pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in __nonstatic_find_io_region()
	pcmcia: rsrc_nonstatic: Fix a NULL pointer dereference in nonstatic_find_mem_region()
	ppp: ensure minimum packet size in ppp_write()
	spi: spi-meson-spifc: Add missing pm_runtime_disable() in meson_spifc_probe
	can: softing: softing_startstop(): fix set but not used variable warning
	can: xilinx_can: xcan_probe(): check for error irq
	pcmcia: fix setting of kthread task states
	net: mcs7830: handle usb read errors properly
	ext4: avoid trim error on fs with small groups
	ALSA: jack: Add missing rwsem around snd_ctl_remove() calls
	ALSA: PCM: Add missing rwsem around snd_ctl_remove() calls
	ALSA: hda: Add missing rwsem around snd_ctl_remove() calls
	powerpc/prom_init: Fix improper check of prom_getprop()
	ALSA: oss: fix compile error when OSS_DEBUG is enabled
	char/mwave: Adjust io port register size
	RDMA/core: Let ib_find_gid() continue search even after empty entry
	dmaengine: pxa/mmp: stop referencing config->slave_id
	ASoC: samsung: idma: Check of ioremap return value
	misc: lattice-ecp3-config: Fix task hung when firmware load failed
	mips: lantiq: add support for clk_set_parent()
	mips: bcm63xx: add support for clk_set_parent()
	RDMA/cxgb4: Set queue pair state when being queried
	Bluetooth: Fix debugfs entry leak in hci_register_dev()
	fs: dlm: filter user dlm messages for kernel locks
	ar5523: Fix null-ptr-deref with unexpected WDCMSG_TARGET_START reply
	usb: gadget: f_fs: Use stream_open() for endpoint files
	media: b2c2: Add missing check in flexcop_pci_isr:
	HSI: core: Fix return freed object in hsi_new_client
	mwifiex: Fix skb_over_panic in mwifiex_usb_recv()
	floppy: Add max size check for user space request
	media: saa7146: hexium_orion: Fix a NULL pointer dereference in hexium_attach()
	media: m920x: don't use stack on USB reads
	iwlwifi: mvm: synchronize with FW after multicast commands
	net: bonding: debug: avoid printing debug logs when bond is not notifying peers
	media: igorplugusb: receiver overflow should be reported
	media: saa7146: hexium_gemini: Fix a NULL pointer dereference in hexium_attach()
	usb: hub: Add delay for SuperSpeed hub resume to let links transit to U0
	ath9k: Fix out-of-bound memcpy in ath9k_hif_usb_rx_stream
	um: registers: Rename function names to avoid conflicts and build problems
	ACPICA: Utilities: Avoid deleting the same object twice in a row
	ACPICA: Executer: Fix the REFCLASS_REFOF case in acpi_ex_opcode_1A_0T_1R()
	btrfs: remove BUG_ON() in find_parent_nodes()
	btrfs: remove BUG_ON(!eie) in find_parent_nodes
	net: mdio: Demote probed message to debug print
	dm btree: add a defensive bounds check to insert_at()
	dm space map common: add bounds check to sm_ll_lookup_bitmap()
	serial: pl010: Drop CR register reset on set_termios
	serial: core: Keep mctrl register state and cached copy in sync
	parisc: Avoid calling faulthandler_disabled() twice
	powerpc/6xx: add missing of_node_put
	powerpc/powernv: add missing of_node_put
	powerpc/cell: add missing of_node_put
	powerpc/btext: add missing of_node_put
	i2c: i801: Don't silently correct invalid transfer size
	powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING
	i2c: mpc: Correct I2C reset procedure
	w1: Misuse of get_user()/put_user() reported by sparse
	ALSA: seq: Set upper limit of processed events
	i2c: designware-pci: Fix to change data types of hcnt and lcnt parameters
	MIPS: Octeon: Fix build errors using clang
	scsi: sr: Don't use GFP_DMA
	power: bq25890: Enable continuous conversion for ADC at charging
	ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers
	ext4: set csum seed in tmp inode while migrating to extents
	ext4: Fix BUG_ON in ext4_bread when write quota data
	ext4: don't use the orphan list when migrating an inode
	powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
	net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module
	parisc: pdc_stable: Fix memory leak in pdcs_register_pathentries
	af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress
	net: axienet: Wait for PhyRstCmplt after core reset
	net: axienet: fix number of TX ring slots for available check
	netns: add schedule point in ops_exit_list()
	dmaengine: at_xdmac: Don't start transactions at tx_submit level
	dmaengine: at_xdmac: Print debug message after realeasing the lock
	dmaengine: at_xdmac: Fix lld view setting
	dmaengine: at_xdmac: Fix at_xdmac_lld struct definition
	net_sched: restore "mpu xxx" handling
	bcmgenet: add WOL IRQ check
	lib82596: Fix IRQ check in sni_82596_probe
	Linux 4.4.300

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ic6c59dd0f4ed703fff49584b3774d39e4548af4a
2022-01-27 08:54:26 +01:00