IMA does sometimes access the inode's i_uid and compares it against the
rules' fowner. Enable IMA to handle idmapped mounts by passing down the
mount's user namespace. We simply make use of the helpers we introduced
before. If the initial user namespace is passed nothing changes so
non-idmapped mounts will see identical behavior as before.
Link: https://lore.kernel.org/r/20210121131959.646623-27-christian.brauner@ubuntu.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The i_uid and i_gid are mostly used when logging for AppArmor. This is
broken in a bunch of places where the global root id is reported instead
of the i_uid or i_gid of the file. Nonetheless, be kind and log the
mapped inode if we're coming from an idmapped mount. If the initial user
namespace is passed nothing changes so non-idmapped mounts will see
identical behavior as before.
Link: https://lore.kernel.org/r/20210121131959.646623-26-christian.brauner@ubuntu.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user
namespace the mount has been marked with. This can be used for
additional permission checking and also to enable filesystems to
translate between uids and gids if they need to. We have implemented all
relevant helpers in earlier patches.
As requested we simply extend the exisiting inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doesnt't leave us with additional inode methods.
Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
When interacting with user namespace and non-user namespace aware
filesystem capabilities the vfs will perform various security checks to
determine whether or not the filesystem capabilities can be used by the
caller, whether they need to be removed and so on. The main
infrastructure for this resides in the capability codepaths but they are
called through the LSM security infrastructure even though they are not
technically an LSM or optional. This extends the existing security hooks
security_inode_removexattr(), security_inode_killpriv(),
security_inode_getsecurity() to pass down the mount's user namespace and
makes them aware of idmapped mounts.
In order to actually get filesystem capabilities from disk the
capability infrastructure exposes the get_vfs_caps_from_disk() helper.
For user namespace aware filesystem capabilities a root uid is stored
alongside the capabilities.
In order to determine whether the caller can make use of the filesystem
capability or whether it needs to be ignored it is translated according
to the superblock's user namespace. If it can be translated to uid 0
according to that id mapping the caller can use the filesystem
capabilities stored on disk. If we are accessing the inode that holds
the filesystem capabilities through an idmapped mount we map the root
uid according to the mount's user namespace. Afterwards the checks are
identical to non-idmapped mounts: reading filesystem caps from disk
enforces that the root uid associated with the filesystem capability
must have a mapping in the superblock's user namespace and that the
caller is either in the same user namespace or is a descendant of the
superblock's user namespace. For filesystems that are mountable inside
user namespace the caller can just mount the filesystem and won't
usually need to idmap it. If they do want to idmap it they can create an
idmapped mount and mark it with a user namespace they created and which
is thus a descendant of s_user_ns. For filesystems that are not
mountable inside user namespaces the descendant rule is trivially true
because the s_user_ns will be the initial user namespace.
If the initial user namespace is passed nothing changes so non-idmapped
mounts will see identical behavior as before.
Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
When interacting with extended attributes the vfs verifies that the
caller is privileged over the inode with which the extended attribute is
associated. For posix access and posix default extended attributes a uid
or gid can be stored on-disk. Let the functions handle posix extended
attributes on idmapped mounts. If the inode is accessed through an
idmapped mount we need to map it according to the mount's user
namespace. Afterwards the checks are identical to non-idmapped mounts.
This has no effect for e.g. security xattrs since they don't store uids
or gids and don't perform permission checks on them like posix acls do.
Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The posix acl permission checking helpers determine whether a caller is
privileged over an inode according to the acls associated with the
inode. Add helpers that make it possible to handle acls on idmapped
mounts.
The vfs and the filesystems targeted by this first iteration make use of
posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
translate basic posix access and default permissions such as the
ACL_USER and ACL_GROUP type according to the initial user namespace (or
the superblock's user namespace) to and from the caller's current user
namespace. Adapt these two helpers to handle idmapped mounts whereby we
either map from or into the mount's user namespace depending on in which
direction we're translating.
Similarly, cap_convert_nscap() is used by the vfs to translate user
namespace and non-user namespace aware filesystem capabilities from the
superblock's user namespace to the caller's user namespace. Enable it to
handle idmapped mounts by accounting for the mount's user namespace.
In addition the fileystems targeted in the first iteration of this patch
series make use of the posix_acl_chmod() and, posix_acl_update_mode()
helpers. Both helpers perform permission checks on the target inode. Let
them handle idmapped mounts. These two helpers are called when posix
acls are set by the respective filesystems to handle this case we extend
the ->set() method to take an additional user namespace argument to pass
the mount's user namespace down.
Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
The inode_owner_or_capable() helper determines whether the caller is the
owner of the inode or is capable with respect to that inode. Allow it to
handle idmapped mounts. If the inode is accessed through an idmapped
mount it according to the mount's user namespace. Afterwards the checks
are identical to non-idmapped mounts. If the initial user namespace is
passed nothing changes so non-idmapped mounts will see identical
behavior as before.
Similarly, allow the inode_init_owner() helper to handle idmapped
mounts. It initializes a new inode on idmapped mounts by mapping the
fsuid and fsgid of the caller from the mount's user namespace. If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.
Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
In order to determine whether a caller holds privilege over a given
inode the capability framework exposes the two helpers
privileged_wrt_inode_uidgid() and capable_wrt_inode_uidgid(). The former
verifies that the inode has a mapping in the caller's user namespace and
the latter additionally verifies that the caller has the requested
capability in their current user namespace.
If the inode is accessed through an idmapped mount map it into the
mount's user namespace. Afterwards the checks are identical to
non-idmapped inodes. If the initial user namespace is passed all
operations are a nop so non-idmapped mounts will not see a change in
behavior.
Link: https://lore.kernel.org/r/20210121131959.646623-5-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
KEY_FLAG_KEEP is not meant to be passed to keyring_alloc() or key_alloc(),
as these only take KEY_ALLOC_* flags. KEY_FLAG_KEEP has the same value as
KEY_ALLOC_BYPASS_RESTRICTION, but fortunately only key_create_or_update()
uses it. LSMs using the key_alloc hook don't check that flag.
KEY_FLAG_KEEP is then ignored but fortunately (again) the root user cannot
write to the blacklist keyring, so it is not possible to remove a key/hash
from it.
Fix this by adding a KEY_ALLOC_SET_KEEP flag that tells key_alloc() to set
KEY_FLAG_KEEP on the new key. blacklist_init() can then, correctly, pass
this to keyring_alloc().
We can also use this in ima_mok_init() rather than setting the flag
manually.
Note that this doesn't fix an observable bug with the current
implementation but it is required to allow addition of new hashes to the
blacklist in the future without making it possible for them to be removed.
Fixes: 734114f878 ("KEYS: Add a system blacklist keyring")
Reported-by: Mickaël Salaün <mic@linux.microsoft.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Mickaël Salaün <mic@linux.microsoft.com>
cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Reviewing use of memset in keyctl_pkey.c
keyctl_pkey_params_get prologue code to set params up
memset(params, 0, sizeof(*params));
params->encoding = "raw";
keyctl_pkey_query has the same prologue
and calls keyctl_pkey_params_get.
So remove the prologue.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Ben Boeckel <mathstuf@gmail.com>
The merged API doesn't use a watch_queue device, but instead relies on
pipes, so let the documentation reflect that.
Fixes: f7e47677e3 ("watch_queue: Add a key/keyring notification facility")
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Ben Boeckel <mathstuf@gmail.com>
When the semantics of the ->read() handlers were changed such that "buffer"
is a kernel pointer, some __user annotations survived.
Since they're wrong now, get rid of them.
Fixes: d3ec10aa95 ("KEYS: Don't write out to userspace while holding key semaphore")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Ben Boeckel <mathstuf@gmail.com>
We are not guaranteed the locking environment that would prevent
dentry getting renamed right under us. And it's possible for
old long name to be freed after rename, leading to UAF here.
Cc: stable@kernel.org # v2.6.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
SELinux stores the active policy in memory, so the changes to this data
at runtime would have an impact on the security guarantees provided
by SELinux. Measuring in-memory SELinux policy through IMA subsystem
provides a secure way for the attestation service to remotely validate
the policy contents at runtime.
Measure the hash of the loaded policy by calling the IMA hook
ima_measure_critical_data(). Since the size of the loaded policy
can be large (several MB), measure the hash of the policy instead of
the entire policy to avoid bloating the IMA log entry.
To enable SELinux data measurement, the following steps are required:
1, Add "ima_policy=critical_data" to the kernel command line arguments
to enable measuring SELinux data at boot time.
For example,
BOOT_IMAGE=/boot/vmlinuz-5.10.0-rc1+ root=UUID=fd643309-a5d2-4ed3-b10d-3c579a5fab2f ro nomodeset security=selinux ima_policy=critical_data
2, Add the following rule to /etc/ima/ima-policy
measure func=CRITICAL_DATA label=selinux
Sample measurement of the hash of SELinux policy:
To verify the measured data with the current SELinux policy run
the following commands and verify the output hash values match.
sha256sum /sys/fs/selinux/policy | cut -d' ' -f 1
grep "selinux-policy-hash" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | tail -1 | cut -d' ' -f 6
Note that the actual verification of SELinux policy would require loading
the expected policy into an identical kernel on a pristine/known-safe
system and run the sha256sum /sys/kernel/selinux/policy there to get
the expected hash.
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Define a new critical data builtin policy to allow measuring
early kernel integrity critical data before a custom IMA policy
is loaded.
Update the documentation on kernel parameters to document
the new critical data builtin policy.
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
The IMA hook ima_measure_critical_data() does not support a way to
specify the source of the critical data provider. Thus, the data
measurement cannot be constrained based on the data source label
in the IMA policy.
Extend the IMA hook ima_measure_critical_data() to support passing
the data source label as an input parameter, so that the policy rule can
be used to limit the measurements based on the label.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Integrity critical data may belong to a single subsystem or it may
arise from cross subsystem interaction. Currently there is no mechanism
to group or limit the data based on certain label. Limiting and
grouping critical data based on a label would make it flexible and
configurable to measure.
Define "label:=", a new IMA policy condition, for the IMA func
CRITICAL_DATA to allow grouping and limiting measurement of integrity
critical data.
Limit the measurement to the labels that are specified in the IMA
policy - CRITICAL_DATA+"label:=". If "label:=" is not provided with
the func CRITICAL_DATA, measure all the input integrity critical data.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
A new IMA policy rule is needed for the IMA hook
ima_measure_critical_data() and the corresponding func CRITICAL_DATA for
measuring the input buffer. The policy rule should ensure the buffer
would get measured only when the policy rule allows the action. The
policy rule should also support the necessary constraints (flags etc.)
for integrity critical buffer data measurements.
Add policy rule support for measuring integrity critical data.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
IMA provides capabilities to measure file and buffer data. However,
various data structures, policies, and states stored in kernel memory
also impact the integrity of the system. Several kernel subsystems
contain such integrity critical data. These kernel subsystems help
protect the integrity of the system. Currently, IMA does not provide a
generic function for measuring kernel integrity critical data.
Define ima_measure_critical_data, a new IMA hook, to measure kernel
integrity critical data.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
The original IMA buffer data measurement sizes were small (e.g. boot
command line), but the new buffer data measurement use cases have data
sizes that are a lot larger. Just as IMA measures the file data hash,
not the file data, IMA should similarly support the option for measuring
buffer data hash.
Introduce a boolean parameter to support measuring buffer data hash,
which would be much smaller, instead of the buffer itself.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
IMA functions such as ima_match_keyring(), process_buffer_measurement(),
ima_match_policy() etc. handle data specific to keyrings. Currently,
these constructs are not generic to handle any func specific data.
This makes it harder to extend them without code duplication.
Refactor the keyring specific measurement constructs to be generic and
reusable in other measurement scenarios.
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.
A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".
Example:
type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };
(The next patch in this series is necessary for making userfaultfd
support this new interface. The example above is just
for exposition.)
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
This change adds a new LSM hook, inode_init_security_anon(), that will
be used while creating secure anonymous inodes. The hook allows/denies
its creation and assigns a security context to the inode.
The new hook accepts an optional context_inode parameter that callers
can use to provide additional contextual information to security modules
for granting/denying permission to create an anon-inode of the same type.
This context_inode's security_context can also be used to initialize the
newly created anon-inode's security_context.
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When a superblock is assigned the SECURITY_FS_USE_XATTR behavior by the
policy yet it lacks xattr support, try to fall back to genfs rather than
rejecting the mount. If a genfscon rule is found for the filesystem,
then change the behavior to SECURITY_FS_USE_GENFS, otherwise reject the
mount as before. A similar fallback is already done in security_fs_use()
if no behavior specification is found for the given filesystem.
This is needed e.g. for virtiofs, which may or may not support xattrs
depending on the backing host filesystem.
Example:
# seinfo --genfs | grep ' ramfs'
genfscon ramfs / system_u:object_r:ramfs_t:s0
# echo '(fsuse xattr ramfs (system_u object_r fs_t ((s0) (s0))))' >ramfs_xattr.cil
# semodule -i ramfs_xattr.cil
# mount -t ramfs none /mnt
Before:
mount: /mnt: mount(2) system call failed: Operation not supported.
After:
(mount succeeds)
# ls -Zd /mnt
system_u:object_r:ramfs_t:s0 /mnt
See also:
https://lore.kernel.org/selinux/20210105142148.GA3200@redhat.com/T/https://github.com/fedora-selinux/selinux-policy/pull/478
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
tmp_tfm is allocated, but not freed on subsequent kmalloc failure, which
leads to a memory leak. Free tmp_tfm.
Fixes: d46eb36995 ("evm: crypto hash replaced by shash")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
[zohar@linux.ibm.com: formatted/reworded patch description]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Steps on the way to 5.11-rc1
Resolves merge issues in:
kernel/sched/sched.h
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: If5b00273202785ed39c5f46dfb92dd7791499cda
This is motivated by a perfomance regression of selinux_xfrm_enabled()
that happened on a RHEL kernel due to false sharing between
selinux_xfrm_refcount and (the late) selinux_ss.policy_rwlock (i.e. the
.bss section memory layout changed such that they happened to share the
same cacheline). Since the policy rwlock's memory region was modified
upon each read-side critical section, the readers of
selinux_xfrm_refcount had frequent cache misses, eventually leading to a
significant performance degradation under a TCP SYN flood on a system
with many cores (32 in this case, but it's detectable on less cores as
well).
While upstream has since switched to RCU locking, so the same can no
longer happen here, selinux_xfrm_refcount could still share a cacheline
with another frequently written region, thus marking it __read_mostly
still makes sense. __read_mostly helps, because it will put the symbol
in a separate section along with other read-mostly variables, so there
should never be a clash with frequently written data.
Since selinux_xfrm_refcount is modified only in case of an explicit
action, it should be safe to do this (i.e. it shouldn't disrupt other
read-mostly variables too much).
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
All of these are never modified outside initcalls, so they can be
__ro_after_init.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
It is not referenced outside selinuxfs.c, so remove its extern header
declaration and make it static.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Its value is actually not changed anywhere, so it can be substituted for
a direct call to audit_update_lsm_rules().
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
All of sel_ib_pkey_list, sel_netif_list, sel_netnode_list, and
sel_netport_list are declared but never used. Remove them.
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr
calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will
intercept in inode_getxattr hooks.
When selinux LSM is installed but not initialized, it will list the
security.selinux xattr in inode_listsecurity, but will not intercept it
in inode_getxattr. This results in -ENODATA for a getxattr call for an
xattr returned by listxattr.
This situation was manifested as overlayfs failure to copy up lower
files from squashfs when selinux is built-in but not initialized,
because ovl_copy_xattr() iterates the lower inode xattrs by
vfs_listxattr() and vfs_getxattr().
Match the logic of inode_listsecurity to that of inode_getxattr and
do not list the security.selinux xattr if selinux is not initialized.
Reported-by: Michael Labriola <michael.d.labriola@gmail.com>
Tested-by: Michael Labriola <michael.d.labriola@gmail.com>
Link: https://lore.kernel.org/linux-unionfs/2nv9d47zt7.fsf@aldarion.sourceruckus.org/
Fixes: c8e222616c ("selinux: allow reading labels before policy is loaded")
Cc: stable@vger.kernel.org#v5.9+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The MPTCP protocol uses a specific protocol value, even if
it's an extension to TCP. Additionally, MPTCP sockets
could 'fall-back' to TCP at run-time, depending on peer MPTCP
support and available resources.
As a consequence of the specific protocol number, selinux
applies the raw_socket class to MPTCP sockets.
Existing TCP application converted to MPTCP - or forced to
use MPTCP socket with user-space hacks - will need an
updated policy to run successfully.
This change lets selinux attach the TCP socket class to
MPTCP sockets, too, so that no policy changes are needed in
the above scenario.
Note that the MPTCP is setting, propagating and updating the
security context on all the subflows and related request
socket.
Link: https://lore.kernel.org/linux-security-module/CAHC9VhTaK3xx0hEGByD2zxfF7fadyPP1kb-WeWH_YCyq9X-sRg@mail.gmail.com/T/#t
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[PM: tweaked subject's prefix]
Signed-off-by: Paul Moore <paul@paul-moore.com>
This reverts commit f1ee68a8f6.
Reason for revert:
The change being reverted adds a new "lockdown" audit class. Support
for this new class needs to be added to Android and the processes
which need to be part of this class have to be annotated. While support
for this class has not yet been added to Android, this lockdown class
will be removed.
Tracefs usage by Android triggers a violation with respect to this new
audit class which prompted the need for this patch.
Bug: 148822198
Change-Id: Ie06f4be699234fb671ec4bcfe11962b2055a0c60
Signed-off-by: Ram Muthiah <rammuthiah@google.com>
The v3 file capabilities have a uid field that records the filesystem
uid of the root user of the user namespace the file capabilities are
valid in.
When someone is silly enough to have the same underlying uid as the
root uid of multiple nested containers a v3 filesystem capability can
be ambiguous.
In the spirit of don't do that then, forbid writing a v3 filesystem
capability if it is ambiguous.
Fixes: 8db6c34f1d ("Introduce v3 namespaced file capabilities")
Reviewed-by: Andrew G. Morgan <morgan@kernel.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Pull smack fix from Casey Schaufler:
"Provide a fix for the incorrect handling of privilege in the face of
io_uring's use of kernel threads. That invalidated an long standing
assumption regarding the privilege of kernel threads.
The fix is simple and safe. It was provided by Jens Axboe and has been
tested"
* tag 'Smack-for-5.11-io_uring-fix' of git://github.com/cschaufler/smack-next:
Smack: Handle io_uring kernel thread privileges
Pull EFI updates from Borislav Petkov:
"These got delayed due to a last minute ia64 build issue which got
fixed in the meantime.
EFI updates collected by Ard Biesheuvel:
- Don't move BSS section around pointlessly in the x86 decompressor
- Refactor helper for discovering the EFI secure boot mode
- Wire up EFI secure boot to IMA for arm64
- Some fixes for the capsule loader
- Expose the RT_PROP table via the EFI test module
- Relax DT and kernel placement restrictions on ARM
with a few followup fixes:
- fix the build breakage on IA64 caused by recent capsule loader
changes
- suppress a type mismatch build warning in the expansion of
EFI_PHYS_ALIGN on ARM"
* tag 'efi_updates_for_v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: arm: force use of unsigned type for EFI_PHYS_ALIGN
efi: ia64: disable the capsule loader
efi: stub: get rid of efi_get_max_fdt_addr()
efi/efi_test: read RuntimeServicesSupported
efi: arm: reduce minimum alignment of uncompressed kernel
efi: capsule: clean scatter-gather entries from the D-cache
efi: capsule: use atomic kmap for transient sglist mappings
efi: x86/xen: switch to efi_get_secureboot_mode helper
arm64/ima: add ima_arch support
ima: generalize x86/EFI arch glue for other EFI architectures
efi: generalize efi_get_secureboot
efi/libstub: EFI_GENERIC_STUB_INITRD_CMDLINE_LOADER should not default to yes
efi/x86: Only copy the compressed kernel image in efi_relocate_kernel()
efi/libstub/x86: simplify efi_is_native()
Smack assumes that kernel threads are privileged for smackfs
operations. This was necessary because the credential of the
kernel thread was not related to a user operation. With io_uring
the credential does reflect a user's rights and can be used.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Steps on the way to 5.11-rc1
Resolves conflicts in:
arch/arm64/include/asm/thread_info.h
fs/proc/base.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibc53e7c14fc0953a95f89af81b0e0373ddad986a
Pull overlayfs updates from Miklos Szeredi:
- Allow unprivileged mounting in a user namespace.
For quite some time the security model of overlayfs has been that
operations on underlying layers shall be performed with the
privileges of the mounting task.
This way an unprvileged user cannot gain privileges by the act of
mounting an overlayfs instance. A full audit of all function calls
made by the overlayfs code has been performed to see whether they
conform to this model, and this branch contains some fixes in this
regard.
- Support running on copied filesystem images by optionally disabling
UUID verification.
- Bug fixes as well as documentation updates.
* tag 'ovl-update-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: unprivieged mounts
ovl: do not get metacopy for userxattr
ovl: do not fail because of O_NOATIME
ovl: do not fail when setting origin xattr
ovl: user xattr
ovl: simplify file splice
ovl: make ioctl() safe
ovl: check privs before decoding file handle
vfs: verify source area in vfs_dedupe_file_range_one()
vfs: move cap_convert_nscap() call into vfs_setxattr()
ovl: fix incorrect extent info in metacopy case
ovl: expand warning in ovl_d_real()
ovl: document lower modification caveats
ovl: warn about orphan metacopy
ovl: doc clarification
ovl: introduce new "uuid=off" option for inodes index feature
ovl: propagate ovl_fs to ovl_decode_real_fh and ovl_encode_real_fh
This reverts commit d9cb255af3 as it
causes merge issues with upstream selinux changes, and really should be
fixed by now...
Bug: 148822198
Signed-off-by: Alistair Delva <adelva@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I30a54f07119d429bf95518844c8bd2e94606a2e0
Pull smack updates from Casey Schaufler:
"There are no functional changes. Just one minor code clean-up and a
set of corrections in function header comments"
* tag 'Smack-for-5.11' of git://github.com/cschaufler/smack-next:
security/smack: remove unused varible 'rc'
Smack: fix kernel-doc interface on functions