Changes in 5.15.26
mm/filemap: Fix handling of THPs in generic_file_buffered_read()
cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
cgroup-v1: Correct privileges check in release_agent writes
x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing
btrfs: tree-checker: check item_size for inode_item
btrfs: tree-checker: check item_size for dev_item
clk: jz4725b: fix mmc0 clock gating
io_uring: don't convert to jiffies for waiting on timeouts
io_uring: disallow modification of rsrc_data during quiesce
selinux: fix misuse of mutex_is_locked()
vhost/vsock: don't check owner in vhost_vsock_stop() while releasing
parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
parisc/unaligned: Fix ldw() and stw() unalignment handlers
KVM: x86/mmu: make apf token non-zero to fix bug
drm/amd/display: Protect update_bw_bounding_box FPU code.
drm/amd/pm: fix some OEM SKU specific stability issues
drm/amd: Check if ASPM is enabled from PCIe subsystem
drm/amdgpu: disable MMHUB PG for Picasso
drm/amdgpu: do not enable asic reset for raven2
drm/i915: Widen the QGV point mask
drm/i915: Correctly populate use_sagv_wm for all pipes
drm/i915: Fix bw atomic check when switching between SAGV vs. no SAGV
sr9700: sanity check for packet length
USB: zaurus: support another broken Zaurus
CDC-NCM: avoid overflow in sanity checking
netfilter: xt_socket: fix a typo in socket_mt_destroy()
netfilter: xt_socket: missing ifdef CONFIG_IP6_NF_IPTABLES dependency
netfilter: nf_tables_offload: incorrect flow offload action array size
tee: export teedev_open() and teedev_close_context()
optee: use driver internal tee_context for some rpc
ping: remove pr_err from ping_lookup
Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
gpu: host1x: Always return syncpoint value when waiting
perf evlist: Fix failed to use cpu list for uncore events
perf data: Fix double free in perf_session__delete()
mptcp: fix race in incoming ADD_ADDR option processing
mptcp: add mibs counter for ignored incoming options
selftests: mptcp: fix diag instability
selftests: mptcp: be more conservative with cookie MPJ limits
bnx2x: fix driver load from initrd
bnxt_en: Fix active FEC reporting to ethtool
bnxt_en: Fix offline ethtool selftest with RDMA enabled
bnxt_en: Fix incorrect multicast rx mask setting when not requested
hwmon: Handle failure to register sensor with thermal zone correctly
net/mlx5: Fix tc max supported prio for nic mode
ice: check the return of ice_ptp_gettimex64
ice: initialize local variable 'tlv'
net/mlx5: Update the list of the PCI supported devices
bpf: Fix crash due to incorrect copy_map_value
bpf: Do not try bpf_msg_push_data with len 0
selftests: bpf: Check bpf_msg_push_data return value
bpf: Fix a bpf_timer initialization issue
bpf: Add schedule points in batch ops
io_uring: add a schedule point in io_add_buffers()
net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends
nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
tipc: Fix end of loop tests for list_for_each_entry()
gso: do not skip outer ip header in case of ipip and net_failover
net: mv643xx_eth: process retval from of_get_mac_address
openvswitch: Fix setting ipv6 fields causing hw csum failure
drm/edid: Always set RGB444
net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
drm/vc4: crtc: Fix runtime_pm reference counting
drm/i915/dg2: Print PHY name properly on calibration error
net/sched: act_ct: Fix flow table lookup after ct clear or switching zones
net: ll_temac: check the return value of devm_kmalloc()
net: Force inlining of checksum functions in net/checksum.h
netfilter: nf_tables: unregister flowtable hooks on netns exit
nfp: flower: Fix a potential leak in nfp_tunnel_add_shared_mac()
net: mdio-ipq4019: add delay after clock enable
netfilter: nf_tables: fix memory leak during stateful obj update
net/smc: Use a mutex for locking "struct smc_pnettable"
surface: surface3_power: Fix battery readings on batteries without a serial number
udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister()
net/mlx5: DR, Cache STE shadow memory
ibmvnic: schedule failover only if vioctl fails
net/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version
net/mlx5: Fix possible deadlock on rule deletion
net/mlx5: Fix wrong limitation of metadata match on ecpf
net/mlx5: DR, Fix the threshold that defines when pool sync is initiated
net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
net/mlx5: Update log_max_qp value to be 17 at most
spi: spi-zynq-qspi: Fix a NULL pointer dereference in zynq_qspi_exec_mem_op()
gpio: rockchip: Reset int_bothedge when changing trigger
regmap-irq: Update interrupt clear register for proper reset
net-timestamp: convert sk->sk_tskey to atomic_t
RDMA/rtrs-clt: Fix possible double free in error case
RDMA/rtrs-clt: Move free_permit from free_clt to rtrs_clt_close
bnxt_en: Increase firmware message response DMA wait time
configfs: fix a race in configfs_{,un}register_subsystem()
RDMA/ib_srp: Fix a deadlock
tracing: Dump stacktrace trigger to the corresponding instance
tracing: Have traceon and traceoff trigger honor the instance
iio:imu:adis16480: fix buffering for devices with no burst mode
iio: adc: men_z188_adc: Fix a resource leak in an error handling path
iio: adc: tsc2046: fix memory corruption by preventing array overflow
iio: adc: ad7124: fix mask used for setting AIN_BUFP & AIN_BUFM bits
iio: accel: fxls8962af: add padding to regmap for SPI
iio: imu: st_lsm6dsx: wait for settling time in st_lsm6dsx_read_oneshot
iio: Fix error handling for PM
sc16is7xx: Fix for incorrect data being transmitted
ata: pata_hpt37x: disable primary channel on HPT371
Revert "USB: serial: ch341: add new Product ID for CH341A"
usb: gadget: rndis: add spinlock for rndis response list
USB: gadget: validate endpoint index for xilinx udc
tracefs: Set the group ownership in apply_options() not parse_options()
USB: serial: option: add support for DW5829e
USB: serial: option: add Telit LE910R1 compositions
usb: dwc2: drd: fix soft connect when gadget is unconfigured
usb: dwc3: pci: Add "snps,dis_u2_susphy_quirk" for Intel Bay Trail
usb: dwc3: pci: Fix Bay Trail phy GPIO mappings
usb: dwc3: gadget: Let the interrupt handler disable bottom halves.
xhci: re-initialize the HC during resume if HCE was set
xhci: Prevent futile URB re-submissions due to incorrect return value.
nvmem: core: Fix a conflict between MTD and NVMEM on wp-gpios property
mtd: core: Fix a conflict between MTD and NVMEM on wp-gpios property
driver core: Free DMA range map when device is released
btrfs: prevent copying too big compressed lzo segment
RDMA/cma: Do not change route.addr.src_addr outside state checks
thermal: int340x: fix memory leak in int3400_notify()
staging: fbtft: fb_st7789v: reset display before initialization
tps6598x: clear int mask on probe failure
IB/qib: Fix duplicate sysfs directory name
riscv: fix nommu_k210_sdcard_defconfig
riscv: fix oops caused by irqsoff latency tracer
tty: n_gsm: fix encoding of control signal octet bit DV
tty: n_gsm: fix proper link termination after failed open
tty: n_gsm: fix NULL pointer access due to DLCI release
tty: n_gsm: fix wrong tty control line for flow control
tty: n_gsm: fix wrong modem processing in convergence layer type 2
tty: n_gsm: fix deadlock in gsmtty_open()
pinctrl: fix loop in k210_pinconf_get_drive()
pinctrl: k210: Fix bias-pull-up
gpio: tegra186: Fix chip_data type confusion
memblock: use kfree() to release kmalloced memblock regions
ice: Fix race conditions between virtchnl handling and VF ndo ops
ice: fix concurrent reset and removal of VFs
Linux 5.15.26
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ied0cc9bd48b7af71a064107676f37b0dd39ce3cf
commit 467a726b754f474936980da793b4ff2ec3e382a7 upstream.
The idea is to check: a) the owning user_ns of cgroup_ns, b)
capabilities in init_user_ns.
The commit 24f600856418 ("cgroup-v1: Require capabilities to set
release_agent") got this wrong in the write handler of release_agent,
since it checked the user_ns of the opener (which may differ from the
owning user_ns of cgroup_ns).
Secondly, to avoid a possible confused deputy, the capability of the
opener must be checked.
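A minimal userspace model of the corrected rule follows. It uses simplified hypothetical structs (user_ns, cgroup_ns, opener) rather than the kernel's types. As far as we understand, the upstream fix combines the cgroup namespace's owning user_ns with file_ns_capable() on the opened file. Here that capability query is reduced to a boolean recorded at open time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel objects involved. */
struct user_ns { int id; };
struct cgroup_ns { const struct user_ns *owner; };    /* owning user_ns */
struct opener { bool has_cap_sys_admin_in_init_ns; }; /* saved at open() */

static const struct user_ns init_user_ns = { 0 };

/*
 * Corrected rule: a) the cgroup namespace must be owned by init_user_ns,
 * and b) the *opener* (not whoever writes later) must hold CAP_SYS_ADMIN
 * in init_user_ns.
 */
static int release_agent_write_allowed(const struct cgroup_ns *ns,
				       const struct opener *op)
{
	assert(ns && op);
	if (ns->owner != &init_user_ns || !op->has_cap_sys_admin_in_init_ns)
		return -EPERM;
	return 0;
}
```

Both conditions must hold: a privileged opener inside a child namespace is refused, and so is an unprivileged opener in the initial namespace.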
Fixes: 24f600856418 ("cgroup-v1: Require capabilities to set release_agent")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/stable/20220216121142.GB30035@blackbody.suse.cz/
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Masami Ichikawa(CIP) <masami.ichikawa@cybertrust.co.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.15.20
PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
selftests: mptcp: fix ipv6 routing setup
net: ipa: use a bitmap for endpoint replenish_enabled
net: ipa: prevent concurrent replenish
drm/vc4: hdmi: Make sure the device is powered with CEC
cgroup-v1: Require capabilities to set release_agent
Revert "mm/gup: small refactoring: simplify try_grab_page()"
ovl: don't fail copy up if no fileattr support on upper
lockd: fix server crash on reboot of client holding lock
lockd: fix failure to cleanup client locks
net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
net/mlx5: Bridge, take rtnl lock in init error handler
net/mlx5: Bridge, ensure dev_name is null-terminated
net/mlx5e: Fix handling of wrong devices during bond netevent
net/mlx5: Use del_timer_sync in fw reset flow of halting poll
net/mlx5e: Fix module EEPROM query
net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
net/mlx5e: Don't treat small ceil values as unlimited in HTB offload
net/mlx5: Bridge, Fix devlink deadlock on net namespace deletion
net/mlx5: E-Switch, Fix uninitialized variable modact
ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
i40e: Fix reset bw limit when DCB enabled with 1 TC
i40e: Fix reset path while removing the driver
net: amd-xgbe: ensure to reset the tx_timer_active flag
net: amd-xgbe: Fix skb data length underflow
fanotify: Fix stale file descriptor in copy_event_to_user()
net: sched: fix use-after-free in tc_new_tfilter()
rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()
cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
e1000e: Handshake with CSME starts from ADL platforms
af_packet: fix data-race in packet_setsockopt / packet_setsockopt
tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
ovl: fix NULL pointer dereference in copy up warning
Linux 5.15.20
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia50333eff81881fac62eb52455b502e6c46ff3d9
commit 24f6008564183aa120d07c03d9289519c2fe02af upstream.
The cgroup release_agent is called with call_usermodehelper. The function
call_usermodehelper starts the release_agent with a full set of capabilities.
Therefore, require capabilities when setting the release_agent.
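The following sketch models the mount-time side of the same requirement: parsing a release_agent= option. The struct, the function name, and the use of -EINVAL for both rejections are assumptions made for illustration. The point it shows is that only a mounter in init_user_ns with CAP_SYS_ADMIN may name a program that will later run with full capabilities.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a mount-option parse context. */
struct mount_ctx {
	bool user_ns_is_init;      /* fs context created in init_user_ns? */
	bool caller_has_sys_admin; /* CAP_SYS_ADMIN held by the mounter */
	const char *release_agent; /* value accepted so far, or NULL */
};

/*
 * Accept "release_agent=<path>" only from a fully privileged mounter,
 * because the agent later runs via call_usermodehelper() with a full
 * capability set. A second release_agent= option is also rejected.
 */
static int parse_release_agent(struct mount_ctx *ctx, const char *path)
{
	assert(ctx && path);
	if (ctx->release_agent)
		return -EINVAL;          /* respecified */
	if (!ctx->user_ns_is_init || !ctx->caller_has_sys_admin)
		return -EINVAL;          /* setting release_agent not allowed */
	ctx->release_agent = path;
	return 0;
}
```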
Reported-by: Tabitha Sable <tabitha.c.sable@gmail.com>
Tested-by: Tabitha Sable <tabitha.c.sable@gmail.com>
Fixes: 81a6a5cdd2 ("Task Control Groups: automatic userspace notification of idle cgroups")
Cc: stable@vger.kernel.org # v2.6.24+
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.15.14
fscache_cookie_enabled: check cookie is valid before accessing it
selftests: x86: fix [-Wstringop-overread] warn in test_process_vm_readv()
tracing: Fix check for trace_percpu_buffer validity in get_trace_buf()
tracing: Tag trace_percpu_buffer as a percpu pointer
Revert "RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow"
ieee802154: atusb: fix uninit value in atusb_set_extended_addr
i40e: Fix to not show opcode msg on unsuccessful VF MAC change
iavf: Fix limit of total number of queues to active queues of VF
RDMA/core: Don't infoleak GRH fields
Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks"
netrom: fix copying in user data in nr_setsockopt
RDMA/uverbs: Check for null return of kmalloc_array
mac80211: initialize variable have_higher_than_11mbit
mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh
sfc: The RX page_ring is optional
i40e: fix use-after-free in i40e_sync_filters_subtask()
i40e: Fix for displaying message regarding NVM version
i40e: Fix incorrect netdev's real number of RX/TX queues
ftrace/samples: Add missing prototypes direct functions
ipv4: Check attribute length for RTA_GATEWAY in multipath route
ipv4: Check attribute length for RTA_FLOW in multipath route
ipv6: Check attribute length for RTA_GATEWAY in multipath route
ipv6: Check attribute length for RTA_GATEWAY when deleting multipath route
lwtunnel: Validate RTA_ENCAP_TYPE attribute length
selftests: net: udpgro_fwd.sh: explicitly checking the available ping feature
sctp: hold endpoint before calling cb in sctp_transport_lookup_process
batman-adv: mcast: don't send link-local multicast to mcast routers
sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc
net: ena: Fix undefined state when tx request id is out of bounds
net: ena: Fix wrong rx request id by resetting device
net: ena: Fix error handling when calculating max IO queues number
md/raid1: fix missing bitmap update w/o WriteMostly devices
EDAC/i10nm: Release mdev/mbase when failing to detect HBM
KVM: x86: Check for rmaps allocation
cgroup: Use open-time credentials for process migraton perm checks
cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv
cgroup: Use open-time cgroup namespace for process migration perm checks
Revert "i2c: core: support bus regulator controlling in adapter"
i2c: mpc: Avoid out of bounds memory access
xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate
power: supply: core: Break capacity loop
power: reset: ltc2952: Fix use of floating point literals
reset: renesas: Fix Runtime PM usage
rndis_host: support Hytera digital radios
gpio: gpio-aspeed-sgpio: Fix wrong hwirq base in irq handler
net ticp:fix a kernel-infoleak in __tipc_sendmsg()
phonet: refcount leak in pep_sock_accep
fbdev: fbmem: add a helper to determine if an aperture is used by a fw fb
drm/amdgpu: disable runpm if we are the primary adapter
power: bq25890: Enable continuous conversion for ADC at charging
ipv6: Continue processing multipath route even if gateway attribute is invalid
ipv6: Do cleanup if attribute validation fails in multipath route
auxdisplay: charlcd: checking for pointer reference before dereferencing
drm/amdgpu: fix dropped backing store handling in amdgpu_dma_buf_move_notify
drm/amd/pm: Fix xgmi link control on aldebaran
usb: mtu3: fix interval value for intr and isoc
scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()
ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate
net: udp: fix alignment problem in udp4_seq_show()
atlantic: Fix buff_ring OOB in aq_ring_rx_clean
drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
drm/amdgpu: always reset the asic in suspend (v2)
drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
mISDN: change function names to avoid conflicts
drm/amd/display: fix B0 TMDS deepcolor no dislay issue
drm/amd/display: Added power down for DCN10
ipv6: raw: check passed optlen before reading
userfaultfd/selftests: fix hugetlb area allocations
ARM: dts: gpio-ranges property is now required
Input: zinitix - make sure the IRQ is allocated before it gets enabled
Revert "drm/amdgpu: stop scheduler when calling hw_fini (v2)"
drm/amd/pm: keep the BACO feature enabled for suspend
Linux 5.15.14
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifc22d4db0c3aa2164c4769981847e0634f2ad463
commit 0d2b5955b36250a9428c832664f2079cbf723bec upstream.
of->priv is currently used by each interface file implementation to store
private information. This patch collects the current two private data usages
into struct cgroup_file_ctx which is allocated and freed by the common path.
This allows generic private data that applies to multiple files, which will
be used in the following patch.
Note that cgroup_procs iterator is now embedded as procs.iter in the new
cgroup_file_ctx so that it doesn't need to be allocated and freed
separately.
v2: union dropped from cgroup_file_ctx and the procs iterator is embedded in
cgroup_file_ctx as suggested by Linus.
v3: Michal pointed out that cgroup1's procs pidlist uses of->priv too.
Converted. Didn't change to embedded allocation as cgroup1 pidlists get
stored for caching.
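Below is a simplified sketch of the layout described above. It uses placeholder types (procs_iter, pidlist) and hypothetical open/release helpers, so it is not the kernel's actual definition of struct cgroup_file_ctx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical placeholders for the per-interface state. */
struct procs_iter { int next_pid; };   /* stands in for css_task_iter */
struct pidlist;                        /* cgroup1 pidlist, cached elsewhere */

/*
 * One context per open file. The cgroup2 procs iterator is embedded, so
 * it needs no separate allocation; the cgroup1 pidlist stays a pointer
 * because pidlists are cached independently of the open file.
 */
struct file_ctx {
	struct {
		bool started;
		struct procs_iter iter;
	} procs;
	struct {
		struct pidlist *pidlist;
	} procs1;
};

/* Common open path: allocate a zeroed context once per open file. */
static struct file_ctx *file_ctx_open(void)
{
	return calloc(1, sizeof(struct file_ctx));
}

/* Common release path: free the context; only called after a good open. */
static void file_ctx_release(struct file_ctx *ctx)
{
	assert(ctx != NULL);
	free(ctx);
}
```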
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1756d7994ad85c2479af6ae5a9750b92324685af upstream.
cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's credentials which is a
potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.
This patch makes both the cgroup2 and cgroup1 process migration interfaces
use the credentials saved at the time of open (file->f_cred) instead of
current's.
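The following toy model shows the effect of the change. The permission decision reads the credentials captured in the file at open time and ignores the writer's. The structs and the "root or same uid" rule are hypothetical simplifications. The real cgroup checks, such as write permission on the common ancestor and namespace checks, are more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified credential and file models. */
struct cred { unsigned int euid; };
struct file { const struct cred *f_cred; };  /* creds captured at open() */

/* Toy rule: root may move anything, others only their own tasks. */
static bool may_migrate(const struct cred *c, unsigned int task_uid)
{
	return c->euid == 0 || c->euid == task_uid;
}

/*
 * The check is keyed on the file's open-time credentials, not the
 * writer's, so a privileged process tricked into writing to a descriptor
 * opened by an unprivileged one gains nothing.
 */
static int procs_write(const struct file *f, const struct cred *writer,
		       unsigned int task_uid)
{
	(void)writer; /* deliberately ignored: current's creds are not used */
	assert(f && f->f_cred);
	return may_migrate(f->f_cred, task_uid) ? 0 : -EACCES;
}
```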
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Fixes: 187fe84067 ("cgroup: require write perm on common ancestor when moving processes on the default hierarchy")
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In Android GKI, CONFIG_FAIR_GROUP_SCHED is enabled [1] to help
prioritize important work. Given that the CPU shares of the root cgroup
can't be changed, leaving tasks inside the root cgroup gives them a
higher share than the tasks inside important cgroups. This is mitigated
by moving all tasks inside the root cgroup to a different cgroup after
Android has booted. However, many kernel tasks remain stuck in the root
cgroup after boot.
It is possible to relax the migration restrictions on kernel threads and
kworkers in certain scenarios. However, the patch [2] posted upstream was
not accepted. Hence add a restricted vendor hook to notify modules when a
kernel thread is requested for cgroup migration. The modules can relax
the restrictions forced by the kernel and allow the cgroup migration.
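A minimal sketch of the hook pattern follows. It assumes the hook simply lets a module set a "force" flag before the kernel applies its default refusal. The hook type, the task fields, the example policy, and the -EINVAL code are all hypothetical, and none of them is the Android GKI hook signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task model: only the bits relevant to the decision. */
struct task {
	bool is_kthread;
	bool no_setaffinity; /* stands in for PF_NO_SETAFFINITY */
};

/* Hypothetical hook type: a module may set *force to allow migration. */
typedef void (*force_migration_hook_t)(const struct task *t, bool *force);

static force_migration_hook_t migration_hook; /* NULL: no module attached */

/* Default policy refuses such threads unless a module forces migration. */
static int can_migrate(const struct task *t)
{
	bool force = false;

	assert(t);
	if (migration_hook)
		migration_hook(t, &force);
	if (!force && t->no_setaffinity)
		return -EINVAL;
	return 0;
}

/* Example module policy (illustrative only): relax it for kthreads. */
static void relax_kthreads(const struct task *t, bool *force)
{
	if (t->is_kthread)
		*force = true;
}
```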
[1] f08f049de1
[2] https://lore.kernel.org/lkml/1617714261-18111-1-git-send-email-pkondeti@codeaurora.org
Bug: 184594949
Change-Id: I445a170ba797c8bece3b4b59b7a42cdd85438f1f
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Shaleen Agrawal <shalagra@codeaurora.org>
Pull cgroup updates from Tejun Heo:
"Two cpuset behavior changes:
- cpuset on cgroup2 is changed to enable memory migration based on
nodemask by default.
- A notification is generated when cpuset partition state changes.
All other patches are minor fixes and cleanups"
* 'for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Avoid compiler warnings with no subsystems
cgroup/cpuset: Avoid memory migration when nodemasks match
cgroup/cpuset: Enable memory migration for cpuset v2
cgroup/cpuset: Enable event notification when partition state changes
cgroup: cgroup-v1: clean up kernel-doc notation
cgroup: Replace deprecated CPU-hotplug functions.
cgroup/cpuset: Fix violation of cpuset locking rule
cgroup/cpuset: Fix a partition bug with hotplug
cgroup/cpuset: Miscellaneous code cleanup
cgroup: remove cgroup_mount from comments
This reverts commit 631c0bba0a.
Although this boots and passes CI build/boot testing, it leaves a
dirty trail of thousands of failures in the log and probably
wouldn't function all that well on real hardware.
08-16 12:20:13.003 658 697 E libprocessgroup: AddTidToCgroup failed to write '3138'; fd=121: Permission denied
08-16 12:20:13.003 658 697 E libprocessgroup: Failed to add task into cgroup
Change-Id: Ia0f1948b0e94c27e5cecae8691348e044b32f7d6
Signed-off-by: Lee Jones <lee.jones@linaro.org>
This reverts commit a88f616760.
After some recent discussions, it transpires that this is no longer
required.
Bug: 31790445
Change-Id: I3ec80e21e192caa9e62715d450036e8565a21509
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Fix kernel-doc warnings found in cgroup-v1.c:
kernel/cgroup/cgroup-v1.c:55: warning: No description found for return value of 'cgroup_attach_task_all'
kernel/cgroup/cgroup-v1.c:94: warning: expecting prototype for cgroup_trasnsfer_tasks(). Prototype was for cgroup_transfer_tasks() instead
cgroup-v1.c:96: warning: No description found for return value of 'cgroup_transfer_tasks'
kernel/cgroup/cgroup-v1.c:687: warning: No description found for return value of 'cgroupstats_build'
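For reference, here is a comment in the form kernel-doc expects. The name on the first line must match the prototype, and a "Return:" section describes the return value, which is what the warnings above complain about. The function itself is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/**
 * sum_positive() - Add up the positive entries of an array.
 * @vals: Array of values to scan.
 * @n: Number of entries in @vals.
 *
 * Entries that are zero or negative are ignored.
 *
 * Return: The sum of all positive entries, or 0 if there are none.
 */
static long sum_positive(const int *vals, size_t n)
{
	long sum = 0;

	assert(vals || n == 0);
	for (size_t i = 0; i < n; i++)
		if (vals[i] > 0)
			sum += vals[i];
	return sum;
}
```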
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Richard reported sporadic (roughly one in 10 or so) null dereferences and
other strange behaviour for a set of automated LTP tests. Things like:
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
RIP: 0010:kernfs_sop_show_path+0x1b/0x60
...or these others:
RIP: 0010:do_mkdirat+0x6a/0xf0
RIP: 0010:d_alloc_parallel+0x98/0x510
RIP: 0010:do_readlinkat+0x86/0x120
There were other less common instances of some kind of a general scribble
but the common theme was mount and cgroup and a dubious dentry triggering
the NULL dereference. I was only able to reproduce it under qemu by
replicating Richard's setup as closely as possible - I never did get it
to happen on bare metal, even while keeping everything else the same.
In commit 71d883c37e ("cgroup_do_mount(): massage calling conventions")
we see this as a part of the overall change:
--------------
struct cgroup_subsys *ss;
- struct dentry *dentry;
[...]
- dentry = cgroup_do_mount(&cgroup_fs_type, fc->sb_flags, root,
- CGROUP_SUPER_MAGIC, ns);
[...]
- if (percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
- struct super_block *sb = dentry->d_sb;
- dput(dentry);
+ ret = cgroup_do_mount(fc, CGROUP_SUPER_MAGIC, ns);
+ if (!ret && percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
+ struct super_block *sb = fc->root->d_sb;
+ dput(fc->root);
deactivate_locked_super(sb);
msleep(10);
return restart_syscall();
}
--------------
In changing from the local "*dentry" variable to using fc->root, we now
export/leave that dentry pointer in the file context after doing the dput()
in the unlikely "is_dying" case. With LTP doing a crazy amount of back-to-back
mount/unmount [testcases/bin/cgroup_regression_5_1.sh] the unlikely
becomes slightly likely and then bad things happen.
A fix would be to not leave the stale reference in fc->root as follows:
--------------
dput(fc->root);
+ fc->root = NULL;
deactivate_locked_super(sb);
--------------
...but then we are just open-coding a duplicate of fc_drop_locked() so we
simply use that instead.
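The next model shows why clearing the pointer matters. It uses toy dentry and fs_context structs with a plain integer refcount. fc_drop_root() is a hypothetical name. According to the message above, the real fix calls fc_drop_locked(), which also deactivates the superblock; that step is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reference-counted stand-ins for dentry and fs_context. */
struct dentry { int refcount; };
struct fs_context { struct dentry *root; };

static void dput(struct dentry *d)
{
	if (d)
		d->refcount--;
}

/*
 * Release the reference *and* clear fc->root, so nothing later
 * dereferences (or dputs again) the stale pointer.
 */
static void fc_drop_root(struct fs_context *fc)
{
	assert(fc);
	dput(fc->root);
	fc->root = NULL;
}
```

A second call is harmless, because the pointer has already been cleared and dput(NULL) does nothing.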
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org # v5.1+
Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Fixes: 71d883c37e ("cgroup_do_mount(): massage calling conventions")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The following sequence can be used to trigger a UAF:
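int fscontext_fd = fsopen("cgroup", 0);
int fd_null = open("/dev/null", O_RDONLY);
/* FSCONFIG_SET_FD passes the fd in the aux argument; value must be NULL. */
int ret = fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", NULL, fd_null);
close_range(3, ~0U, 0); /* closes both descriptors, triggering the UAF */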
The cgroup v1 specific fs parser expects a string for the "source"
parameter. However, it is perfectly legitimate to e.g. specify a file
descriptor for the "source" parameter. The fs parser doesn't know what
a filesystem allows there. So it's a bug to assume that "source" is
always of type fs_value_is_string when it can reasonably also be
fs_value_is_file.
This assumption in the cgroup code causes a UAF because struct
fs_parameter uses a union for the actual value. Access to that union is
guarded by the param->type member. Since the cgroup parameter parser
didn't check param->type but unconditionally moved param->string into
fc->source, closing the fscontext_fd triggers put_fs_context(), which
frees fc->source and thereby the file stashed in param->file. The
subsequent close of fd_null is then a UAF.
Fix this by verifying that param->type is actually a string and report
an error if not.
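The following sketch shows the fix pattern: check the union's tag before touching its members. The enum, structs, and -EINVAL returns are hypothetical simplifications of struct fs_parameter and invalf(). To our understanding, the upstream parser also rejects a second "source", and that check is modelled here too.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical model of a tagged fs parameter value. */
enum param_type { VALUE_IS_STRING, VALUE_IS_FILE };

struct param {
	const char *key;
	enum param_type type;
	union {              /* only the member named by 'type' is valid */
		char *string;
		void *file;
	} v;
};

struct fs_ctx { char *source; };

/* Take ownership of a "source" string, refusing any other value type. */
static int parse_source(struct fs_ctx *fc, struct param *p)
{
	assert(fc && p && strcmp(p->key, "source") == 0);
	if (p->type != VALUE_IS_STRING)
		return -EINVAL;          /* e.g. FSCONFIG_SET_FD was used */
	if (fc->source)
		return -EINVAL;          /* multiple sources */
	fc->source = p->v.string;
	p->v.string = NULL;          /* ownership moved to fc */
	return 0;
}
```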
In follow-up patches I'll add a new generic helper that can be used here
and by other filesystems instead of this error-prone copy-pasta fix.
But fixing it here first makes backporting it to stable a lot
easier.
Fixes: 8d2451f499 ("cgroup1: switch to option-by-option parsing")
Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@kernel.org>
Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix some spelling mistakes in comments:
hierarhcy ==> hierarchy
automtically ==> automatically
overriden ==> overridden
In absense of .. or ==> In absence of .. and
assocaited ==> associated
taget ==> target
initate ==> initiate
succeded ==> succeeded
curremt ==> current
udpated ==> updated
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
If delayacct is disabled, delayacct_is_task_waiting_on_io() always
returns false, which makes the statistic wrong. tsk->in_iowait, which
does not depend on delayacct, is a better indicator.
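The model below is trivial. It uses a hypothetical task struct that carries only an in_iowait flag, and it computes the statistic from that per-task flag rather than from a helper whose answer depends on delay accounting being enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task model; in_iowait mirrors the per-task I/O-wait flag. */
struct task { bool in_iowait; };

/* Count tasks currently waiting on I/O, independent of delay accounting. */
static unsigned int count_io_waiters(const struct task *tasks, size_t n)
{
	unsigned int nr = 0;

	assert(tasks || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tasks[i].in_iowait)
			nr++;
	return nr;
}
```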
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add a vendor hook after attaching a task to a cgroup so that
the group_id can be recognized for performance tuning.
Bug: 181917687
Signed-off-by: Frankie Chang <frankie.chang@mediatek.com>
Change-Id: I603afa3d893dd575a7dcb97f83bd9eacb8315bab
(cherry picked from commit de089a37a3d248608a1d5855a4ae82ebad3ec2ab)
When mounting a cgroup v1 hierarchy with a disabled controller,
all available controllers will be attached.
For example, boot with cgroup_no_v1=cpu or cgroup_disable=cpu, and then
mount with "mount -t cgroup -ocpu cpu /sys/fs/cgroup/cpu", then all
enabled controllers will be attached except cpu.
Fix this by adding disabled controller check in cgroup1_parse_param().
If the specified controller is disabled, just return error with information
"Disabled controller xx" rather than attaching all the other enabled
controllers.
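The sketch below shows the intended parse behaviour using a hypothetical three-entry controller table and bitmask. A disabled controller now produces an error instead of being skipped. The error codes are illustrative choices: -EINVAL stands in for the "Disabled controller" error, and -ENOENT for an unknown name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical controller table; "enabled" reflects boot parameters. */
struct controller { const char *name; bool enabled; };

static const struct controller controllers[] = {
	{ "cpu", false },  /* e.g. booted with cgroup_no_v1=cpu */
	{ "memory", true },
	{ "pids", true },
};

/* Add one requested controller to *mask, rejecting disabled ones. */
static int parse_controller(const char *name, unsigned int *mask)
{
	assert(name && mask);
	for (size_t i = 0; i < sizeof(controllers) / sizeof(controllers[0]); i++) {
		if (strcmp(name, controllers[i].name))
			continue;
		if (!controllers[i].enabled)
			return -EINVAL;  /* "Disabled controller ..." */
		*mask |= 1u << i;
		return 0;
	}
	return -ENOENT; /* unknown name; a real parser handles this elsewhere */
}
```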
Fixes: f5dfb5315d ("cgroup: take options parsing into ->parse_monolithic()")
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Reviewed-by: Zefan Li <lizefan.x@bytedance.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
cgrp->root->release_agent_path is protected by both cgroup_mutex and
release_agent_path_lock and readers can hold either one. The
dual-locking scheme was introduced while breaking a locking dependency
issue around cgroup_mutex but doesn't make sense anymore given that
the only remaining reader which uses cgroup_mutex is
cgroup1_release_agent().
This patch updates cgroup1_release_agent() to use
release_agent_path_lock so that release_agent_path is always protected
only by release_agent_path_lock.
While at it, convert strlen() based empty string checks to direct
tests on the first character as suggested by Linus.
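The following userspace sketch shows the single-lock scheme. A pthread mutex stands in for release_agent_path_lock, which is a spinlock in the kernel, and a hypothetical 64-byte buffer replaces PATH_MAX. The reader copies the path under the lock and tests the first character instead of calling strlen().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define AGENT_PATH_MAX 64 /* hypothetical size; the kernel uses PATH_MAX */

/* Single lock guarding the path, standing in for release_agent_path_lock. */
static pthread_mutex_t agent_lock = PTHREAD_MUTEX_INITIALIZER;
static char release_agent_path[AGENT_PATH_MAX];

/* Snapshot the agent under its lock; false means "no agent configured". */
static bool snapshot_release_agent(char *buf, size_t len)
{
	assert(buf && len > 0);
	pthread_mutex_lock(&agent_lock);
	strncpy(buf, release_agent_path, len - 1);
	buf[len - 1] = '\0';
	pthread_mutex_unlock(&agent_lock);
	return buf[0] != '\0'; /* first-character test instead of strlen() */
}

/* Writer side: update the path under the same lock. */
static void set_release_agent(const char *path)
{
	pthread_mutex_lock(&agent_lock);
	strncpy(release_agent_path, path, sizeof(release_agent_path) - 1);
	release_agent_path[sizeof(release_agent_path) - 1] = '\0';
	pthread_mutex_unlock(&agent_lock);
}
```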
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Older (and maybe current) versions of systemd set release_agent to "" when
shutting down, but do not set notify_on_release to 0.
Since 64e90a8acb ("Introduce STATIC_USERMODEHELPER to mediate
call_usermodehelper()"), we filter out such calls when the user mode helper
path is "". However, when used in conjunction with an actual (i.e. non "")
STATIC_USERMODEHELPER, the path is never "", so the real usermode helper
will be called with argv[0] == "".
Let's avoid this by not invoking the release_agent when it is "".
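Here is a small sketch of the guard, using a hypothetical build_agent_argv() helper. The argv layout (agent path, then the released cgroup's path, then NULL) reflects our reading of cgroup1_release_agent() and should be treated as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Build the helper's argv as {agent, cgroup_path, NULL}. Returns -1 if
 * no agent is configured (empty string), in which case the helper must
 * not be spawned at all.
 */
static int build_agent_argv(const char *agent, const char *cgroup_path,
			    const char *args[3])
{
	assert(agent && cgroup_path && args);
	if (!agent[0])
		return -1;
	args[0] = agent;
	args[1] = cgroup_path;
	args[2] = NULL;
	return 0;
}
```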
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Tejun Heo <tj@kernel.org>
If a seq_file .next function does not update the position index,
a read after some lseek can generate unexpected output.
# mount | grep cgroup
# dd if=/mnt/cgroup.procs bs=1 # normal output
...
1294
1295
1296
1304
1382
584+0 records in
584+0 records out
584 bytes copied
dd: /mnt/cgroup.procs: cannot skip to specified offset
83 <<< generates end of last line
1383 <<< ... and whole last line once again
0+1 records in
0+1 records out
8 bytes copied
dd: /mnt/cgroup.procs: cannot skip to specified offset
1386 <<< generates last line anyway
0+1 records in
0+1 records out
5 bytes copied
https://bugzilla.kernel.org/show_bug.cgi?id=206283
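A generic sketch of the rule follows, built on a hypothetical array-backed iterator in seq_file start/next style. Unlike the cgroup v1 pidlist, it indexes by position rather than by PID value. The key line is that next() increments *pos even when it returns NULL at the end, so a later read at that offset finds nothing instead of re-emitting the last record.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini iterator over an array of PIDs, seq_file style. */
struct pid_list { const int *pids; size_t n; };

static const int *it_start(const struct pid_list *l, long *pos)
{
	return (*pos >= 0 && (size_t)*pos < l->n) ? &l->pids[*pos] : NULL;
}

/* Always advance *pos, including at the end where NULL is returned. */
static const int *it_next(const struct pid_list *l, const int *v, long *pos)
{
	assert(l && v && pos);
	(void)v;
	++*pos;
	return (size_t)*pos < l->n ? &l->pids[*pos] : NULL;
}
```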
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Tiny steps to deal with merge issues in sdcardfs due to fs param passing
api changes.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I03ba8763e8cc324c25fb6316c363b59957103474
Android expects system_server to be able to move tasks between different
cgroups/cpusets, but does not want to be running as root. Let's relax the
permission check so that processes can move other tasks if they have
CAP_SYS_NICE in the affected task's user namespace.
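The following is a toy version of the relaxed check. It uses hypothetical cred fields and a boolean saying whether the writer holds CAP_SYS_NICE in the target task's user namespace; in the kernel that would be an ns_capable()-style query. The uid/suid comparisons mirror the classic cgroup v1 rule as we understand it, and -EACCES is an assumed error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical credential model; only the fields the rule needs. */
struct cred { unsigned int euid, uid, suid; };

/*
 * Allow the move if the writer is root, matches the target's uid/suid,
 * or (the relaxation) holds CAP_SYS_NICE in the target's user namespace.
 */
static int may_move_task(const struct cred *writer, const struct cred *target,
			 bool writer_has_nice_in_target_ns)
{
	assert(writer && target);
	if (writer->euid == 0 ||
	    writer->euid == target->uid ||
	    writer->euid == target->suid ||
	    writer_has_nice_in_target_ns)
		return 0;
	return -EACCES;
}
```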
BUG=b:31790445,chromium:647994
Bug: 147109865
TEST=Boot android container, examine logcat
Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/394927
Reviewed-by: Ricky Zhou <rickyz@chromium.org>
[AmitP: Refactored original changes to align with upstream commit
201af4c0fa ("cgroup: move cgroup files under kernel/cgroup/")]
Change-Id: Ia919c66ab6ed6a6daf7c4cf67feb38b13b1ad09b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
(cherry picked from commit ec54762b84a1d06de188bc846655305d3f7acf75)
There are reports of users who migrate threads between cgroups and see a
performance drop after d59cfc09c3 ("sched, cgroup: replace
signal_struct->group_rwsem with a global percpu_rwsem"). The effect is
pronounced on machines with more CPUs.
The migration is affected by forking noise in the background: after the
mentioned commit, a migrating thread must wait for all (forking)
processes on the system, not only those of its thread group.
There are several places that need to synchronize with migration:
a) do_exit,
b) de_thread,
c) copy_process,
d) cgroup_update_dfl_csses,
e) parallel migration (cgroup_{proc,thread}s_write).
In the case of a self-migrating thread, we relax the synchronization on
cgroup_threadgroup_rwsem to avoid the cost of waiting. d) and e) are
excluded with cgroup_mutex, c) does not matter in case of single thread
migration and the executing thread cannot exec(2) or exit(2) while it is
writing into cgroup.threads. In case of do_exit because of signal
delivery, we either exit before the migration or finish the migration
(of not yet PF_EXITING thread) and die afterwards.
This patch handles only the case of self-migration by writing "0" into
cgroup.threads. For simplicity, we always take cgroup_threadgroup_rwsem
with numeric PIDs.
This change improves the performance of migration-dependent workloads,
similar to what the per-signal_struct state provided.
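Below is a sketch of the locking decision, with a pthread rwlock standing in for cgroup_threadgroup_rwsem. migration_lock() and migration_unlock() are hypothetical names. The rule follows the text above: the global lock is skipped only when "0" is written to cgroup.threads, i.e. a single thread moving itself. A cgroup.procs write (threadgroup) or any numeric PID still takes it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stands in for the global cgroup_threadgroup_rwsem. */
static pthread_rwlock_t threadgroup_rwsem = PTHREAD_RWLOCK_INITIALIZER;

/* Returns whether the global lock was taken, so the caller can drop it. */
static bool migration_lock(int pid, bool threadgroup)
{
	assert(pid >= 0);
	if (pid || threadgroup) {
		pthread_rwlock_wrlock(&threadgroup_rwsem);
		return true;
	}
	return false; /* self-migration of a single thread: no global lock */
}

static void migration_unlock(bool locked)
{
	if (locked)
		pthread_rwlock_unlock(&threadgroup_rwsem);
}
```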
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Instead of using its own logic to choose between kmalloc and vmalloc,
rely on kvmalloc, which does essentially the same thing.
Signed-off-by: Marc Koderer <marc@koderer.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The helper is identical to the existing cgroup_task_count()
except it doesn't take the css_set_lock by itself, assuming
that the caller does.
Also, move cgroup_task_count() implementation into
kernel/cgroup/cgroup.c, as there is nothing specific to cgroup v1.
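The sketch below shows the locked/unlocked helper pair described above. It models a cgroup as an array of css_sets with task counts, uses a pthread mutex for css_set_lock (a spinlock in the kernel), and uses a boolean as a stand-in for lockdep. The function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a cgroup links to css_sets, each with a task count. */
struct css_set { int nr_tasks; };
struct cgroup { struct css_set *csets; size_t nr_csets; };

static pthread_mutex_t css_set_lock = PTHREAD_MUTEX_INITIALIZER;
static bool css_set_lock_held; /* poor man's lockdep for this sketch */

/* Caller must already hold css_set_lock. */
static int task_count_locked(const struct cgroup *cgrp)
{
	int count = 0;

	assert(css_set_lock_held);
	for (size_t i = 0; i < cgrp->nr_csets; i++)
		count += cgrp->csets[i].nr_tasks;
	return count;
}

/* Convenience wrapper that takes and releases the lock itself. */
static int task_count(const struct cgroup *cgrp)
{
	int count;

	pthread_mutex_lock(&css_set_lock);
	css_set_lock_held = true;
	count = task_count_locked(cgrp);
	css_set_lock_held = false;
	pthread_mutex_unlock(&css_set_lock);
	return count;
}
```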
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kernel-team@fb.com
Add some logging to the core users of the fs_context log so that
information can be extracted from them as to the reason for failure.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>