Commit Graph

85 Commits

Author SHA1 Message Date
Tetsuo Handa
95af22de88 UPSTREAM: cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()
syzbot is hitting a percpu_rwsem_assert_held(&cpu_hotplug_lock) warning in
cpuset_attach() [1], because commit 4f7e7236435ca0ab ("cgroup: Fix
threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that
cpuset_attach() is also called from cgroup_attach_task_all().
Add cpus_read_lock() there, as cgroup_procs_write_start() does.
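A minimal userspace model of the fix, assuming the lock order used by cgroup_procs_write_start() (CPU hotplug read lock taken after the cgroup mutex and before the threadgroup rwsem). Locks are reduced to hypothetical "held" counters, and cpuset_attach_model() counts the cases where percpu_rwsem_assert_held() would warn; none of these names are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each lock is reduced to a "held" counter. This
 * tracks lock state only; it provides no real mutual exclusion. */
static int cgroup_mutex_held;
static int cpu_hotplug_read_held;   /* stands in for cpus_read_lock() */
static int threadgroup_rwsem_held;  /* stands in for cgroup_threadgroup_rwsem */
static int hotplug_assert_warnings; /* would-be percpu_rwsem_assert_held() hits */

/* Models cpuset_attach(): it expects the CPU hotplug lock to be held. */
static void cpuset_attach_model(void)
{
	if (cpu_hotplug_read_held == 0)
		hotplug_assert_warnings++;
}

/* Models cgroup_attach_task_all(); `fixed` adds the missing read lock,
 * taken after the cgroup mutex and before the threadgroup rwsem. */
static void attach_task_all_model(bool fixed)
{
	cgroup_mutex_held++;
	if (fixed)
		cpu_hotplug_read_held++;
	threadgroup_rwsem_held++;

	cpuset_attach_model();

	/* release in reverse order of acquisition */
	threadgroup_rwsem_held--;
	if (fixed)
		cpu_hotplug_read_held--;
	cgroup_mutex_held--;
}
```

Calling attach_task_all_model(false) records one warning; the fixed variant records none and leaves every counter balanced.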

Bug: 254143784

Link: https://syzkaller.appspot.com/bug?extid=29d3a3b4d86c8136ad9e [1]
Reported-by: syzbot <syzbot+29d3a3b4d86c8136ad9e@syzkaller.appspotmail.com>
Change-Id: I21d7ca425c91efe5773ad5e9aec2bbf1c09737f5
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock")
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 43626dade36fa74d3329046f4ae2d7fdefe401c6
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
2022-10-18 11:09:08 +08:00
Greg Kroah-Hartman
26481b5161 Merge 5.15.26 into android13-5.15
Changes in 5.15.26
	mm/filemap: Fix handling of THPs in generic_file_buffered_read()
	cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
	cgroup-v1: Correct privileges check in release_agent writes
	x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing
	btrfs: tree-checker: check item_size for inode_item
	btrfs: tree-checker: check item_size for dev_item
	clk: jz4725b: fix mmc0 clock gating
	io_uring: don't convert to jiffies for waiting on timeouts
	io_uring: disallow modification of rsrc_data during quiesce
	selinux: fix misuse of mutex_is_locked()
	vhost/vsock: don't check owner in vhost_vsock_stop() while releasing
	parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
	parisc/unaligned: Fix ldw() and stw() unalignment handlers
	KVM: x86/mmu: make apf token non-zero to fix bug
	drm/amd/display: Protect update_bw_bounding_box FPU code.
	drm/amd/pm: fix some OEM SKU specific stability issues
	drm/amd: Check if ASPM is enabled from PCIe subsystem
	drm/amdgpu: disable MMHUB PG for Picasso
	drm/amdgpu: do not enable asic reset for raven2
	drm/i915: Widen the QGV point mask
	drm/i915: Correctly populate use_sagv_wm for all pipes
	drm/i915: Fix bw atomic check when switching between SAGV vs. no SAGV
	sr9700: sanity check for packet length
	USB: zaurus: support another broken Zaurus
	CDC-NCM: avoid overflow in sanity checking
	netfilter: xt_socket: fix a typo in socket_mt_destroy()
	netfilter: xt_socket: missing ifdef CONFIG_IP6_NF_IPTABLES dependency
	netfilter: nf_tables_offload: incorrect flow offload action array size
	tee: export teedev_open() and teedev_close_context()
	optee: use driver internal tee_context for some rpc
	ping: remove pr_err from ping_lookup
	Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
	gpu: host1x: Always return syncpoint value when waiting
	perf evlist: Fix failed to use cpu list for uncore events
	perf data: Fix double free in perf_session__delete()
	mptcp: fix race in incoming ADD_ADDR option processing
	mptcp: add mibs counter for ignored incoming options
	selftests: mptcp: fix diag instability
	selftests: mptcp: be more conservative with cookie MPJ limits
	bnx2x: fix driver load from initrd
	bnxt_en: Fix active FEC reporting to ethtool
	bnxt_en: Fix offline ethtool selftest with RDMA enabled
	bnxt_en: Fix incorrect multicast rx mask setting when not requested
	hwmon: Handle failure to register sensor with thermal zone correctly
	net/mlx5: Fix tc max supported prio for nic mode
	ice: check the return of ice_ptp_gettimex64
	ice: initialize local variable 'tlv'
	net/mlx5: Update the list of the PCI supported devices
	bpf: Fix crash due to incorrect copy_map_value
	bpf: Do not try bpf_msg_push_data with len 0
	selftests: bpf: Check bpf_msg_push_data return value
	bpf: Fix a bpf_timer initialization issue
	bpf: Add schedule points in batch ops
	io_uring: add a schedule point in io_add_buffers()
	net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends
	nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
	tipc: Fix end of loop tests for list_for_each_entry()
	gso: do not skip outer ip header in case of ipip and net_failover
	net: mv643xx_eth: process retval from of_get_mac_address
	openvswitch: Fix setting ipv6 fields causing hw csum failure
	drm/edid: Always set RGB444
	net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
	drm/vc4: crtc: Fix runtime_pm reference counting
	drm/i915/dg2: Print PHY name properly on calibration error
	net/sched: act_ct: Fix flow table lookup after ct clear or switching zones
	net: ll_temac: check the return value of devm_kmalloc()
	net: Force inlining of checksum functions in net/checksum.h
	netfilter: nf_tables: unregister flowtable hooks on netns exit
	nfp: flower: Fix a potential leak in nfp_tunnel_add_shared_mac()
	net: mdio-ipq4019: add delay after clock enable
	netfilter: nf_tables: fix memory leak during stateful obj update
	net/smc: Use a mutex for locking "struct smc_pnettable"
	surface: surface3_power: Fix battery readings on batteries without a serial number
	udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister()
	net/mlx5: DR, Cache STE shadow memory
	ibmvnic: schedule failover only if vioctl fails
	net/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version
	net/mlx5: Fix possible deadlock on rule deletion
	net/mlx5: Fix wrong limitation of metadata match on ecpf
	net/mlx5: DR, Fix the threshold that defines when pool sync is initiated
	net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
	net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
	net/mlx5: Update log_max_qp value to be 17 at most
	spi: spi-zynq-qspi: Fix a NULL pointer dereference in zynq_qspi_exec_mem_op()
	gpio: rockchip: Reset int_bothedge when changing trigger
	regmap-irq: Update interrupt clear register for proper reset
	net-timestamp: convert sk->sk_tskey to atomic_t
	RDMA/rtrs-clt: Fix possible double free in error case
	RDMA/rtrs-clt: Move free_permit from free_clt to rtrs_clt_close
	bnxt_en: Increase firmware message response DMA wait time
	configfs: fix a race in configfs_{,un}register_subsystem()
	RDMA/ib_srp: Fix a deadlock
	tracing: Dump stacktrace trigger to the corresponding instance
	tracing: Have traceon and traceoff trigger honor the instance
	iio:imu:adis16480: fix buffering for devices with no burst mode
	iio: adc: men_z188_adc: Fix a resource leak in an error handling path
	iio: adc: tsc2046: fix memory corruption by preventing array overflow
	iio: adc: ad7124: fix mask used for setting AIN_BUFP & AIN_BUFM bits
	iio: accel: fxls8962af: add padding to regmap for SPI
	iio: imu: st_lsm6dsx: wait for settling time in st_lsm6dsx_read_oneshot
	iio: Fix error handling for PM
	sc16is7xx: Fix for incorrect data being transmitted
	ata: pata_hpt37x: disable primary channel on HPT371
	Revert "USB: serial: ch341: add new Product ID for CH341A"
	usb: gadget: rndis: add spinlock for rndis response list
	USB: gadget: validate endpoint index for xilinx udc
	tracefs: Set the group ownership in apply_options() not parse_options()
	USB: serial: option: add support for DW5829e
	USB: serial: option: add Telit LE910R1 compositions
	usb: dwc2: drd: fix soft connect when gadget is unconfigured
	usb: dwc3: pci: Add "snps,dis_u2_susphy_quirk" for Intel Bay Trail
	usb: dwc3: pci: Fix Bay Trail phy GPIO mappings
	usb: dwc3: gadget: Let the interrupt handler disable bottom halves.
	xhci: re-initialize the HC during resume if HCE was set
	xhci: Prevent futile URB re-submissions due to incorrect return value.
	nvmem: core: Fix a conflict between MTD and NVMEM on wp-gpios property
	mtd: core: Fix a conflict between MTD and NVMEM on wp-gpios property
	driver core: Free DMA range map when device is released
	btrfs: prevent copying too big compressed lzo segment
	RDMA/cma: Do not change route.addr.src_addr outside state checks
	thermal: int340x: fix memory leak in int3400_notify()
	staging: fbtft: fb_st7789v: reset display before initialization
	tps6598x: clear int mask on probe failure
	IB/qib: Fix duplicate sysfs directory name
	riscv: fix nommu_k210_sdcard_defconfig
	riscv: fix oops caused by irqsoff latency tracer
	tty: n_gsm: fix encoding of control signal octet bit DV
	tty: n_gsm: fix proper link termination after failed open
	tty: n_gsm: fix NULL pointer access due to DLCI release
	tty: n_gsm: fix wrong tty control line for flow control
	tty: n_gsm: fix wrong modem processing in convergence layer type 2
	tty: n_gsm: fix deadlock in gsmtty_open()
	pinctrl: fix loop in k210_pinconf_get_drive()
	pinctrl: k210: Fix bias-pull-up
	gpio: tegra186: Fix chip_data type confusion
	memblock: use kfree() to release kmalloced memblock regions
	ice: Fix race conditions between virtchnl handling and VF ndo ops
	ice: fix concurrent reset and removal of VFs
	Linux 5.15.26

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ied0cc9bd48b7af71a064107676f37b0dd39ce3cf
2022-03-16 12:53:52 +01:00
Michal Koutný
ebeb7b7357 cgroup-v1: Correct privileges check in release_agent writes
commit 467a726b754f474936980da793b4ff2ec3e382a7 upstream.

The idea is to check: a) the owning user_ns of cgroup_ns, b)
capabilities in init_user_ns.

The commit 24f600856418 ("cgroup-v1: Require capabilities to set
release_agent") got this wrong in the write handler of release_agent
since it checked user_ns of the opener (may be different from the owning
user_ns of cgroup_ns).
Secondly, to avoid a possible confused deputy, the capability of the
opener must be checked.
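The corrected rule can be sketched as a small userspace model. The types and release_agent_write_check() below are hypothetical stand-ins, not kernel definitions; the point is that the check reads the credentials captured at open time and the owner of the cgroup namespace, never the writer's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the kernel's user_namespace/cred types. */
struct user_ns { int id; };
static struct user_ns init_user_ns = { 0 };

struct cgroup_ns_model {
	struct user_ns *owner;          /* owning user_ns of the cgroup_ns */
};

struct cred_model {
	bool cap_sys_admin_in_init_ns;  /* CAP_SYS_ADMIN in init_user_ns */
};

/* What the write handler can see: state captured when the file was opened. */
struct open_file_model {
	const struct cred_model *f_cred;     /* the opener's credentials */
	const struct cgroup_ns_model *ns;    /* cgroup_ns of the opener */
};

/* a) the cgroup_ns must be owned by init_user_ns, and b) the opener must
 * be capable in init_user_ns. The writer's credentials are not consulted,
 * so a privileged writer cannot be used as a confused deputy. */
static int release_agent_write_check(const struct open_file_model *of)
{
	if (of->ns->owner != &init_user_ns ||
	    !of->f_cred->cap_sys_admin_in_init_ns)
		return -EPERM;
	return 0;
}
```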

Fixes: 24f600856418 ("cgroup-v1: Require capabilities to set release_agent")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/stable/20220216121142.GB30035@blackbody.suse.cz/
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Masami Ichikawa(CIP) <masami.ichikawa@cybertrust.co.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02 11:47:47 +01:00
Greg Kroah-Hartman
344a3ff87c Merge 5.15.20 into android13-5.15
Changes in 5.15.20
	PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
	selftests: mptcp: fix ipv6 routing setup
	net: ipa: use a bitmap for endpoint replenish_enabled
	net: ipa: prevent concurrent replenish
	drm/vc4: hdmi: Make sure the device is powered with CEC
	cgroup-v1: Require capabilities to set release_agent
	Revert "mm/gup: small refactoring: simplify try_grab_page()"
	ovl: don't fail copy up if no fileattr support on upper
	lockd: fix server crash on reboot of client holding lock
	lockd: fix failure to cleanup client locks
	net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
	net/mlx5: Bridge, take rtnl lock in init error handler
	net/mlx5: Bridge, ensure dev_name is null-terminated
	net/mlx5e: Fix handling of wrong devices during bond netevent
	net/mlx5: Use del_timer_sync in fw reset flow of halting poll
	net/mlx5e: Fix module EEPROM query
	net/mlx5: Fix offloading with ESWITCH_IPV4_TTL_MODIFY_ENABLE
	net/mlx5e: Don't treat small ceil values as unlimited in HTB offload
	net/mlx5: Bridge, Fix devlink deadlock on net namespace deletion
	net/mlx5: E-Switch, Fix uninitialized variable modact
	ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
	i40e: Fix reset bw limit when DCB enabled with 1 TC
	i40e: Fix reset path while removing the driver
	net: amd-xgbe: ensure to reset the tx_timer_active flag
	net: amd-xgbe: Fix skb data length underflow
	fanotify: Fix stale file descriptor in copy_event_to_user()
	net: sched: fix use-after-free in tc_new_tfilter()
	rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()
	cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
	e1000e: Handshake with CSME starts from ADL platforms
	af_packet: fix data-race in packet_setsockopt / packet_setsockopt
	tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
	ovl: fix NULL pointer dereference in copy up warning
	Linux 5.15.20

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia50333eff81881fac62eb52455b502e6c46ff3d9
2022-02-05 13:22:13 +01:00
Eric W. Biederman
4b1c32bfaa cgroup-v1: Require capabilities to set release_agent
commit 24f6008564183aa120d07c03d9289519c2fe02af upstream.

The cgroup release_agent is called with call_usermodehelper.  The function
call_usermodehelper starts the release_agent with a full set of capabilities.
Therefore require capabilities when setting the release_agent.
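A sketch of the check this commit describes, assuming the rule is "initial user namespace plus CAP_SYS_ADMIN" and that it is evaluated against the task performing the write. cred_model and release_agent_write_check() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the kernel's user_namespace/cred types. */
struct user_ns { int id; };
static struct user_ns init_user_ns = { 0 };

struct cred_model {
	struct user_ns *user_ns;   /* user namespace of the task */
	bool cap_sys_admin;        /* holds CAP_SYS_ADMIN */
};

/* The agent later runs via call_usermodehelper() with full capabilities,
 * so only a CAP_SYS_ADMIN task in the initial user namespace may set it.
 * This version judges the task performing the write. */
static int release_agent_write_check(const struct cred_model *writer)
{
	if (writer->user_ns != &init_user_ns || !writer->cap_sys_admin)
		return -EPERM;
	return 0;
}
```

Because this version reads the writer's credentials, it is the check that commit ebeb7b7357 above later revises.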

Reported-by: Tabitha Sable <tabitha.c.sable@gmail.com>
Tested-by: Tabitha Sable <tabitha.c.sable@gmail.com>
Fixes: 81a6a5cdd2 ("Task Control Groups: automatic userspace notification of idle cgroups")
Cc: stable@vger.kernel.org # v2.6.24+
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-05 12:38:57 +01:00
Greg Kroah-Hartman
173de0c81d Merge 5.15.14 into android13-5.15
Changes in 5.15.14
	fscache_cookie_enabled: check cookie is valid before accessing it
	selftests: x86: fix [-Wstringop-overread] warn in test_process_vm_readv()
	tracing: Fix check for trace_percpu_buffer validity in get_trace_buf()
	tracing: Tag trace_percpu_buffer as a percpu pointer
	Revert "RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow"
	ieee802154: atusb: fix uninit value in atusb_set_extended_addr
	i40e: Fix to not show opcode msg on unsuccessful VF MAC change
	iavf: Fix limit of total number of queues to active queues of VF
	RDMA/core: Don't infoleak GRH fields
	Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks"
	netrom: fix copying in user data in nr_setsockopt
	RDMA/uverbs: Check for null return of kmalloc_array
	mac80211: initialize variable have_higher_than_11mbit
	mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh
	sfc: The RX page_ring is optional
	i40e: fix use-after-free in i40e_sync_filters_subtask()
	i40e: Fix for displaying message regarding NVM version
	i40e: Fix incorrect netdev's real number of RX/TX queues
	ftrace/samples: Add missing prototypes direct functions
	ipv4: Check attribute length for RTA_GATEWAY in multipath route
	ipv4: Check attribute length for RTA_FLOW in multipath route
	ipv6: Check attribute length for RTA_GATEWAY in multipath route
	ipv6: Check attribute length for RTA_GATEWAY when deleting multipath route
	lwtunnel: Validate RTA_ENCAP_TYPE attribute length
	selftests: net: udpgro_fwd.sh: explicitly checking the available ping feature
	sctp: hold endpoint before calling cb in sctp_transport_lookup_process
	batman-adv: mcast: don't send link-local multicast to mcast routers
	sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc
	net: ena: Fix undefined state when tx request id is out of bounds
	net: ena: Fix wrong rx request id by resetting device
	net: ena: Fix error handling when calculating max IO queues number
	md/raid1: fix missing bitmap update w/o WriteMostly devices
	EDAC/i10nm: Release mdev/mbase when failing to detect HBM
	KVM: x86: Check for rmaps allocation
	cgroup: Use open-time credentials for process migraton perm checks
	cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv
	cgroup: Use open-time cgroup namespace for process migration perm checks
	Revert "i2c: core: support bus regulator controlling in adapter"
	i2c: mpc: Avoid out of bounds memory access
	xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate
	power: supply: core: Break capacity loop
	power: reset: ltc2952: Fix use of floating point literals
	reset: renesas: Fix Runtime PM usage
	rndis_host: support Hytera digital radios
	gpio: gpio-aspeed-sgpio: Fix wrong hwirq base in irq handler
	net ticp:fix a kernel-infoleak in __tipc_sendmsg()
	phonet: refcount leak in pep_sock_accep
	fbdev: fbmem: add a helper to determine if an aperture is used by a fw fb
	drm/amdgpu: disable runpm if we are the primary adapter
	power: bq25890: Enable continuous conversion for ADC at charging
	ipv6: Continue processing multipath route even if gateway attribute is invalid
	ipv6: Do cleanup if attribute validation fails in multipath route
	auxdisplay: charlcd: checking for pointer reference before dereferencing
	drm/amdgpu: fix dropped backing store handling in amdgpu_dma_buf_move_notify
	drm/amd/pm: Fix xgmi link control on aldebaran
	usb: mtu3: fix interval value for intr and isoc
	scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()
	ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate
	net: udp: fix alignment problem in udp4_seq_show()
	atlantic: Fix buff_ring OOB in aq_ring_rx_clean
	drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume
	drm/amdgpu: always reset the asic in suspend (v2)
	drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform
	mISDN: change function names to avoid conflicts
	drm/amd/display: fix B0 TMDS deepcolor no dislay issue
	drm/amd/display: Added power down for DCN10
	ipv6: raw: check passed optlen before reading
	userfaultfd/selftests: fix hugetlb area allocations
	ARM: dts: gpio-ranges property is now required
	Input: zinitix - make sure the IRQ is allocated before it gets enabled
	Revert "drm/amdgpu: stop scheduler when calling hw_fini (v2)"
	drm/amd/pm: keep the BACO feature enabled for suspend
	Linux 5.15.14

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifc22d4db0c3aa2164c4769981847e0634f2ad463
2022-01-12 09:00:42 +01:00
Tejun Heo
50273128d6 cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv
commit 0d2b5955b36250a9428c832664f2079cbf723bec upstream.

of->priv is currently used by each interface file implementation to store
private information. This patch collects the two current private data usages
into struct cgroup_file_ctx, which is allocated and freed by the common path.
This allows generic private data that applies to multiple files, which will
be used in the following patch.

Note that cgroup_procs iterator is now embedded as procs.iter in the new
cgroup_file_ctx so that it doesn't need to be allocated and freed
separately.

v2: union dropped from cgroup_file_ctx and the procs iterator is embedded in
    cgroup_file_ctx as suggested by Linus.

v3: Michal pointed out that cgroup1's procs pidlist uses of->priv too.
    Converted. Didn't change to embedded allocation as cgroup1 pidlists get
    stored for caching.
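A simplified model of the per-open context, loosely following the description above: one zeroed cgroup_file_ctx per open file, created and destroyed by common open/release paths, with the procs iterator embedded and the cgroup1 pidlist held by pointer. All *_model types and functions are hypothetical stand-ins for the kernfs and cgroup internals.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for kernel types. */
struct css_task_iter_model { int pos; };
struct cgroup_pidlist_model { int nr_pids; };
struct kernfs_open_file_model { void *priv; };

/* One context per open file; each interface file uses only its member. */
struct cgroup_file_ctx {
	struct {
		void *trigger;                        /* PSI trigger */
	} psi;
	struct {
		bool started;
		struct css_task_iter_model iter;      /* embedded, no extra alloc */
	} procs;
	struct {
		struct cgroup_pidlist_model *pidlist; /* cgroup1, cached separately */
	} procs1;
};

/* Common open path: every cgroup file gets a zeroed context in of->priv. */
static int cgroup_file_open_model(struct kernfs_open_file_model *of)
{
	struct cgroup_file_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return -ENOMEM;
	of->priv = ctx;
	return 0;
}

/* Common release path: the embedded iterator is freed with the context. */
static void cgroup_file_release_model(struct kernfs_open_file_model *of)
{
	free(of->priv);
	of->priv = NULL;
}
```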

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-11 15:35:15 +01:00
Tejun Heo
c6ebc35298 cgroup: Use open-time credentials for process migraton perm checks
commit 1756d7994ad85c2479af6ae5a9750b92324685af upstream.

cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's credentials which is a
potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.

This patch makes both cgroup2 and cgroup1 process migration interfaces use
the credentials saved at the time of open (file->f_cred) instead of
current's.
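The idea of judging a write by the opener's credentials can be sketched as below. The uid-only credential and the "root or same uid" rule are simplifying assumptions, not the kernel's actual migration permission logic; note that procs_write_check() never looks at the writing task at all.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: credentials are reduced to a single uid. */
struct cred_model { unsigned int uid; };

struct file_model {
	const struct cred_model *f_cred;  /* captured once, at open() time */
};

/* The target comes from the write payload (the PID), but the identity
 * checked is the opener's, taken from file->f_cred. The writing task is
 * not even a parameter, so handing the fd to a more privileged process
 * does not widen what the write may do. The "root or same uid" rule is
 * a simplification for this sketch. */
static int procs_write_check(const struct file_model *file,
			     unsigned int target_uid)
{
	const struct cred_model *cred = file->f_cred;

	if (cred->uid != 0 && cred->uid != target_uid)
		return -EACCES;
	return 0;
}
```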

Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Fixes: 187fe84067 ("cgroup: require write perm on common ancestor when moving processes on the default hierarchy")
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-11 15:35:15 +01:00
Pavankumar Kondeti
d4f032e36b ANDROID: cgroup: Add android_rvh_cgroup_force_kthread_migration
In Android GKI, CONFIG_FAIR_GROUP_SCHED is enabled [1] to help
prioritize important work. Given that the CPU shares of the root cgroup
can't be changed, leaving tasks in the root cgroup gives them a higher
share than the tasks in important cgroups. This is mitigated by moving
all tasks in the root cgroup to a different cgroup after Android has
booted. However, many kernel tasks remain stuck in the root cgroup
after boot.

Migration of kernel threads and kworkers could be relaxed under
certain scenarios. However, the patch [2] posted upstream was not
accepted. Hence add a restricted vendor hook to notify modules when a
kernel thread is requested for cgroup migration. The modules can relax
the restrictions forced by the kernel and allow the cgroup migration.
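A sketch of how such a restricted hook could gate the default refusal. The hook type, registered_hook, can_migrate() and the example policy are all hypothetical; this does not reproduce the signature of android_rvh_cgroup_force_kthread_migration, and the "kernel thread with no-setaffinity" test is a simplified stand-in for the kernel's own check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task model with only the bits this sketch needs. */
struct task_model {
	bool is_kthread;
	bool no_setaffinity;   /* stands in for PF_NO_SETAFFINITY */
};

/* Hypothetical hook type: a vendor module may set *force to true. */
typedef void (*kthread_migration_hook)(const struct task_model *tsk,
				       bool *force);

static kthread_migration_hook registered_hook;  /* NULL: no module loaded */

static int can_migrate(const struct task_model *tsk)
{
	bool force = false;

	if (registered_hook)
		registered_hook(tsk, &force);   /* notify the module */

	/* Default policy (simplified): refuse pinned kernel threads. */
	if (!force && tsk->is_kthread && tsk->no_setaffinity)
		return -EINVAL;
	return 0;
}

/* Example vendor policy: let kernel threads leave the root cgroup. */
static void allow_kthreads(const struct task_model *tsk, bool *force)
{
	*force = tsk->is_kthread;
}
```

Keeping the default refusal in the core path and letting the module only flip `force` means nothing changes when no module registers the hook.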

[1] f08f049de1
[2] https://lore.kernel.org/lkml/1617714261-18111-1-git-send-email-pkondeti@codeaurora.org

Bug: 184594949
Change-Id: I445a170ba797c8bece3b4b59b7a42cdd85438f1f
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Shaleen Agrawal <shalagra@codeaurora.org>
2021-10-12 16:44:41 -07:00
Greg Kroah-Hartman
bc2f6edebd Merge 9e9fb7655e ("Merge tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") into android-mainline
Steps on the way to 5.15-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I49577d606b2710975407eae3fee60bc331397810
2021-09-07 14:40:30 +02:00
Linus Torvalds
69dc8010b8 Merge branch 'for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Two cpuset behavior changes:

   - cpuset on cgroup2 is changed to enable memory migration based on
     nodemask by default.

   - A notification is generated when cpuset partition state changes.

  All other patches are minor fixes and cleanups"

* 'for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Avoid compiler warnings with no subsystems
  cgroup/cpuset: Avoid memory migration when nodemasks match
  cgroup/cpuset: Enable memory migration for cpuset v2
  cgroup/cpuset: Enable event notification when partition state changes
  cgroup: cgroup-v1: clean up kernel-doc notation
  cgroup: Replace deprecated CPU-hotplug functions.
  cgroup/cpuset: Fix violation of cpuset locking rule
  cgroup/cpuset: Fix a partition bug with hotplug
  cgroup/cpuset: Miscellaneous code cleanup
  cgroup: remove cgroup_mount from comments
2021-08-31 15:49:04 -07:00
Lee Jones
eba773ab53 Revert "Revert "CHROMIUM: cgroups: relax permissions on moving tasks between cgroups""
This reverts commit 631c0bba0a.

Although this boots and passes CI build/boot testing, it leaves a
dirty trail consisting of thousands of failures in the log and probably
wouldn't function all that well on a real H/W platform.

  08-16 12:20:13.003   658   697 E libprocessgroup: AddTidToCgroup failed to write '3138'; fd=121: Permission denied
  08-16 12:20:13.003   658   697 E libprocessgroup: Failed to add task into cgroup

Change-Id: Ia0f1948b0e94c27e5cecae8691348e044b32f7d6
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-08-16 14:50:30 +01:00
Lee Jones
631c0bba0a Revert "CHROMIUM: cgroups: relax permissions on moving tasks between cgroups"
This reverts commit a88f616760.

After some recent discussions, it transpires that this is no longer
required.

Bug: 31790445
Change-Id: I3ec80e21e192caa9e62715d450036e8565a21509
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-08-16 12:53:58 +01:00
Randy Dunlap
b4cc619608 cgroup: cgroup-v1: clean up kernel-doc notation
Fix kernel-doc warnings found in cgroup-v1.c:

kernel/cgroup/cgroup-v1.c:55: warning: No description found for return value of 'cgroup_attach_task_all'
kernel/cgroup/cgroup-v1.c:94: warning: expecting prototype for cgroup_trasnsfer_tasks(). Prototype was for cgroup_transfer_tasks() instead
cgroup-v1.c:96: warning: No description found for return value of 'cgroup_transfer_tasks'
kernel/cgroup/cgroup-v1.c:687: warning: No description found for return value of 'cgroupstats_build'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-08-11 07:57:43 -10:00
Lee Jones
8698c3da64 Merge tag 'v5.14-rc4' into android-mainline
Linux 5.14-rc4

Change-Id: I5c52cb9dda8eda42aa15b4ed6488367fbcc0c11a
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-08-02 13:18:42 +01:00
Lee Jones
946e465c81 Merge tag 'v5.14-rc2' into android-mainline
Linux 5.14-rc2

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Ia2131de59daa96610741f5a0ff267b0d08697023
2021-07-22 14:14:38 +01:00
Paul Gortmaker
1e7107c5ef cgroup1: fix leaked context root causing sporadic NULL deref in LTP
Richard reported sporadic (roughly one in 10) NULL dereferences and
other strange behaviour for a set of automated LTP tests.  Things like:

   BUG: kernel NULL pointer dereference, address: 0000000000000008
   #PF: supervisor read access in kernel mode
   #PF: error_code(0x0000) - not-present page
   PGD 0 P4D 0
   Oops: 0000 [#1] PREEMPT SMP PTI
   CPU: 0 PID: 1516 Comm: umount Not tainted 5.10.0-yocto-standard #1
   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
   RIP: 0010:kernfs_sop_show_path+0x1b/0x60

...or these others:

   RIP: 0010:do_mkdirat+0x6a/0xf0
   RIP: 0010:d_alloc_parallel+0x98/0x510
   RIP: 0010:do_readlinkat+0x86/0x120

There were other less common instances of some kind of a general scribble
but the common theme was mount and cgroup and a dubious dentry triggering
the NULL dereference.  I was only able to reproduce it under qemu by
replicating Richard's setup as closely as possible - I never did get it
to happen on bare metal, even while keeping everything else the same.

In commit 71d883c37e ("cgroup_do_mount(): massage calling conventions")
we see this as a part of the overall change:

   --------------
           struct cgroup_subsys *ss;
   -       struct dentry *dentry;

   [...]

   -       dentry = cgroup_do_mount(&cgroup_fs_type, fc->sb_flags, root,
   -                                CGROUP_SUPER_MAGIC, ns);

   [...]

   -       if (percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
   -               struct super_block *sb = dentry->d_sb;
   -               dput(dentry);
   +       ret = cgroup_do_mount(fc, CGROUP_SUPER_MAGIC, ns);
   +       if (!ret && percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
   +               struct super_block *sb = fc->root->d_sb;
   +               dput(fc->root);
                   deactivate_locked_super(sb);
                   msleep(10);
                   return restart_syscall();
           }
   --------------

In changing from the local "*dentry" variable to using fc->root, we now
export/leave that dentry pointer in the file context after doing the dput()
in the unlikely "is_dying" case. With LTP doing a crazy amount of
back-to-back mount/unmount [testcases/bin/cgroup_regression_5_1.sh], the
unlikely becomes slightly likely and then bad things happen.

A fix would be to not leave the stale reference in fc->root as follows:

   --------------
                   dput(fc->root);
   +               fc->root = NULL;
                   deactivate_locked_super(sb);
   --------------

...but then we are just open-coding a duplicate of fc_drop_locked() so we
simply use that instead.
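The shape of the fix can be modeled in userspace. The *_model types are hypothetical simplifications; fc_drop_locked_model() follows the three steps shown in the diff above (dput, clear fc->root, deactivate the superblock), so no stale dentry pointer survives in the context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for super_block/dentry/fs_context. */
struct super_block_model { int active; };
struct dentry_model { int refcount; struct super_block_model *d_sb; };
struct fs_context_model { struct dentry_model *root; };

static void dput_model(struct dentry_model *d) { d->refcount--; }

static void deactivate_locked_super_model(struct super_block_model *sb)
{
	sb->active--;
}

/* Drop the root reference, clear fc->root so nothing can reach the
 * dentry afterwards, then deactivate the superblock. */
static void fc_drop_locked_model(struct fs_context_model *fc)
{
	struct super_block_model *sb = fc->root->d_sb; /* read before dput */

	dput_model(fc->root);
	fc->root = NULL;
	deactivate_locked_super_model(sb);
}
```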

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: stable@vger.kernel.org      # v5.1+
Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Fixes: 71d883c37e ("cgroup_do_mount(): massage calling conventions")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-07-21 06:39:20 -10:00
Christian Brauner
d1d488d813 fs: add vfs_parse_fs_param_source() helper
Add a simple helper that filesystems can use in their parameter parser
to parse the "source" parameter. A few places open-coded this function
and that already caused a bug in the cgroup v1 parser that we fixed.
Let's make it harder to get this wrong by introducing a helper which
performs all necessary checks.

Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-14 09:19:06 -07:00
Christian Brauner
3b0462726e cgroup: verify that source is a string
The following sequence can be used to trigger a UAF:

    int fscontext_fd = fsopen("cgroup", 0);
    int fd_null = open("/dev/null", O_RDONLY);
    fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", NULL, fd_null);
    close_range(3, ~0U, 0);

The cgroup v1 specific fs parser expects a string for the "source"
parameter.  However, it is perfectly legitimate to e.g. specify a file
descriptor for the "source" parameter.  The fs parser doesn't know what
a filesystem allows there.  So it's a bug to assume that "source" is
always of type fs_value_is_string when it can reasonably also be
fs_value_is_file.

This assumption in the cgroup code causes a UAF because struct
fs_parameter uses a union for the actual value.  Access to that union is
guarded by the param->type member.  Since the cgroup parameter parser
didn't check param->type but unconditionally moved param->string into
fc->source, closing fscontext_fd calls put_fs_context(), which frees
fc->source and thereby the file stashed in param->file; the subsequent
close of fd_null then triggers a UAF.

Fix this by verifying that param->type is actually a string and report
an error if not.
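A userspace model of the tagged union and the type check. fs_parameter_model and parse_source_model() are hypothetical simplifications of struct fs_parameter and the cgroup1 parser; returning 1 for "not the source key" is an invented convention for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical, simplified model of struct fs_parameter: the value lives
 * in a union and only `type` says which member is valid. */
enum fs_value_type_model { VALUE_IS_STRING, VALUE_IS_FILE };

struct fs_parameter_model {
	const char *key;
	enum fs_value_type_model type;
	union {
		char *string;
		void *file;          /* stands in for struct file * */
	};
};

struct fs_context_model { char *source; };

/* Fixed "source" handling: check the tag before touching param->string,
 * so a file-descriptor value is rejected instead of being stashed in
 * fc->source and later freed as if it were a string. */
static int parse_source_model(struct fs_context_model *fc,
			      struct fs_parameter_model *param)
{
	if (strcmp(param->key, "source") != 0)
		return 1;                /* not "source": left to other cases */
	if (param->type != VALUE_IS_STRING)
		return -EINVAL;          /* non-string source */
	if (fc->source)
		return -EINVAL;          /* source given twice */
	fc->source = param->string;
	param->string = NULL;            /* ownership moves to fc */
	return 0;
}
```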

In follow up patches I'll add a new generic helper that can be used here
and by other filesystems instead of this error-prone copy-pasta fix.
But fixing it here first makes backporting it to stable a lot
easier.

Fixes: 8d2451f499 ("cgroup1: switch to option-by-option parsing")
Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@kernel.org>
Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-14 09:19:06 -07:00
Lee Jones
7889eed917 Merge 54a728dc5e ("Merge tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
A little step towards 5.14-rc1

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: I2573a6df9f4e7b67194327ac6db6082a574d2809
2021-07-09 10:55:21 +01:00
Peter Zijlstra
2f064a59a1 sched: Change task_struct::state
Change the type and name of task_struct::state. Drop the volatile and
shrink it to an 'unsigned int'. Rename it in order to find all uses
such that we can use READ_ONCE/WRITE_ONCE as appropriate.
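A minimal userspace illustration of the rename plus marked accesses. READ_ONCE()/WRITE_ONCE() are reimplemented here as volatile casts (a common approximation that relies on the GCC/Clang __typeof__ extension), and the TASK_*_MODEL values are placeholders rather than the kernel's definitions.

```c
#include <assert.h>

/* Userspace approximations of READ_ONCE()/WRITE_ONCE(): the volatile
 * access keeps the compiler from caching, merging or omitting it. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Placeholder state values for the sketch. */
#define TASK_RUNNING_MODEL        0x0000u
#define TASK_INTERRUPTIBLE_MODEL  0x0001u

struct task_model {
	/* No longer volatile, now unsigned int, and renamed: any leftover
	 * `->state` user fails to compile and has to be revisited. */
	unsigned int __state;
};

static void set_task_state_model(struct task_model *t, unsigned int s)
{
	WRITE_ONCE(t->__state, s);
}

static unsigned int get_task_state_model(struct task_model *t)
{
	return READ_ONCE(t->__state);
}
```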

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
2021-06-18 11:43:09 +02:00
Greg Kroah-Hartman
1a6552d0ed Merge 8ecfa36cd4 ("Merge tag 'riscv-for-linus-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux") into android-mainline
Steps on the way to 5.13-rc6

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iaed63766ecfc7e2126e108e93808e79469a9facf
2021-06-13 15:19:53 +02:00
Alexander Kuznetsov
b7e24eb1ca cgroup1: don't allow '\n' in renaming
cgroup_mkdir() restricts the use of newlines in names:
$ mkdir $'/sys/fs/cgroup/cpu/test\ntest2'
mkdir: cannot create directory
'/sys/fs/cgroup/cpu/test\ntest2': Invalid argument

But cgroup1_rename() is missing such a check.
This allows us to make /proc/<pid>/cgroup unparsable:
$ mkdir /sys/fs/cgroup/cpu/test
$ mv /sys/fs/cgroup/cpu/test $'/sys/fs/cgroup/cpu/test\ntest2'
$ echo $$ > $'/sys/fs/cgroup/cpu/test\ntest2'
$ cat /proc/self/cgroup
11:pids:/
10:freezer:/
9:hugetlb:/
8:cpuset:/
7:blkio:/user.slice
6:memory:/user.slice
5:net_cls,net_prio:/
4:perf_event:/
3:devices:/user.slice
2:cpu,cpuacct:/test
test2
1:name=systemd:/
0::/
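The fix applies the existing mkdir rule to renames as well. Below is a minimal userspace sketch of that rule. `check_cgroup_name()` and `rename_cgroup()` are hypothetical helpers, not the kernel's functions. Any name containing '\n' is rejected with -EINVAL before it is stored, so each /proc/<pid>/cgroup record stays on one line.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: reject names that would split a /proc line. */
static int check_cgroup_name(const char *name)
{
	if (strchr(name, '\n'))
		return -EINVAL;
	return 0;
}

/* Hypothetical rename path: validate the new name before storing it. */
static int rename_cgroup(char *dst, size_t dstlen, const char *new_name)
{
	int ret = check_cgroup_name(new_name);

	if (ret)
		return ret;
	if (strlen(new_name) >= dstlen)
		return -ENAMETOOLONG;
	strcpy(dst, new_name);
	return 0;
}
```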

Signed-off-by: Alexander Kuznetsov <wwfq@yandex-team.ru>
Reported-by: Andrey Krasichkov <buglloc@yandex-team.ru>
Acked-by: Dmitry Yakunin <zeil@yandex-team.ru>
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-06-10 09:58:50 -04:00
Greg Kroah-Hartman
b1065ab819 Merge tag 'v5.13-rc4' into android-mainline
Linux 5.13-rc4

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I05336d3226a208ac657fcfd4d39b418ad1dba1bd
2021-06-01 09:10:12 +02:00
Zhen Lei
08b2b6fdf6 cgroup: fix spelling mistakes
Fix some spelling mistakes in comments:
hierarhcy ==> hierarchy
automtically ==> automatically
overriden ==> overridden
In absense of .. or ==> In absence of .. and
assocaited ==> associated
taget ==> target
initate ==> initiate
succeded ==> succeeded
curremt ==> current
udpated ==> updated

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-05-24 12:45:26 -04:00
Lee Jones
4797acfb9c Merge 16b3d0cf5b Merge tag 'sched-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into android-mainline
A little step en route to v5.13-rc1

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Ic2fb8aa220023572c96907aebce0a675333ef29f
2021-05-10 10:28:52 +01:00
Chunguang Xu
ffeee417d9 cgroup: use tsk->in_iowait instead of delayacct_is_task_waiting_on_io()
If delayacct is disabled, then delayacct_is_task_waiting_on_io()
always returns false, which causes the statistical value to be
wrong. Perhaps tsk->in_iowait is better.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-04-16 16:49:37 -04:00
Frankie Chang
7d91e4ee75 ANDROID: cgroup: Add vendor hook to the cgroup
Add a vendor hook after attaching a task to a cgroup to
recognize the group_id for performance tuning

Bug: 181917687

Signed-off-by: Frankie Chang <frankie.chang@mediatek.com>
Change-Id: I603afa3d893dd575a7dcb97f83bd9eacb8315bab
(cherry picked from commit de089a37a3d248608a1d5855a4ae82ebad3ec2ab)
2021-03-09 01:59:47 +00:00
Greg Kroah-Hartman
542ddf1f44 Merge 358feceebb ("Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux") into android-mainline
Final steps on the way to 5.11-final

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I81cdc8804b9c18e722385cac332c042bc5e68113
2021-02-14 13:47:24 +01:00
Chen Zhou
61e960b07b cgroup-v1: add disabled controller check in cgroup1_parse_param()
When mounting a cgroup hierarchy with a disabled controller in cgroup v1,
all available controllers will be attached.
For example, boot with cgroup_no_v1=cpu or cgroup_disable=cpu, and then
mount with "mount -t cgroup -ocpu cpu /sys/fs/cgroup/cpu"; all
enabled controllers except cpu will then be attached.

Fix this by adding a disabled-controller check in cgroup1_parse_param().
If the specified controller is disabled, just return error with information
"Disabled controller xx" rather than attaching all the other enabled
controllers.
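A minimal userspace sketch of the idea follows. The controller table, `parse_controller()` and the bitmask are hypothetical, and returning -EINVAL for a disabled controller is this sketch's choice. A matching but disabled controller now produces an error. Before the fix it was silently skipped, which left the selection empty and let the mount fall back to attaching every enabled controller.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical controller table; "cpu" is disabled as if booted with
 * cgroup_disable=cpu. */
struct controller {
	const char *name;
	bool disabled;
};

static const struct controller controllers[] = {
	{ "cpu",    true  },
	{ "memory", false },
	{ "pids",   false },
};

#define NR_CONTROLLERS (sizeof(controllers) / sizeof(controllers[0]))

/* Parse one mount option and set bit i in *mask for controller i. */
static int parse_controller(const char *opt, unsigned int *mask)
{
	for (size_t i = 0; i < NR_CONTROLLERS; i++) {
		if (strcmp(opt, controllers[i].name) != 0)
			continue;
		if (controllers[i].disabled) {
			/* Fail loudly instead of skipping the option. */
			fprintf(stderr, "Disabled controller '%s'\n", opt);
			return -EINVAL;
		}
		*mask |= 1u << i;
		return 0;
	}
	return -EINVAL; /* unknown controller name */
}
```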

Fixes: f5dfb5315d ("cgroup: take options parsing into ->parse_monolithic()")
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Reviewed-by: Zefan Li <lizefan.x@bytedance.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2021-01-15 15:10:37 -05:00
Greg Kroah-Hartman
279177734b Merge v5.11-rc2 into android-mainline
Linux 5.11-rc2

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I779e6488c68272f59e0bd53e432530b4e55f51b9
2021-01-13 14:53:07 +01:00
Qinglang Miao
2d18e54dd8 cgroup: Fix memory leak when parsing multiple source parameters
A memory leak is found in cgroup1_parse_param() when multiple source
parameters overwrite fc->source in the fs_context struct without freeing it.

unreferenced object 0xffff888100d930e0 (size 16):
  comm "mount", pid 520, jiffies 4303326831 (age 152.783s)
  hex dump (first 16 bytes):
    74 65 73 74 6c 65 61 6b 00 00 00 00 00 00 00 00  testleak........
  backtrace:
    [<000000003e5023ec>] kmemdup_nul+0x2d/0xa0
    [<00000000377dbdaa>] vfs_parse_fs_string+0xc0/0x150
    [<00000000cb2b4882>] generic_parse_monolithic+0x15a/0x1d0
    [<000000000f750198>] path_mount+0xee1/0x1820
    [<0000000004756de2>] do_mount+0xea/0x100
    [<0000000094cafb0a>] __x64_sys_mount+0x14b/0x1f0

Fix this bug by permitting a single source parameter and rejecting with
an error all subsequent ones.
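The "first one wins, later ones fail" rule can be sketched in userspace as shown below. `struct fs_ctx`, `parse_source()` and `free_ctx()` are hypothetical stand-ins for fs_context handling. Because a second source is refused, the string already owned by `ctx->source` is never overwritten and leaked.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-down fs_context holding only the "source" string. */
struct fs_ctx {
	char *source;
};

/* Keep a private copy of the first source; reject any later one. */
static int parse_source(struct fs_ctx *ctx, const char *value)
{
	size_t len;
	char *copy;

	if (ctx->source)
		return -EINVAL; /* would otherwise leak the old string */

	len = strlen(value) + 1;
	copy = malloc(len);
	if (!copy)
		return -ENOMEM;
	memcpy(copy, value, len);
	ctx->source = copy;
	return 0;
}

/* Release the single owned string. */
static void free_ctx(struct fs_ctx *ctx)
{
	free(ctx->source);
	ctx->source = NULL;
}
```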

Fixes: 8d2451f499 ("cgroup1: switch to option-by-option parsing")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Reviewed-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-12-16 10:10:32 -05:00
Greg Kroah-Hartman
34ed0e2946 Merge 5364abc579 ("Merge tag 'arc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc") into android-mainline
Steps along the 5.7-rc1 merge.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib9f87147ac3d81985496818b0c61bdd086140eed
2020-04-08 09:25:42 +02:00
Greg Kroah-Hartman
ae56fd997e Merge 5.6-rc6 into android-mainline
Linux 5.6-rc6

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6c2d7aff44ad5a9b75030b72d34ca5dbd5ad3ceb
2020-03-16 08:09:43 +01:00
Tejun Heo
e7b20d9796 cgroup: Restructure release_agent_path handling
cgrp->root->release_agent_path is protected by both cgroup_mutex and
release_agent_path_lock and readers can hold either one. The
dual-locking scheme was introduced while breaking a locking dependency
issue around cgroup_mutex but doesn't make sense anymore given that
the only remaining reader which uses cgroup_mutex is
cgroup1_release_agent().

This patch updates cgroup1_release_agent() to use
release_agent_path_lock so that release_agent_path is always protected
only by release_agent_path_lock.

While at it, convert strlen() based empty string checks to direct
tests on the first character as suggested by Linus.
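Here is a rough userspace model of the resulting scheme. A pthread mutex stands in for release_agent_path_lock (the kernel uses a spinlock), and `set_agent_path()`, `get_agent_path()` and `PATH_BUF_LEN` are hypothetical. Every reader and writer takes the one lock, and the "is it empty?" test looks at the first character instead of calling strlen().

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define PATH_BUF_LEN 4096 /* hypothetical buffer size */

/* One lock guards the path for both readers and writers. */
static pthread_mutex_t agent_path_lock = PTHREAD_MUTEX_INITIALIZER;
static char agent_path[PATH_BUF_LEN];

static void set_agent_path(const char *path)
{
	pthread_mutex_lock(&agent_path_lock);
	strncpy(agent_path, path, sizeof(agent_path) - 1);
	agent_path[sizeof(agent_path) - 1] = '\0';
	pthread_mutex_unlock(&agent_path_lock);
}

/* Copy the path out under the lock; false means "no agent configured". */
static bool get_agent_path(char *buf, size_t buflen)
{
	bool set;

	pthread_mutex_lock(&agent_path_lock);
	set = agent_path[0] != '\0'; /* direct first-character test */
	if (set) {
		strncpy(buf, agent_path, buflen - 1);
		buf[buflen - 1] = '\0';
	}
	pthread_mutex_unlock(&agent_path_lock);
	return set;
}
```

A caller that gets false back simply does not spawn a helper. That also covers the case where release_agent has been set to "".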

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-12 16:44:35 -04:00
Tycho Andersen
2e5383d790 cgroup1: don't call release_agent when it is ""
Older (and maybe current) versions of systemd set release_agent to "" when
shutting down, but do not set notify_on_release to 0.

Since 64e90a8acb ("Introduce STATIC_USERMODEHELPER to mediate
call_usermodehelper()"), we filter out such calls when the user mode helper
path is "". However, when used in conjunction with an actual (i.e. non "")
STATIC_USERMODEHELPER, the path is never "", so the real usermode helper
will be called with argv[0] == "".

Let's avoid this by not invoking the release_agent when it is "".

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-03-04 11:53:33 -05:00
Vasily Averin
db8dd96972 cgroup-v1: cgroup_pidlist_next should update position index
If a seq_file .next function does not change the position index,
a read after some lseek can generate unexpected output.

 # mount | grep cgroup
 # dd if=/mnt/cgroup.procs bs=1  # normal output
...
1294
1295
1296
1304
1382
584+0 records in
584+0 records out
584 bytes copied

dd: /mnt/cgroup.procs: cannot skip to specified offset
83  <<< generates end of last line
1383  <<< ... and whole last line once again
0+1 records in
0+1 records out
8 bytes copied

dd: /mnt/cgroup.procs: cannot skip to specified offset
1386  <<< generates last line anyway
0+1 records in
0+1 records out
5 bytes copied
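The contract can be sketched in userspace with a hypothetical pid array and `pidlist_start()`/`pidlist_next()` functions. These are modelled on seq_operations .start/.next, and the kernel uses `loff_t *pos` rather than `long *`. The key point is that .next advances *pos even on the call that runs off the end. A later read therefore resumes past the last record instead of printing it again.

```c
#include <stddef.h>

/* Hypothetical pid list standing in for a cgroup pidlist. */
static const int pids[] = { 1294, 1295, 1296, 1304, 1382 };
#define NR_PIDS ((long)(sizeof(pids) / sizeof(pids[0])))

/* .start: return the record at *pos, or NULL when past the end. */
static const int *pidlist_start(long *pos)
{
	return (*pos >= 0 && *pos < NR_PIDS) ? &pids[*pos] : NULL;
}

/* .next: always advance *pos, including when the end is reached. */
static const int *pidlist_next(const int *v, long *pos)
{
	(void)v;
	++*pos;
	return pidlist_start(pos);
}
```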

https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 16:53:35 -05:00
Greg Kroah-Hartman
aa601dde64 Merge c9d35ee049 ("Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs") into android-mainline
Tiny steps to deal with merge issues in sdcardfs due to fs param passing
API changes.

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I03ba8763e8cc324c25fb6316c363b59957103474
2020-02-10 08:39:09 -08:00
Al Viro
58c025f0e8 cgroup1: switch to use of errorfc() et.al.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:43 -05:00
Al Viro
d7167b1499 fs_parse: fold fs_parameter_desc/fs_parameter_spec
The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:37 -05:00
Eric Sandeen
96cafb9ccb fs_parser: remove fs_parameter_description name field
Unused now.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:36 -05:00
Al Viro
fbc2d1686d get rid of cg_invalf()
pointless alias for invalf()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:31 -05:00
Dmitry Torokhov
a88f616760 CHROMIUM: cgroups: relax permissions on moving tasks between cgroups
Android expects system_server to be able to move tasks between different
cgroups/cpusets, but does not want to be running as root. Let's relax the
permission check so that processes can move other tasks if they have
CAP_SYS_NICE in the affected task's user namespace.

BUG=b:31790445,chromium:647994
Bug: 147109865
TEST=Boot android container, examine logcat

Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/394927
Reviewed-by: Ricky Zhou <rickyz@chromium.org>
[AmitP: Refactored original changes to align with upstream commit
        201af4c0fa ("cgroup: move cgroup files under kernel/cgroup/")]
Change-Id: Ia919c66ab6ed6a6daf7c4cf67feb38b13b1ad09b
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
(cherry picked from commit ec54762b84a1d06de188bc846655305d3f7acf75)
2020-01-07 01:56:09 +00:00
Michal Koutný
9a3284fad4 cgroup: Optimize single thread migration
Users who migrate threads between cgroups have reported a performance
drop after d59cfc09c3 ("sched, cgroup: replace
signal_struct->group_rwsem with a global percpu_rwsem"). The effect is
pronounced on machines with more CPUs.

The migration is affected by forking noise happening in the background:
after the mentioned commit, a migrating thread must wait for all
(forking) processes on the system, not only those of its threadgroup.

There are several places that need to synchronize with migration:
	a) do_exit,
	b) de_thread,
	c) copy_process,
	d) cgroup_update_dfl_csses,
	e) parallel migration (cgroup_{proc,thread}s_write).

In the case of a self-migrating thread, we relax the synchronization on
cgroup_threadgroup_rwsem to avoid the cost of waiting. d) and e) are
excluded with cgroup_mutex, c) does not matter in the case of single-thread
migration, and the executing thread cannot exec(2) or exit(2) while it is
writing into cgroup.threads. In the case of do_exit due to signal
delivery, we either exit before the migration or finish the migration
(of a not yet PF_EXITING thread) and die afterwards.

This patch handles only the case of self-migration by writing "0" into
cgroup.threads. For simplicity, we always take cgroup_threadgroup_rwsem
with numeric PIDs.

This change improves the performance of migration-dependent workloads
to a level similar to that of the per-signal_struct state.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2019-10-07 07:11:53 -07:00
Marc Koderer
653a23ca7e Use kvmalloc in cgroups-v1
Instead of using its own logic for choosing between kmalloc and vmalloc,
rely on kvmalloc, which does essentially the same thing.

Signed-off-by: Marc Koderer <marc@koderer.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2019-08-07 11:37:58 -07:00
Thomas Gleixner
457c899653 treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:45 +02:00
Roman Gushchin
aade7f9efb cgroup: implement __cgroup_task_count() helper
The helper is identical to the existing cgroup_task_count()
except it doesn't take the css_set_lock by itself, assuming
that the caller does.
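The locked/unlocked helper split is a common kernel convention. Here is a minimal userspace sketch of it, in which a pthread mutex stands in for css_set_lock and `nr_tasks`, `__task_count()`, `task_count()` and `add_task_and_count()` are all hypothetical. The double-underscore variant assumes the caller holds the lock, so code that already holds it can reuse the counting logic without taking the same lock a second time.

```c
#include <pthread.h>

/* Hypothetical stand-ins for css_set_lock and the task bookkeeping. */
static pthread_mutex_t set_lock = PTHREAD_MUTEX_INITIALIZER;
static int nr_tasks;

/* Caller must already hold set_lock. */
static int __task_count(void)
{
	return nr_tasks;
}

/* Convenience wrapper: takes set_lock itself. */
static int task_count(void)
{
	int count;

	pthread_mutex_lock(&set_lock);
	count = __task_count();
	pthread_mutex_unlock(&set_lock);
	return count;
}

/* A caller already under set_lock must use the __ variant; calling
 * task_count() here would try to take the non-recursive lock again. */
static int add_task_and_count(void)
{
	int count;

	pthread_mutex_lock(&set_lock);
	nr_tasks++;
	count = __task_count();
	pthread_mutex_unlock(&set_lock);
	return count;
}
```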

Also, move cgroup_task_count() implementation into
kernel/cgroup/cgroup.c, as there is nothing specific to cgroup v1.

Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kernel-team@fb.com
2019-04-19 11:26:48 -07:00
David Howells
06a2ae56b5 vfs: Add some logging to the core users of the fs_context log
Add some logging to the core users of the fs_context log so that
information can be extracted from them as to the reason for failure.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-28 03:29:38 -05:00
Al Viro
cca8f32714 cgroup: store a reference to cgroup_ns into cgroup_fs_context
... and trim cgroup_do_mount() arguments (renaming it to cgroup_do_get_tree())

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-28 03:29:34 -05:00
Al Viro
6678889f07 cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-28 03:29:33 -05:00