Commit Graph

595 Commits

Author SHA1 Message Date
bengris32
bbec2c993e [SQUASH] treewide: Remove useless oplus bloat
* Drop a BUNCH of useless oplus drivers, "features", loggers
  and other useless mods.

Signed-off-by: bengris32 <bengris32@protonmail.ch>
Change-Id: I5133459af68c9cabbbfe729c417e5d903375413a
2023-05-06 22:33:38 +01:00
bengris32
f48edc966e kernel/: Import Realme changes
* From https://github.com/realme-kernel-opensource/realme7_realme8_Narzo30_Narzo20pro-AndroidS-kernel-source

Signed-off-by: bengris32 <bengris32@protonmail.ch>
Change-Id: I5c2fbdc50009ce5a48c46e0e7ada0a1ab608673b
2023-01-08 12:35:36 +00:00
Alistair Delva
c5d2219efd ANDROID: Fix wq fp check for CFI builds
A previous change added a test on the wrong config flag; rename
CFI to CFI_CLANG.

Bug: 145210207
Change-Id: Id8aead2eb2c75ad6442d10165f6cb86ccfb9c2f9
Signed-off-by: Alistair Delva <adelva@google.com>
2020-04-02 21:48:26 +00:00
Greg Kroah-Hartman
32bc956bc2 Merge 4.14.174 into android-4.14
Changes in 4.14.174
	phy: Revert toggling reset changes.
	net: phy: Avoid multiple suspends
	cgroup, netclassid: periodically release file_lock on classid updating
	gre: fix uninit-value in __iptunnel_pull_header
	ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface
	ipvlan: add cond_resched_rcu() while processing muticast backlog
	ipvlan: do not use cond_resched_rcu() in ipvlan_process_multicast()
	netlink: Use netlink header as base to calculate bad attribute offset
	net: macsec: update SCI upon MAC address change.
	net: nfc: fix bounds checking bugs on "pipe"
	net/packet: tpacket_rcv: do not increment ring index on drop
	r8152: check disconnect status after long sleep
	sfc: detach from cb_page in efx_copy_channel()
	bnxt_en: reinitialize IRQs when MTU is modified
	cgroup: memcg: net: do not associate sock with unrelated cgroup
	net: memcg: late association of sock to memcg
	net: memcg: fix lockdep splat in inet_csk_accept()
	fib: add missing attribute validation for tun_id
	nl802154: add missing attribute validation
	nl802154: add missing attribute validation for dev_type
	can: add missing attribute validation for termination
	macsec: add missing attribute validation for port
	net: fq: add missing attribute validation for orphan mask
	team: add missing attribute validation for port ifindex
	team: add missing attribute validation for array index
	nfc: add missing attribute validation for SE API
	nfc: add missing attribute validation for vendor subcommand
	net: phy: fix MDIO bus PM PHY resuming
	bonding/alb: make sure arp header is pulled before accessing it
	slip: make slhc_compress() more robust against malicious packets
	net: fec: validate the new settings in fec_enet_set_coalesce()
	macvlan: add cond_resched() during multicast processing
	inet_diag: return classid for all socket types
	ipvlan: do not add hardware address of master to its unicast filter list
	ipvlan: egress mcast packets are not exceptional
	ipvlan: don't deref eth hdr before checking it's set
	cgroup: cgroup_procs_next should increase position index
	cgroup: Iterate tasks that did not finish do_exit()
	iwlwifi: mvm: Do not require PHY_SKU NVM section for 3168 devices
	virtio-blk: fix hw_queue stopped on arbitrary error
	iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint
	workqueue: don't use wq_select_unbound_cpu() for bound works
	drm/amd/display: remove duplicated assignment to grph_obj_type
	ktest: Add timeout for ssh sync testing
	cifs_atomic_open(): fix double-put on late allocation failure
	gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache
	KVM: x86: clear stale x86_emulate_ctxt->intercept value
	ARC: define __ALIGN_STR and __ALIGN symbols for ARC
	efi: Fix a race and a buffer overflow while reading efivars via sysfs
	x86/mce: Fix logic and comments around MSR_PPIN_CTL
	iommu/dma: Fix MSI reservation allocation
	iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint
	iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page
	pinctrl: meson-gxl: fix GPIOX sdio pins
	pinctrl: core: Remove extra kref_get which blocks hogs being freed
	nl80211: add missing attribute validation for critical protocol indication
	nl80211: add missing attribute validation for beacon report scanning
	nl80211: add missing attribute validation for channel switch
	netfilter: cthelper: add missing attribute validation for cthelper
	netfilter: nft_payload: add missing attribute validation for payload csum flags
	iommu/vt-d: Fix the wrong printing in RHSA parsing
	iommu/vt-d: Ignore devices with out-of-spec domain number
	i2c: acpi: put device when verifying client fails
	ipv6: restrict IPV6_ADDRFORM operation
	net/smc: check for valid ib_client_data
	efi: Add a sanity check to efivar_store_raw()
	batman-adv: Avoid spurious warnings from bat_v neigh_cmp implementation
	batman-adv: Always initialize fragment header priority
	batman-adv: Fix check of retrieved orig_gw in batadv_v_gw_is_eligible
	batman-adv: Fix lock for ogm cnt access in batadv_iv_ogm_calc_tq
	batman-adv: Fix internal interface indices types
	batman-adv: update data pointers after skb_cow()
	batman-adv: Avoid race in TT TVLV allocator helper
	batman-adv: Fix TT sync flags for intermediate TT responses
	batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
	batman-adv: Fix debugfs path for renamed hardif
	batman-adv: Fix debugfs path for renamed softif
	batman-adv: Fix duplicated OGMs on NETDEV_UP
	batman-adv: Avoid free/alloc race when handling OGM2 buffer
	batman-adv: Avoid free/alloc race when handling OGM buffer
	batman-adv: Don't schedule OGM for disabled interface
	perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
	ACPI: watchdog: Allow disabling WDAT at boot
	HID: apple: Add support for recent firmware on Magic Keyboards
	HID: i2c-hid: add Trekstor Surfbook E11B to descriptor override
	cfg80211: check reg_rule for NULL in handle_channel_custom()
	scsi: libfc: free response frame from GPN_ID
	net: usb: qmi_wwan: restore mtu min/max values after raw_ip switch
	net: ks8851-ml: Fix IRQ handling and locking
	mac80211: rx: avoid RCU list traversal under mutex
	signal: avoid double atomic counter increments for user accounting
	slip: not call free_netdev before rtnl_unlock in slip_open
	hinic: fix a bug of setting hw_ioctxt
	net: rmnet: fix NULL pointer dereference in rmnet_newlink()
	jbd2: fix data races at struct journal_head
	ARM: 8957/1: VDSO: Match ARMv8 timer in cntvct_functional()
	ARM: 8958/1: rename missed uaccess .fixup section
	mm: slub: add missing TID bump in kmem_cache_alloc_bulk()
	ipv4: ensure rcu_read_lock() in cipso_v4_error()
	Linux 4.14.174

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8594f155b5b3df71510fdb5dc034c80fb2332c91
2020-03-20 12:19:46 +01:00
Hillf Danton
48c336253b workqueue: don't use wq_select_unbound_cpu() for bound works
commit aa202f1f56960c60e7befaa0f49c72b8fa11b0a8 upstream.

wq_select_unbound_cpu() is designed for unbound workqueues only, but
it's wrongly called when using a bound workqueue too.

Fixing this ensures work queued to a bound workqueue with
cpu=WORK_CPU_UNBOUND always runs on the local CPU.

Before, that would happen only if wq_unbound_cpumask happened to include
it (likely almost always the case), or was empty, or we got lucky with
forced round-robin placement.  So restricting
/sys/devices/virtual/workqueue/cpumask to a small subset of a machine's
CPUs would cause some bound work items to run unexpectedly there.

Fixes: ef55718044 ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Hillf Danton <hdanton@sina.com>
[dj: massage changelog]
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20 10:54:16 +01:00
Sami Tolvanen
304407a616 ANDROID: Disable wq fp check in CFI builds
With non-canonical CFI, LLVM generates jump table entries for external
symbols in modules and as a result, a function pointer passed from a
module to the core kernel will have a different address.

Disable the warning for now.

Bug: 145210207
Change-Id: Ifdcee3479280f7b97abdee6b4c746f447e0944e6
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Alistair Delva <adelva@google.com>
2020-02-27 00:14:00 +00:00
Greg Kroah-Hartman
f960b38ecc Merge 4.14.159 into android-4.14
Changes in 4.14.159
	rsi: release skb if rsi_prepare_beacon fails
	arm64: tegra: Fix 'active-low' warning for Jetson TX1 regulator
	usb: gadget: u_serial: add missing port entry locking
	tty: serial: fsl_lpuart: use the sg count from dma_map_sg
	tty: serial: msm_serial: Fix flow control
	serial: pl011: Fix DMA ->flush_buffer()
	serial: serial_core: Perform NULL checks for break_ctl ops
	serial: ifx6x60: add missed pm_runtime_disable
	autofs: fix a leak in autofs_expire_indirect()
	RDMA/hns: Correct the value of HNS_ROCE_HEM_CHUNK_LEN
	iwlwifi: pcie: don't consider IV len in A-MSDU
	exportfs_decode_fh(): negative pinned may become positive without the parent locked
	audit_get_nd(): don't unlock parent too early
	NFC: nxp-nci: Fix NULL pointer dereference after I2C communication error
	xfrm: release device reference for invalid state
	Input: cyttsp4_core - fix use after free bug
	sched/core: Avoid spurious lock dependencies
	ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed()
	rsxx: add missed destroy_workqueue calls in remove
	net: ep93xx_eth: fix mismatch of request_mem_region in remove
	i2c: core: fix use after free in of_i2c_notify
	serial: core: Allow processing sysrq at port unlock time
	cxgb4vf: fix memleak in mac_hlist initialization
	iwlwifi: mvm: synchronize TID queue removal
	iwlwifi: mvm: Send non offchannel traffic via AP sta
	ARM: 8813/1: Make aligned 2-byte getuser()/putuser() atomic on ARMv6+
	net/mlx5: Release resource on error flow
	clk: sunxi-ng: a64: Fix gate bit of DSI DPHY
	dlm: fix possible call to kfree() for non-initialized pointer
	extcon: max8997: Fix lack of path setting in USB device mode
	net: ethernet: ti: cpts: correct debug for expired txq skb
	rtc: s3c-rtc: Avoid using broken ALMYEAR register
	i40e: don't restart nway if autoneg not supported
	clk: rockchip: fix rk3188 sclk_smc gate data
	clk: rockchip: fix rk3188 sclk_mac_lbtest parameter ordering
	ARM: dts: rockchip: Fix rk3288-rock2 vcc_flash name
	dlm: fix missing idr_destroy for recover_idr
	MIPS: SiByte: Enable ZONE_DMA32 for LittleSur
	net: dsa: mv88e6xxx: Work around mv886e6161 SERDES missing MII_PHYSID2
	scsi: zfcp: drop default switch case which might paper over missing case
	crypto: ecc - check for invalid values in the key verification test
	crypto: bcm - fix normal/non key hash algorithm failure
	pinctrl: qcom: ssbi-gpio: fix gpio-hog related boot issues
	Staging: iio: adt7316: Fix i2c data reading, set the data field
	mm/vmstat.c: fix NUMA statistics updates
	clk: rockchip: fix I2S1 clock gate register for rk3328
	clk: rockchip: fix ID of 8ch clock of I2S1 for rk3328
	regulator: Fix return value of _set_load() stub
	net-next/hinic:fix a bug in set mac address
	iomap: sub-block dio needs to zeroout beyond EOF
	MIPS: OCTEON: octeon-platform: fix typing
	net/smc: use after free fix in smc_wr_tx_put_slot()
	math-emu/soft-fp.h: (_FP_ROUND_ZERO) cast 0 to void to fix warning
	rtc: max8997: Fix the returned value in case of error in 'max8997_rtc_read_alarm()'
	rtc: dt-binding: abx80x: fix resistance scale
	ARM: dts: exynos: Use Samsung SoC specific compatible for DWC2 module
	media: pulse8-cec: return 0 when invalidating the logical address
	media: cec: report Vendor ID after initialization
	dmaengine: coh901318: Fix a double-lock bug
	dmaengine: coh901318: Remove unused variable
	dmaengine: dw-dmac: implement dma protection control setting
	usb: dwc3: debugfs: Properly print/set link state for HS
	usb: dwc3: don't log probe deferrals; but do log other error codes
	ACPI: fix acpi_find_child_device() invocation in acpi_preset_companion()
	f2fs: fix count of seg_freed to make sec_freed correct
	f2fs: change segment to section in f2fs_ioc_gc_range
	ARM: dts: rockchip: Fix the PMU interrupt number for rv1108
	ARM: dts: rockchip: Assign the proper GPIO clocks for rv1108
	f2fs: fix to allow node segment for GC by ioctl path
	sparc: Correct ctx->saw_frame_pointer logic.
	dma-mapping: fix return type of dma_set_max_seg_size()
	altera-stapl: check for a null key before strcasecmp'ing it
	serial: imx: fix error handling in console_setup
	i2c: imx: don't print error message on probe defer
	lockd: fix decoding of TEST results
	ASoC: rsnd: tidyup registering method for rsnd_kctrl_new()
	ARM: dts: sun5i: a10s: Fix HDMI output DTC warning
	ARM: dts: sun8i: v3s: Change pinctrl nodes to avoid warning
	dlm: NULL check before kmem_cache_destroy is not needed
	ARM: debug: enable UART1 for socfpga Cyclone5
	nfsd: fix a warning in __cld_pipe_upcall()
	ASoC: au8540: use 64-bit arithmetic instead of 32-bit
	ARM: OMAP1/2: fix SoC name printing
	arm64: dts: meson-gxl-libretech-cc: fix GPIO lines names
	arm64: dts: meson-gxbb-nanopi-k2: fix GPIO lines names
	arm64: dts: meson-gxbb-odroidc2: fix GPIO lines names
	arm64: dts: meson-gxl-khadas-vim: fix GPIO lines names
	net/x25: fix called/calling length calculation in x25_parse_address_block
	net/x25: fix null_x25_address handling
	ARM: dts: mmp2: fix the gpio interrupt cell number
	ARM: dts: realview-pbx: Fix duplicate regulator nodes
	tcp: fix off-by-one bug on aborting window-probing socket
	tcp: fix SNMP under-estimation on failed retransmission
	tcp: fix SNMP TCP timeout under-estimation
	modpost: skip ELF local symbols during section mismatch check
	kbuild: fix single target build for external module
	mtd: fix mtd_oobavail() incoherent returned value
	ARM: dts: pxa: clean up USB controller nodes
	clk: sunxi-ng: h3/h5: Fix CSI_MCLK parent
	ARM: dts: realview: Fix some more duplicate regulator nodes
	dlm: fix invalid cluster name warning
	net/mlx4_core: Fix return codes of unsupported operations
	pstore/ram: Avoid NULL deref in ftrace merging failure path
	powerpc/math-emu: Update macros from GCC
	clk: renesas: r8a77995: Correct parent clock of DU
	MIPS: OCTEON: cvmx_pko_mem_debug8: use oldest forward compatible definition
	nfsd: Return EPERM, not EACCES, in some SETATTR cases
	tty: Don't block on IO when ldisc change is pending
	media: stkwebcam: Bugfix for wrong return values
	firmware: qcom: scm: fix compilation error when disabled
	mlxsw: spectrum_router: Relax GRE decap matching check
	IB/hfi1: Ignore LNI errors before DC8051 transitions to Polling state
	IB/hfi1: Close VNIC sdma_progress sleep window
	mlx4: Use snprintf instead of complicated strcpy
	usb: mtu3: fix dbginfo in qmu_tx_zlp_error_handler
	ARM: dts: sunxi: Fix PMU compatible strings
	media: vimc: fix start stream when link is disabled
	net: aquantia: fix RSS table and key sizes
	tcp: exit if nothing to retransmit on RTO timeout
	sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision
	fuse: verify nlink
	fuse: verify attributes
	ALSA: hda/realtek - Dell headphone has noise on unmute for ALC236
	ALSA: pcm: oss: Avoid potential buffer overflows
	ALSA: hda - Add mute led support for HP ProBook 645 G4
	Input: synaptics - switch another X1 Carbon 6 to RMI/SMbus
	Input: synaptics-rmi4 - re-enable IRQs in f34v7_do_reflash
	Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers
	Input: goodix - add upside-down quirk for Teclast X89 tablet
	coresight: etm4x: Fix input validation for sysfs.
	Input: Fix memory leak in psxpad_spi_probe
	x86/PCI: Avoid AMD FCH XHCI USB PME# from D0 defect
	CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
	CIFS: Fix SMB2 oplock break processing
	tty: vt: keyboard: reject invalid keycodes
	can: slcan: Fix use-after-free Read in slcan_open
	kernfs: fix ino wrap-around detection
	jbd2: Fix possible overflow in jbd2_log_space_left()
	drm/i810: Prevent underflow in ioctl
	KVM: arm/arm64: vgic: Don't rely on the wrong pending table
	KVM: x86: do not modify masked bits of shared MSRs
	KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
	crypto: crypto4xx - fix double-free in crypto4xx_destroy_sdr
	crypto: af_alg - cast ki_complete ternary op to int
	crypto: ccp - fix uninitialized list head
	crypto: ecdh - fix big endian bug in ECC library
	crypto: user - fix memory leak in crypto_report
	spi: atmel: Fix CS high support
	RDMA/qib: Validate ->show()/store() callbacks before calling them
	iomap: Fix pipe page leakage during splicing
	thermal: Fix deadlock in thermal thermal_zone_device_check
	binder: Handle start==NULL in binder_update_page_range()
	ASoC: rsnd: fixup MIX kctrl registration
	KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332)
	appletalk: Fix potential NULL pointer dereference in unregister_snap_client
	appletalk: Set error code if register_snap_client failed
	usb: gadget: configfs: Fix missing spin_lock_init()
	usb: gadget: pch_udc: fix use after free
	scsi: qla2xxx: Fix driver unload hang
	media: venus: remove invalid compat_ioctl32 handler
	USB: uas: honor flag to avoid CAPACITY16
	USB: uas: heed CAPACITY_HEURISTICS
	USB: documentation: flags on usb-storage versus UAS
	usb: Allow USB device to be warm reset in suspended state
	staging: rtl8188eu: fix interface sanity check
	staging: rtl8712: fix interface sanity check
	staging: gigaset: fix general protection fault on probe
	staging: gigaset: fix illegal free on probe errors
	staging: gigaset: add endpoint-type sanity check
	usb: xhci: only set D3hot for pci device
	xhci: Increase STS_HALT timeout in xhci_suspend()
	xhci: handle some XHCI_TRUST_TX_LENGTH quirks cases as default behaviour.
	ARM: dts: pandora-common: define wl1251 as child node of mmc3
	iio: humidity: hdc100x: fix IIO_HUMIDITYRELATIVE channel reporting
	USB: atm: ueagle-atm: add missing endpoint check
	USB: idmouse: fix interface sanity checks
	USB: serial: io_edgeport: fix epic endpoint lookup
	USB: adutux: fix interface sanity check
	usb: core: urb: fix URB structure initialization function
	usb: mon: Fix a deadlock in usbmon between mmap and read
	tpm: add check after commands attribs tab allocation
	mtd: spear_smi: Fix Write Burst mode
	virtio-balloon: fix managed page counts when migrating pages between zones
	usb: dwc3: ep0: Clear started flag on completion
	btrfs: check page->mapping when loading free space cache
	btrfs: use refcount_inc_not_zero in kill_all_nodes
	Btrfs: fix negative subv_writers counter and data space leak after buffered write
	btrfs: Remove btrfs_bio::flags member
	Btrfs: send, skip backreference walking for extents with many references
	btrfs: record all roots for rename exchange on a subvol
	rtlwifi: rtl8192de: Fix missing code to retrieve RX buffer address
	rtlwifi: rtl8192de: Fix missing callback that tests for hw release of buffer
	rtlwifi: rtl8192de: Fix missing enable interrupt flag
	lib: raid6: fix awk build warnings
	ovl: relax WARN_ON() on rename to self
	ALSA: hda - Fix pending unsol events at shutdown
	md/raid0: Fix an error message in raid0_make_request()
	watchdog: aspeed: Fix clock behaviour for ast2600
	hwrng: omap - Fix RNG wait loop timeout
	dm zoned: reduce overhead of backing device checks
	workqueue: Fix spurious sanity check failures in destroy_workqueue()
	workqueue: Fix pwq ref leak in rescuer_thread()
	ASoC: Jack: Fix NULL pointer dereference in snd_soc_jack_report
	blk-mq: avoid sysfs buffer overflow with too many CPU cores
	cgroup: pids: use atomic64_t for pids->limit
	ar5523: check NULL before memcpy() in ar5523_cmd()
	s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported
	media: bdisp: fix memleak on release
	media: radio: wl1273: fix interrupt masking on release
	media: cec.h: CEC_OP_REC_FLAG_ values were swapped
	cpuidle: Do not unset the driver if it is there already
	intel_th: Fix a double put_device() in error path
	intel_th: pci: Add Ice Lake CPU support
	intel_th: pci: Add Tiger Lake CPU support
	PM / devfreq: Lock devfreq in trans_stat_show
	cpufreq: powernv: fix stack bloat and hard limit on number of CPUs
	ACPI: OSL: only free map once in osl.c
	ACPI: bus: Fix NULL pointer check in acpi_bus_get_private_data()
	ACPI: PM: Avoid attaching ACPI PM domain to certain devices
	pinctrl: samsung: Add of_node_put() before return in error path
	pinctrl: samsung: Fix device node refcount leaks in S3C24xx wakeup controller init
	pinctrl: samsung: Fix device node refcount leaks in init code
	pinctrl: samsung: Fix device node refcount leaks in S3C64xx wakeup controller init
	mmc: host: omap_hsmmc: add code for special init of wl1251 to get rid of pandora_wl1251_init_card
	ARM: dts: omap3-tao3530: Fix incorrect MMC card detection GPIO polarity
	ppdev: fix PPGETTIME/PPSETTIME ioctls
	powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges >4GB
	powerpc/xive: Prevent page fault issues in the machine crash handler
	powerpc: Allow flush_icache_range to work across ranges >4GB
	powerpc/xive: Skip ioremap() of ESB pages for LSI interrupts
	video/hdmi: Fix AVI bar unpack
	quota: Check that quota is not dirty before release
	ext2: check err when partial != NULL
	quota: fix livelock in dquot_writeback_dquots
	ext4: Fix credit estimate for final inode freeing
	reiserfs: fix extended attributes on the root directory
	block: fix single range discard merge
	scsi: zfcp: trace channel log even for FCP command responses
	scsi: qla2xxx: Fix DMA unmap leak
	scsi: qla2xxx: Fix session lookup in qlt_abort_work()
	scsi: qla2xxx: Fix qla24xx_process_bidir_cmd()
	scsi: qla2xxx: Always check the qla2x00_wait_for_hba_online() return value
	scsi: qla2xxx: Fix message indicating vectors used by driver
	xhci: Fix memory leak in xhci_add_in_port()
	xhci: make sure interrupts are restored to correct state
	iio: adis16480: Add debugfs_reg_access entry
	phy: renesas: rcar-gen3-usb2: Fix sysfs interface of "role"
	omap: pdata-quirks: remove openpandora quirks for mmc3 and wl1251
	scsi: lpfc: Cap NPIV vports to 256
	scsi: lpfc: Correct code setting non existent bits in sli4 ABORT WQE
	drbd: Change drbd_request_detach_interruptible's return type to int
	e100: Fix passing zero to 'PTR_ERR' warning in e100_load_ucode_wait
	x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
	x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
	power: supply: cpcap-battery: Fix signed counter sample register
	mlxsw: spectrum_router: Refresh nexthop neighbour when it becomes dead
	media: vimc: fix component match compare
	ath10k: fix fw crash by moving chip reset after napi disabled
	powerpc: Avoid clang warnings around setjmp and longjmp
	powerpc: Fix vDSO clock_getres()
	ext4: work around deleting a file with i_nlink == 0 safely
	firmware: qcom: scm: Ensure 'a0' status code is treated as signed
	mm/shmem.c: cast the type of unmap_start to u64
	ext4: fix a bug in ext4_wait_for_tail_page_commit
	mfd: rk808: Fix RK818 ID template
	blk-mq: make sure that line break can be printed
	workqueue: Fix missing kfree(rescuer) in destroy_workqueue()
	sunrpc: fix crash when cache_head become valid before update
	net/mlx5e: Fix SFF 8472 eeprom length
	gfs2: fix glock reference problem in gfs2_trans_remove_revoke
	kernel/module.c: wakeup processes in module_wq on module unload
	gpiolib: acpi: Add Terra Pad 1061 to the run_edge_events_on_boot_blacklist
	raid5: need to set STRIPE_HANDLE for batch head
	of: unittest: fix memory leak in attach_node_and_children
	Linux 4.14.159

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-12-17 21:13:36 +01:00
Tejun Heo
80797bdcc5 workqueue: Fix missing kfree(rescuer) in destroy_workqueue()
commit 8efe1223d73c218ce7e8b2e0e9aadb974b582d7f upstream.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Qian Cai <cai@lca.pw>
Fixes: def98c84b6cd ("workqueue: Fix spurious sanity check failures in destroy_workqueue()")
Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 20:39:59 +01:00
Tejun Heo
9e0f33da90 workqueue: Fix pwq ref leak in rescuer_thread()
commit e66b39af00f426b3356b96433d620cb3367ba1ff upstream.

008847f66c ("workqueue: allow rescuer thread to do more work.") made
the rescuer worker requeue the pwq immediately if there may be more
work items which need rescuing instead of waiting for the next mayday
timer expiration.  Unfortunately, it doesn't check whether the pwq is
already on the mayday list and unconditionally gets the ref and moves
it onto the list.  This doesn't corrupt the list but creates an
additional reference to the pwq.  It got queued twice but will only be
removed once.

This leak later can trigger pwq refcnt warning on workqueue
destruction and prevent freeing of the workqueue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Williams, Gerald S" <gerald.s.williams@intel.com>
Cc: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 20:39:26 +01:00
Tejun Heo
05905c2f21 workqueue: Fix spurious sanity check failures in destroy_workqueue()
commit def98c84b6cdf2eeea19ec5736e90e316df5206b upstream.

Before actually destrying a workqueue, destroy_workqueue() checks
whether it's actually idle.  If it isn't, it prints out a bunch of
warning messages and leaves the workqueue dangling.  It unfortunately
has a couple issues.

* Mayday list queueing increments pwq's refcnts which gets detected as
  busy and fails the sanity checks.  However, because mayday list
  queueing is asynchronous, this condition can happen without any
  actual work items left in the workqueue.

* Sanity check failure leaves the sysfs interface behind too which can
  lead to init failure of newer instances of the workqueue.

This patch fixes the above two by

* If a workqueue has a rescuer, disable and kill the rescuer before
  sanity checks.  Disabling and killing is guaranteed to flush the
  existing mayday list.

* Remove sysfs interface before sanity checks.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Marcin Pawlowski <mpawlowski@fb.com>
Reported-by: "Williams, Gerald S" <gerald.s.williams@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-17 20:39:25 +01:00
Johannes Weiner
094ac9419e UPSTREAM: psi: fix aggregation idle shut-off
psi has provisions to shut off the periodic aggregation worker when
there is a period of no task activity - and thus no data that needs
aggregating.  However, while developing psi monitoring, Suren noticed
that the aggregation clock currently won't stay shut off for good.

Debugging this revealed a flaw in the idle design: an aggregation run
will see no task activity and decide to go to sleep; shortly thereafter,
the kworker thread that executed the aggregation will go idle and cause
a scheduling change, during which the psi callback will kick the
!pending worker again.  This will ping-pong forever, and is equivalent
to having no shut-off logic at all (but with more code!)

Fix this by exempting aggregation workers from psi's clock waking logic
when the state change is them going to sleep.  To do this, tag workers
with the last work function they executed, and if in psi we see a worker
going to sleep after aggregating psi data, we will not reschedule the
aggregation work item.

What if the worker is also executing other items before or after?

Any psi state times that were incurred by work items preceding the
aggregation work will have been collected from the per-cpu buckets
during the aggregation itself.  If there are work items following the
aggregation work, the worker's last_func tag will be overwritten and the
aggregator will be kept alive to process this genuine new activity.

If the aggregation work is the last thing the worker does, and we decide
to go idle, the brief period of non-idle time incurred between the
aggregation run and the kworker's dequeue will be stranded in the
per-cpu buckets until the clock is woken by later activity.  But that
should not be a problem.  The buckets can hold 4s worth of time, and
future activity will wake the clock with a 2s delay, giving us 2s worth
of data we can leave behind when disabling aggregation.  If it takes a
worker more than two seconds to go idle after it finishes its last work
item, we likely have bigger problems in the system, and won't notice one
sample that was averaged with a bogus per-CPU weight.

Link: http://lkml.kernel.org/r/20190116193501.1910-1-hannes@cmpxchg.org
Fixes: eb414681d5a0 ("psi: pressure stall information for CPU, memory, and IO")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 1b69ac6b40ebd85eed73e4dbccde2a36961ab990)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I2877fec3d381b1006b8bd1261895fdfd68bd21db
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-03-21 16:19:56 -07:00
Vincent Whitchurch
63a0f9de02 watchdog: Mark watchdog touch functions as notrace
commit cb9d7fd51d9fbb329d182423bd7b92d0f8cb0e01 upstream.

Some architectures need to use stop_machine() to patch functions for
ftrace, and the assumption is that the stopped CPUs do not make function
calls to traceable functions when they are in the stopped state.

Commit ce4f06dcbb ("stop_machine: Touch_nmi_watchdog() after
MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
the stopped CPUs and those functions lack notrace annotations.  This
leads to crashes when enabling/disabling ftrace on ARM kernels built
with the Thumb-2 instruction set.

Fix it by adding the necessary notrace annotations.

Fixes: ce4f06dcbb ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: oleg@redhat.com
Cc: tj@kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05 09:26:42 +02:00
Arvind Yadav
7c84e5e9c6 workqueue: use put_device() instead of kfree()
[ Upstream commit 537f4146c53c95aac977852b371bafb9c6755ee1 ]

Never directly free @dev after calling device_register(), even
if it returned an error! Always use put_device() to give up the
reference initialized in this function instead.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:52:14 +02:00
Lukas Wunner
363e3fd5fa workqueue: Allow retrieval of current task's work struct
commit 27d4ee03078aba88c5e07dcc4917e8d01d046f38 upstream.

Introduce a helper to retrieve the current task's work struct if it is
a workqueue worker.

This allows us to fix a long-standing deadlock in several DRM drivers
wherein the ->runtime_suspend callback waits for a specific worker to
finish and that worker in turn calls a function which waits for runtime
suspend to finish.  That function is invoked from multiple call sites
and waiting for runtime suspend to finish is the correct thing to do
except if it's executing in the context of the worker.

Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://patchwork.freedesktop.org/patch/msgid/2d8f603074131eb87e588d2b803a71765bd3a2fd.1518338788.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-15 10:54:29 +01:00
Sergey Senozhatsky
a6d5930ccf workqueue: avoid hard lockups in show_workqueue_state()
commit 62635ea8c18f0f62df4cc58379e4f1d33afd5801 upstream.

show_workqueue_state() can print out a lot of messages while being in
atomic context, e.g. sysrq-t -> show_workqueue_state(). If the console
device is slow it may end up triggering NMI hard lockup watchdog.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-23 19:58:18 +01:00
Tejun Heo
692b48258d workqueue: replace pool->manager_arb mutex with a flag
Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by
lockdep:

 [ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
 [ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted
 [ 1270.473240] -----------------------------------------------------
 [ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 [ 1270.474239]  (&(&lock->wait_lock)->rlock){+.+.}, at: [<ffffffff8da253d2>] __mutex_unlock_slowpath+0xa2/0x280
 [ 1270.474994]
 [ 1270.474994] and this task is already holding:
 [ 1270.475440]  (&pool->lock/1){-.-.}, at: [<ffffffff8d2992f6>] worker_thread+0x366/0x3c0
 [ 1270.476046] which would create a new lock dependency:
 [ 1270.476436]  (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.}
 [ 1270.476949]
 [ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock:
 [ 1270.477553]  (&pool->lock/1){-.-.}
 ...
 [ 1270.488900] to a HARDIRQ-irq-unsafe lock:
 [ 1270.489327]  (&(&lock->wait_lock)->rlock){+.+.}
 ...
 [ 1270.494735]  Possible interrupt unsafe locking scenario:
 [ 1270.494735]
 [ 1270.495250]        CPU0                    CPU1
 [ 1270.495600]        ----                    ----
 [ 1270.495947]   lock(&(&lock->wait_lock)->rlock);
 [ 1270.496295]                                local_irq_disable();
 [ 1270.496753]                                lock(&pool->lock/1);
 [ 1270.497205]                                lock(&(&lock->wait_lock)->rlock);
 [ 1270.497744]   <Interrupt>
 [ 1270.497948]     lock(&pool->lock/1);

, which will cause a irq inversion deadlock if the above lock scenario
happens.

The root cause of this safe -> unsafe lock order is the
mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock
held.

Unlocking mutex while holding an irq spinlock was never safe and this
problem has been around forever but it never got noticed because the
only time the mutex is usually trylocked while holding irqlock making
actual failures very unlikely and lockdep annotation missed the
condition until the recent b9c16a0e1f ("locking/mutex: Fix
lockdep_assert_held() fail").

Using mutex for pool->manager_arb has always been a bit of stretch.
It primarily is an mechanism to arbitrate managership between workers
which can easily be done with a pool flag.  The only reason it became
a mutex is that pool destruction path wants to exclude parallel
managing operations.

This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE
and make the destruction path wait for the current manager on a wait
queue.

v2: Drop unnecessary flag clearing before pool destruction as
    suggested by Boqun.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: stable@vger.kernel.org
2017-10-10 07:13:57 -07:00
Linus Torvalds
9954d4892a Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Nothing major. I introduced a flag collsion bug during v4.13 cycle
  which is fixed in this pull request. Fortunately, the flag is for
  debugging / verification and the bug isn't critical"

* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Fix flag collision
  workqueue: Use TASK_IDLE
  workqueue: fix path to documentation
  workqueue: doc change for ST behavior on NUMA systems
2017-09-06 21:59:31 -07:00
Tejun Heo
058fc47ee2 Merge branch 'for-4.13-fixes' into for-4.14 2017-09-05 06:33:41 -07:00
Peter Zijlstra
f52be57080 locking/lockdep: Untangle xhlock history save/restore from task independence
Where XHLOCK_{SOFT,HARD} are save/restore points in the xhlocks[] to
ensure the temporal IRQ events don't interact with task state, the
XHLOCK_PROC is a fundament different beast that just happens to share
the interface.

The purpose of XHLOCK_PROC is to annotate independent execution inside
one task. For example workqueues, each work should appear to run in its
own 'pristine' 'task'.

Remove XHLOCK_PROC in favour of its own interface to avoid confusion.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: david@fromorbit.com
Cc: johannes@sipsolutions.net
Cc: kernel-team@lge.com
Cc: oleg@redhat.com
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/20170829085939.ggmb6xiohw67micb@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 15:14:38 +02:00
Peter Zijlstra
e6f3faa734 locking/lockdep: Fix workqueue crossrelease annotation
The new completion/crossrelease annotations interact unfavourable with
the extant flush_work()/flush_workqueue() annotations.

The problem is that when a single work class does:

  wait_for_completion(&C)

and

  complete(&C)

in different executions, we'll build dependencies like:

  lock_map_acquire(W)
  complete_acquire(C)

and

  lock_map_acquire(W)
  complete_release(C)

which results in the dependency chain: W->C->W, which lockdep thinks
spells deadlock, even though there is no deadlock potential since
works are ran concurrently.

One possibility would be to change the work 'lock' to recursive-read,
but that would mean hitting a lockdep limitation on recursive locks.
Also, unconditinoally switching to recursive-read here would fail to
detect the actual deadlock on single-threaded workqueues, which do
have a problem with this.

For now, forcefully disregard these locks for crossrelease.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: byungchul.park@lge.com
Cc: david@fromorbit.com
Cc: johannes@sipsolutions.net
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25 11:06:33 +02:00
Peter Zijlstra
a1d14934ea workqueue/lockdep: 'Fix' flush_work() annotation
The flush_work() annotation as introduced by commit:

  e159489baa ("workqueue: relax lockdep annotation on flush_work()")

hits on the lockdep problem with recursive read locks.

The situation as described is:

Work W1:                Work W2:        Task:

ARR(Q)                  ARR(Q)		flush_workqueue(Q)
A(W1)                   A(W2)             A(Q)
  flush_work(W2)			  R(Q)
    A(W2)
    R(W2)
    if (special)
      A(Q)
    else
      ARR(Q)
    R(Q)

where: A - acquire, ARR - acquire-read-recursive, R - release.

Where under 'special' conditions we want to trigger a lock recursion
deadlock, but otherwise allow the flush_work(). The allowing is done
by using recursive read locks (ARR), but lockdep is broken for
recursive stuff.

However, there appears to be no need to acquire the lock if we're not
'special', so if we remove the 'else' clause things become much
simpler and no longer need the recursion thing at all.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: byungchul.park@lge.com
Cc: david@fromorbit.com
Cc: johannes@sipsolutions.net
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25 11:06:32 +02:00
Peter Zijlstra
c5a94a618e workqueue: Use TASK_IDLE
Workqueues don't use signals, it (ab)uses TASK_INTERRUPTIBLE to avoid
increasing the loadavg numbers. We've 'recently' introduced TASK_IDLE
for this case:

  80ed87c8a9 ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")

use it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-08-23 06:30:35 -07:00
Boqun Feng
52fa5bc5cb locking/lockdep: Explicitly initialize wq_barrier::done::map
With the new lockdep crossrelease feature, which checks completions usage,
a false positive is reported in the workqueue code:

> Worker A : acquired of wfc.work -> wait for cpu_hotplug_lock to be released
> Task   B : acquired of cpu_hotplug_lock -> wait for lock#3 to be released
> Task   C : acquired of lock#3 -> wait for completion of barr->done
> (Task C is in lru_add_drain_all_cpuslocked())
> Worker D : wait for wfc.work to be released -> will complete barr->done

Such a dead lock can not happen because Task C's barr->done and Worker D's
barr->done can not be the same instance.

The reason of this false positive is we initialize all wq_barrier::done
at insert_wq_barrier() via init_completion(), which makes them belong to
the same lock class, therefore, impossible circles are reported.

To fix this, explicitly initialize the lockdep map for wq_barrier::done
in insert_wq_barrier(), so that the lock class key of wq_barrier::done
is a subkey of the corresponding work_struct, as a result we won't build
a dependency between a wq_barrier with a unrelated work, and we can
differ wq barriers based on the related works, so the false positive
above is avoided.

Also define the empty lockdep_init_map_crosslock() for !CROSSRELEASE
to make the code simple and away from unnecessary #ifdefs.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170817094622.12915-1-boqun.feng@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17 12:12:33 +02:00
Byungchul Park
b09be676e0 locking/lockdep: Implement the 'crossrelease' feature
Lockdep is a runtime locking correctness validator that detects and
reports a deadlock or its possibility by checking dependencies between
locks. It's useful since it does not report just an actual deadlock but
also the possibility of a deadlock that has not actually happened yet.
That enables problems to be fixed before they affect real systems.

However, this facility is only applicable to typical locks, such as
spinlocks and mutexes, which are normally released within the context in
which they were acquired. However, synchronization primitives like page
locks or completions, which are allowed to be released in any context,
also create dependencies and can cause a deadlock.

So lockdep should track these locks to do a better job. The 'crossrelease'
implementation makes these primitives also be tracked.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:07 +02:00
Benjamin Peterson
9a2614916a workqueue: fix path to documentation
Signed-off-by: Benjamin Peterson <bp@benjamin.pe>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-08-07 08:03:24 -07:00
Michael Bringmann
1ad0f0a7aa workqueue: Work around edge cases for calc of pool's cpumask
There is an underlying assumption/trade-off in many layers of the Linux
system that CPU <-> node mapping is static.  This is despite the presence
of features like NUMA and 'hotplug' that support the dynamic addition/
removal of fundamental system resources like CPUs and memory.  PowerPC
systems, however, do provide extensive features for the dynamic change
of resources available to a system.

Currently, there is little or no synchronization protection around the
updating of the CPU <-> node mapping, and the export/update of this
information for other layers / modules.  In systems which can change
this mapping during 'hotplug', like PowerPC, the information is changing
underneath all layers that might reference it.

This patch attempts to ensure that a valid, usable cpumask attribute
is used by the workqueue infrastructure when setting up new resource
pools.  It prevents a crash that has been observed when an 'empty'
cpumask is passed along to the worker/task scheduling code.  It is
intended as a temporary workaround until a more fundamental review and
correction of the issue can be done.

[With additions to the patch provided by Tejun Hao <tj@kernel.org>]

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-07-28 11:05:52 -04:00
Tejun Heo
0a94efb5ac workqueue: implicit ordered attribute should be overridable
5c0338c687 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered") automatically enabled ordered attribute for unbound
workqueues w/ max_active == 1.  Because ordered workqueues reject
max_active and some attribute changes, this implicit ordered mode
broke cases where the user creates an unbound workqueue w/ max_active
== 1 and later explicitly changes the related attributes.

This patch distinguishes explicit and implicit ordered setting and
overrides from attribute changes if implict.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 5c0338c687 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
2017-07-25 13:28:56 -04:00
Tejun Heo
5c0338c687 workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
The combination of WQ_UNBOUND and max_active == 1 used to imply
ordered execution.  After NUMA affinity 4c16bd327c ("workqueue:
implement NUMA affinity for unbound workqueues"), this is no longer
true due to per-node worker pools.

While the right way to create an ordered workqueue is
alloc_ordered_workqueue(), the documentation has been misleading for a
long time and people do use WQ_UNBOUND and max_active == 1 for ordered
workqueues which can lead to subtle bugs which are very difficult to
trigger.

It's unlikely that we'd see noticeable performance impact by enforcing
ordering on WQ_UNBOUND / max_active == 1 workqueues.  Let's
automatically set __WQ_ORDERED for those workqueues.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reported-by: Alexei Potashnik <alexei@purestorage.com>
Fixes: 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues")
Cc: stable@vger.kernel.org # v3.10+
2017-07-19 11:24:19 -04:00
Ingo Molnar
ac6424b981 sched/wait: Rename wait_queue_t => wait_queue_entry_t
Rename:

	wait_queue_t		=>	wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.

Start sorting this out by renaming it to 'wait_queue_entry_t'.

This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20 12:18:27 +02:00
Linus Torvalds
3527d3e951 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - another round of rq-clock handling debugging, robustization and
     fixes

   - PELT accounting improvements

   - CPU hotplug related ->cpus_allowed affinity handling fixes all
     around the tree

   - ... plus misc fixes, cleanups and updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  sched/x86: Update reschedule warning text
  crypto: N2 - Replace racy task affinity logic
  cpufreq/sparc-us2e: Replace racy task affinity logic
  cpufreq/sparc-us3: Replace racy task affinity logic
  cpufreq/sh: Replace racy task affinity logic
  cpufreq/ia64: Replace racy task affinity logic
  ACPI/processor: Replace racy task affinity logic
  ACPI/processor: Fix error handling in __acpi_processor_start()
  sparc/sysfs: Replace racy task affinity logic
  powerpc/smp: Replace open coded task affinity logic
  ia64/sn/hwperf: Replace racy task affinity logic
  ia64/salinfo: Replace racy task affinity logic
  workqueue: Provide work_on_cpu_safe()
  ia64/topology: Remove cpus_allowed manipulation
  sched/fair: Move the PELT constants into a generated header
  sched/fair: Increase PELT accuracy for small tasks
  sched/fair: Fix comments
  sched/Documentation: Add 'sched-pelt' tool
  sched/fair: Fix corner case in __accumulate_sum()
  sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags()
  ...
2017-05-01 19:12:53 -07:00
Linus Torvalds
ad1490bcd2 Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue update from Tejun Heo:
 "One trivial patch to use setup_deferrable_timer() instead of
  open-coding the initialization"

* 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: use setup_deferrable_timer
2017-05-01 13:49:27 -07:00
Thomas Gleixner
0e8d6a9336 workqueue: Provide work_on_cpu_safe()
work_on_cpu() is not protected against CPU hotplug. For code which requires
to be either executed on an online CPU or to fail if the CPU is not
available the callsite would have to protect against CPU hotplug.

Provide a function which does get/put_online_cpus() around the call to
work_on_cpu() and fails the call with -ENODEV if the target CPU is not
online.

Preparatory patch to convert several racy task affinity manipulations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170412201042.262610721@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-15 12:20:53 +02:00
Geliang Tang
c30fb26b11 workqueue: use setup_deferrable_timer
Use setup_deferrable_timer() instead of init_timer_deferrable() to
simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-03-06 15:42:20 -05:00
Tejun Heo
637fdbae60 workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq
If queue_delayed_work() gets called with NULL @wq, the kernel will
oops asynchronuosly on timer expiration which isn't too helpful in
tracking down the offender.  This actually happened with smc.

__queue_delayed_work() already does several input sanity checks
synchronously.  Add NULL @wq check.

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Link: http://lkml.kernel.org/r/20170227171439.jshx3qplflyrgcv7@codemonkey.org.uk
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-03-06 15:33:42 -05:00
Kees Cook
dfb4357da6 time: Remove CONFIG_TIMER_STATS
Currently CONFIG_TIMER_STATS exposes process information across namespaces:

kernel/time/timer_list.c print_timer():

        SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);

/proc/timer_list:

 #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570

Given that the tracer can give the same information, this patch entirely
removes CONFIG_TIMER_STATS.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Xing Gao <xgao01@email.wm.edu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jessica Frazelle <me@jessfraz.com>
Cc: kernel-hardening@lists.openwall.com
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10 11:15:08 +01:00
Tejun Heo
8bc4a04455 Merge branch 'for-4.9' into for-4.10 2016-10-19 12:12:40 -04:00
Tejun Heo
2186d9f940 workqueue: move wq_numa_init() to workqueue_init()
While splitting up workqueue initialization into two parts,
ac8f73400782 ("workqueue: make workqueue available early during boot")
put wq_numa_init() into workqueue_init_early().  Unfortunately, on
some archs including power and arm64, cpu to node mapping isn't yet
established by the time the early init is called leading to incorrect
NUMA initialization and subsequently the following oops due to zero
cpumask on node-specific unbound pools.

  Unable to handle kernel paging request for data at address 0x00000038
  Faulting instruction address: 0xc0000000000fc0cc
  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-compiler_gcc-6.2.0-next-20161005 #94
  task: c0000007f5400000 task.stack: c000001ffc084000
  NIP: c0000000000fc0cc LR: c0000000000ed928 CTR: c0000000000fbfd0
  REGS: c000001ffc087780 TRAP: 0300   Not tainted  (4.8.0-compiler_gcc-6.2.0-next-20161005)
  MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 48000424  XER: 00000000
  CFAR: c0000000000089dc DAR: 0000000000000038 DSISR: 40000000 SOFTE: 0
  GPR00: c0000000000ed928 c000001ffc087a00 c000000000e63200 c000000010d6d600
  GPR04: c0000007f5409200 0000000000000021 000000000748e08c 000000000000001f
  GPR08: 0000000000000000 0000000000000021 000000000748f1f8 0000000000000000
  GPR12: 0000000028000422 c00000000fb80000 c00000000000e0c8 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000021 0000000000000001
  GPR20: ffffffffafb50401 0000000000000000 c000000010d6d600 000000000000ba7e
  GPR24: 000000000000ba7e c000000000d8bc58 afb504000afb5041 0000000000000001
  GPR28: 0000000000000000 0000000000000004 c0000007f5409280 0000000000000000
  NIP [c0000000000fc0cc] enqueue_task_fair+0xfc/0x18b0
  LR [c0000000000ed928] activate_task+0x78/0xe0
  Call Trace:
  [c000001ffc087a00] [c0000007f5409200] 0xc0000007f5409200 (unreliable)
  [c000001ffc087b10] [c0000000000ed928] activate_task+0x78/0xe0
  [c000001ffc087b50] [c0000000000ede58] ttwu_do_activate+0x68/0xc0
  [c000001ffc087b90] [c0000000000ef1b8] try_to_wake_up+0x208/0x4f0
  [c000001ffc087c10] [c0000000000d3484] create_worker+0x144/0x250
  [c000001ffc087cb0] [c000000000cd72d0] workqueue_init+0x124/0x150
  [c000001ffc087d00] [c000000000cc0e74] kernel_init_freeable+0x158/0x360
  [c000001ffc087dc0] [c00000000000e0e4] kernel_init+0x24/0x160
  [c000001ffc087e30] [c00000000000bfa0] ret_from_kernel_thread+0x5c/0xbc
  Instruction dump:
  62940401 3b800000 3aa00000 7f17c378 3a600001 3b600001 60000000 60000000
  60420000 72490021 ebfe0150 2f890001 <ebbf0038> 419e0de0 7fbee840 419e0e58
  ---[ end trace 0000000000000000 ]---

Fix it by moving wq_numa_init() to workqueue_init().  As this means
that the early intialization may not have full NUMA info for per-cpu
pools and ignores NUMA affinity for unbound pools, fix them up from
workqueue_init() after wq_numa_init().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Link: http://lkml.kernel.org/r/87twck5wqo.fsf@concordia.ellerman.id.au
Fixes: ac8f73400782 ("workqueue: make workqueue available early during boot")
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-10-19 12:12:26 -04:00
Petr Mladek
e700591ae0 kthread: rename probe_kthread_data() to kthread_probe_data()
Patch series "kthread: Kthread worker API improvements"

The intention of this patchset is to make it easier to manipulate and
maintain kthreads.  Especially, I want to replace all the custom main
cycles with a generic one.  Also I want to make the kthreads sleep in a
consistent state in a common place when there is no work.

This patch (of 11):

A good practice is to prefix the names of functions by the name of the
subsystem.

This patch fixes the name of probe_kthread_data().  The other wrong
functions names are part of the kthread worker API and will be fixed
separately.

Link: http://lkml.kernel.org/r/1470754545-17632-2-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Tejun Heo
863b710b66 workqueue: remove keventd_up()
keventd_up() no longer has in-kernel users.  Remove it and make
wq_online static.

Signed-off-by: Tejun Heo <tj@kernel.org>
2016-09-17 13:18:21 -04:00
Tejun Heo
3347fa0928 workqueue: make workqueue available early during boot
Workqueue is currently initialized in an early init call; however,
there are cases where early boot code has to be split and reordered to
come after workqueue initialization or the same code path which makes
use of workqueues is used both before workqueue initailization and
after.  The latter cases have to gate workqueue usages with
keventd_up() tests, which is nasty and easy to get wrong.

Workqueue usages have become widespread and it'd be a lot more
convenient if it can be used very early from boot.  This patch splits
workqueue initialization into two steps.  workqueue_init_early() which
sets up the basic data structures so that workqueues can be created
and work items queued, and workqueue_init() which actually brings up
workqueues online and starts executing queued work items.  The former
step can be done very early during boot once memory allocation,
cpumasks and idr are initialized.  The latter right after kthreads
become available.

This allows work item queueing and canceling from very early boot
which is what most of these use cases want.

* As systemd_wq being initialized doesn't indicate that workqueue is
  fully online anymore, update keventd_up() to test wq_online instead.
  The follow-up patches will get rid of all its usages and the
  function itself.

* Flushing doesn't make sense before workqueue is fully initialized.
  The flush functions trigger WARN and return immediately before fully
  online.

* Work items are never in-flight before fully online.  Canceling can
  always succeed by skipping the flush step.

* Some code paths can no longer assume to be called with irq enabled
  as irq is disabled during early boot.  Use irqsave/restore
  operations instead.

v2: Watchdog init, which requires timer to be running, moved from
    workqueue_init_early() to workqueue_init().

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com
2016-09-17 13:18:21 -04:00
Tejun Heo
fa07fb6a4e workqueue: dump workqueue state on sanity check failures in destroy_workqueue()
destroy_workqueue() performs a number of sanity checks to ensure that
the workqueue is empty before proceeding with destruction.  However,
it's not always easy to tell what's going on just from the warning
message.  Let's dump workqueue state after sanity check failures to
help debugging.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/CACT4Y+Zs6vkjHo9qHb4TrEiz3S4+quvvVQ9VWvj2Mx6pETGb9Q@mail.gmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
2016-09-16 11:08:39 -04:00
Jens Axboe
f72b8792d1 workqueue: add cancel_work()
Like cancel_delayed_work(), but for regular work.

Signed-off-by: Jens Axboe <axboe@fb.com>
Mehed-by: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
2016-08-29 08:13:21 -06:00
Linus Torvalds
a6408f6cb6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the next part of the hotplug rework.

   - Convert all notifiers with a priority assigned

   - Convert all CPU_STARTING/DYING notifiers

     The final removal of the STARTING/DYING infrastructure will happen
     when the merge window closes.

  Another 700 hundred line of unpenetrable maze gone :)"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  timers/core: Correct callback order during CPU hot plug
  leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
  powerpc/numa: Convert to hotplug state machine
  arm/perf: Fix hotplug state machine conversion
  irqchip/armada: Avoid unused function warnings
  ARC/time: Convert to hotplug state machine
  clocksource/atlas7: Convert to hotplug state machine
  clocksource/armada-370-xp: Convert to hotplug state machine
  clocksource/exynos_mct: Convert to hotplug state machine
  clocksource/arm_global_timer: Convert to hotplug state machine
  rcu: Convert rcutree to hotplug state machine
  KVM/arm/arm64/vgic-new: Convert to hotplug state machine
  smp/cfd: Convert core to hotplug state machine
  x86/x2apic: Convert to CPU hotplug state machine
  profile: Convert to hotplug state machine
  timers/core: Convert to hotplug state machine
  hrtimer: Convert to hotplug state machine
  x86/tboot: Convert to hotplug state machine
  arm64/armv8 deprecated: Convert to hotplug state machine
  hwtracing/coresight-etm4x: Convert to hotplug state machine
  ...
2016-07-29 13:55:30 -07:00
Rafael J. Wysocki
7f234a4d8a Merge branches 'pm-sleep' and 'pm-tools'
* pm-sleep:
  PM / hibernate: Introduce test_resume mode for hibernation
  x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
  PM / hibernate: Image data protection during restoration
  PM / hibernate: Add missing braces in __register_nosave_region()
  PM / hibernate: Clean up comments in snapshot.c
  PM / hibernate: Clean up function headers in snapshot.c
  PM / hibernate: Add missing braces in hibernate_setup()
  PM / hibernate: Recycle safe pages after image restoration
  PM / hibernate: Simplify mark_unsafe_pages()
  PM / hibernate: Do not free preallocated safe pages during image restore
  PM / suspend: show workqueue state in suspend flow
  PM / sleep: make PM notifiers called symmetrically
  PM / sleep: Make pm_prepare_console() return void
  PM / Hibernate: Don't let kasan instrument snapshot.c

* pm-tools:
  PM / tools: scripts: AnalyzeSuspend v4.2
  tools/turbostat: allow user to alter DESTDIR and PREFIX
2016-07-25 13:44:32 +02:00
Thomas Gleixner
7ee681b252 workqueue: Convert to state machine callbacks
Get rid of the prio ordering of the separate notifiers and use a proper state
callback pair.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153335.197083890@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:43 +02:00
Roger Lu
7b776af66d PM / suspend: show workqueue state in suspend flow
If freezable workqueue aborts suspend flow, show
workqueue state for debug purpose.

Signed-off-by: Roger Lu <roger.lu@mediatek.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-02 01:42:48 +02:00
Peter Zijlstra
d945b5e9f0 workqueue: Fix setting affinity of unbound worker threads
With commit e9d867a67f ("sched: Allow per-cpu kernel threads to
run on online && !active"), __set_cpus_allowed_ptr() expects that only
strict per-cpu kernel threads can have affinity to an online CPU which
is not yet active.

This assumption is currently broken in the CPU_ONLINE notification
handler for the workqueues where restore_unbound_workers_cpumask()
calls set_cpus_allowed_ptr() when the first cpu in the unbound
worker's pool->attr->cpumask comes online. Since
set_cpus_allowed_ptr() is called with pool->attr->cpumask in which
only one CPU is online which is not yet active, we get the following
WARN_ON during an CPU online operation.

------------[ cut here ]------------
WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166
__set_cpus_allowed_ptr+0x228/0x2e0
Modules linked in:
CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4
<..snip..>
Call Trace:
[c000000f273ff920] [c00000000010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable)
[c000000f273ffac0] [c0000000000ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
[c000000f273ffb70] [c0000000000f5c58] notifier_call_chain+0x98/0x100
[c000000f273ffbc0] [c0000000000c5ed0] __cpu_notify+0x70/0xe0
[c000000f273ffc00] [c0000000000c6028] notify_online+0x38/0x50
[c000000f273ffc30] [c0000000000c5214] cpuhp_invoke_callback+0x84/0x250
[c000000f273ffc90] [c0000000000c562c] cpuhp_up_callbacks+0x5c/0x120
[c000000f273ffce0] [c0000000000c64d4] cpuhp_thread_fun+0x184/0x1c0
[c000000f273ffd20] [c0000000000fa050] smpboot_thread_fn+0x290/0x2a0
[c000000f273ffd80] [c0000000000f45b0] kthread+0x110/0x130
[c000000f273ffe30] [c000000000009570] ret_from_kernel_thread+0x5c/0x6c
---[ end trace 00f1456578b2a3b2 ]---

This patch fixes this by limiting the mask to the intersection of
the pool affinity and online CPUs.

Changelog-cribbed-from: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-06-16 15:37:05 -04:00
Du, Changbin
b9fdac7f66 debugobjects: insulate non-fixup logic related to static obj from fixup callbacks
When activating a static object we need make sure that the object is
tracked in the object tracker.  If it is a non-static object then the
activation is illegal.

In previous implementation, each subsystem need take care of this in
their fixup callbacks.  Actually we can put it into debugobjects core.
Thus we can save duplicated code, and have *pure* fixup callbacks.

To achieve this, a new callback "is_static_object" is introduced to let
the type specific code decide whether a object is static or not.  If
yes, we take it into object tracker, otherwise give warning and invoke
fixup callback.

This change has paassed debugobjects selftest, and I also do some test
with all debugobjects supports enabled.

At last, I have a concern about the fixups that can it change the object
which is in incorrect state on fixup? Because the 'addr' may not point
to any valid object if a non-static object is not tracked.  Then Change
such object can overwrite someone's memory and cause unexpected
behaviour.  For example, the timer_fixup_activate bind timer to function
stub_timer.

Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com
[changbin.du@intel.com: improve code comments where invoke the new is_static_object callback]
  Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Du, Changbin
02a982a6ec workqueue: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to
change (debugobjects: make fixup functions return bool instead of int)

Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Linus Torvalds
da92223908 Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
 "CPU hotplug callbacks can invoke DOWN_FAILED w/o preceding
  DOWN_PREPARE which can trigger a WARN_ON() in workqueue.

  The bug has been there for a very long time.  It only triggers if CPU
  down fails at a specific point and I don't think it has adverse
  effects other than the warning messages.  The fix is very low impact"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix rebind bound workers warning
2016-05-13 16:16:51 -07:00