* Drop a BUNCH of useless oplus drivers, "features", loggers
and other useless mods.
Signed-off-by: bengris32 <bengris32@protonmail.ch>
Change-Id: I5133459af68c9cabbbfe729c417e5d903375413a
A previous change added a test on the wrong config flag; rename
CFI to CFI_CLANG.
Bug: 145210207
Change-Id: Id8aead2eb2c75ad6442d10165f6cb86ccfb9c2f9
Signed-off-by: Alistair Delva <adelva@google.com>
Changes in 4.14.174
phy: Revert toggling reset changes.
net: phy: Avoid multiple suspends
cgroup, netclassid: periodically release file_lock on classid updating
gre: fix uninit-value in __iptunnel_pull_header
ipv6/addrconf: call ipv6_mc_up() for non-Ethernet interface
ipvlan: add cond_resched_rcu() while processing muticast backlog
ipvlan: do not use cond_resched_rcu() in ipvlan_process_multicast()
netlink: Use netlink header as base to calculate bad attribute offset
net: macsec: update SCI upon MAC address change.
net: nfc: fix bounds checking bugs on "pipe"
net/packet: tpacket_rcv: do not increment ring index on drop
r8152: check disconnect status after long sleep
sfc: detach from cb_page in efx_copy_channel()
bnxt_en: reinitialize IRQs when MTU is modified
cgroup: memcg: net: do not associate sock with unrelated cgroup
net: memcg: late association of sock to memcg
net: memcg: fix lockdep splat in inet_csk_accept()
fib: add missing attribute validation for tun_id
nl802154: add missing attribute validation
nl802154: add missing attribute validation for dev_type
can: add missing attribute validation for termination
macsec: add missing attribute validation for port
net: fq: add missing attribute validation for orphan mask
team: add missing attribute validation for port ifindex
team: add missing attribute validation for array index
nfc: add missing attribute validation for SE API
nfc: add missing attribute validation for vendor subcommand
net: phy: fix MDIO bus PM PHY resuming
bonding/alb: make sure arp header is pulled before accessing it
slip: make slhc_compress() more robust against malicious packets
net: fec: validate the new settings in fec_enet_set_coalesce()
macvlan: add cond_resched() during multicast processing
inet_diag: return classid for all socket types
ipvlan: do not add hardware address of master to its unicast filter list
ipvlan: egress mcast packets are not exceptional
ipvlan: don't deref eth hdr before checking it's set
cgroup: cgroup_procs_next should increase position index
cgroup: Iterate tasks that did not finish do_exit()
iwlwifi: mvm: Do not require PHY_SKU NVM section for 3168 devices
virtio-blk: fix hw_queue stopped on arbitrary error
iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint
workqueue: don't use wq_select_unbound_cpu() for bound works
drm/amd/display: remove duplicated assignment to grph_obj_type
ktest: Add timeout for ssh sync testing
cifs_atomic_open(): fix double-put on late allocation failure
gfs2_atomic_open(): fix O_EXCL|O_CREAT handling on cold dcache
KVM: x86: clear stale x86_emulate_ctxt->intercept value
ARC: define __ALIGN_STR and __ALIGN symbols for ARC
efi: Fix a race and a buffer overflow while reading efivars via sysfs
x86/mce: Fix logic and comments around MSR_PPIN_CTL
iommu/dma: Fix MSI reservation allocation
iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint
iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page
pinctrl: meson-gxl: fix GPIOX sdio pins
pinctrl: core: Remove extra kref_get which blocks hogs being freed
nl80211: add missing attribute validation for critical protocol indication
nl80211: add missing attribute validation for beacon report scanning
nl80211: add missing attribute validation for channel switch
netfilter: cthelper: add missing attribute validation for cthelper
netfilter: nft_payload: add missing attribute validation for payload csum flags
iommu/vt-d: Fix the wrong printing in RHSA parsing
iommu/vt-d: Ignore devices with out-of-spec domain number
i2c: acpi: put device when verifying client fails
ipv6: restrict IPV6_ADDRFORM operation
net/smc: check for valid ib_client_data
efi: Add a sanity check to efivar_store_raw()
batman-adv: Avoid spurious warnings from bat_v neigh_cmp implementation
batman-adv: Always initialize fragment header priority
batman-adv: Fix check of retrieved orig_gw in batadv_v_gw_is_eligible
batman-adv: Fix lock for ogm cnt access in batadv_iv_ogm_calc_tq
batman-adv: Fix internal interface indices types
batman-adv: update data pointers after skb_cow()
batman-adv: Avoid race in TT TVLV allocator helper
batman-adv: Fix TT sync flags for intermediate TT responses
batman-adv: prevent TT request storms by not sending inconsistent TT TLVLs
batman-adv: Fix debugfs path for renamed hardif
batman-adv: Fix debugfs path for renamed softif
batman-adv: Fix duplicated OGMs on NETDEV_UP
batman-adv: Avoid free/alloc race when handling OGM2 buffer
batman-adv: Avoid free/alloc race when handling OGM buffer
batman-adv: Don't schedule OGM for disabled interface
perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
ACPI: watchdog: Allow disabling WDAT at boot
HID: apple: Add support for recent firmware on Magic Keyboards
HID: i2c-hid: add Trekstor Surfbook E11B to descriptor override
cfg80211: check reg_rule for NULL in handle_channel_custom()
scsi: libfc: free response frame from GPN_ID
net: usb: qmi_wwan: restore mtu min/max values after raw_ip switch
net: ks8851-ml: Fix IRQ handling and locking
mac80211: rx: avoid RCU list traversal under mutex
signal: avoid double atomic counter increments for user accounting
slip: not call free_netdev before rtnl_unlock in slip_open
hinic: fix a bug of setting hw_ioctxt
net: rmnet: fix NULL pointer dereference in rmnet_newlink()
jbd2: fix data races at struct journal_head
ARM: 8957/1: VDSO: Match ARMv8 timer in cntvct_functional()
ARM: 8958/1: rename missed uaccess .fixup section
mm: slub: add missing TID bump in kmem_cache_alloc_bulk()
ipv4: ensure rcu_read_lock() in cipso_v4_error()
Linux 4.14.174
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8594f155b5b3df71510fdb5dc034c80fb2332c91
commit aa202f1f56960c60e7befaa0f49c72b8fa11b0a8 upstream.
wq_select_unbound_cpu() is designed for unbound workqueues only, but
it's wrongly called when using a bound workqueue too.
Fixing this ensures work queued to a bound workqueue with
cpu=WORK_CPU_UNBOUND always runs on the local CPU.
Before, that would happen only if wq_unbound_cpumask happened to include
it (likely almost always the case), or was empty, or we got lucky with
forced round-robin placement. So restricting
/sys/devices/virtual/workqueue/cpumask to a small subset of a machine's
CPUs would cause some bound work items to run unexpectedly there.
Fixes: ef55718044 ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Hillf Danton <hdanton@sina.com>
[dj: massage changelog]
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With non-canonical CFI, LLVM generates jump table entries for external
symbols in modules and as a result, a function pointer passed from a
module to the core kernel will have a different address.
Disable the warning for now.
Bug: 145210207
Change-Id: Ifdcee3479280f7b97abdee6b4c746f447e0944e6
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Alistair Delva <adelva@google.com>
Changes in 4.14.159
rsi: release skb if rsi_prepare_beacon fails
arm64: tegra: Fix 'active-low' warning for Jetson TX1 regulator
usb: gadget: u_serial: add missing port entry locking
tty: serial: fsl_lpuart: use the sg count from dma_map_sg
tty: serial: msm_serial: Fix flow control
serial: pl011: Fix DMA ->flush_buffer()
serial: serial_core: Perform NULL checks for break_ctl ops
serial: ifx6x60: add missed pm_runtime_disable
autofs: fix a leak in autofs_expire_indirect()
RDMA/hns: Correct the value of HNS_ROCE_HEM_CHUNK_LEN
iwlwifi: pcie: don't consider IV len in A-MSDU
exportfs_decode_fh(): negative pinned may become positive without the parent locked
audit_get_nd(): don't unlock parent too early
NFC: nxp-nci: Fix NULL pointer dereference after I2C communication error
xfrm: release device reference for invalid state
Input: cyttsp4_core - fix use after free bug
sched/core: Avoid spurious lock dependencies
ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed()
rsxx: add missed destroy_workqueue calls in remove
net: ep93xx_eth: fix mismatch of request_mem_region in remove
i2c: core: fix use after free in of_i2c_notify
serial: core: Allow processing sysrq at port unlock time
cxgb4vf: fix memleak in mac_hlist initialization
iwlwifi: mvm: synchronize TID queue removal
iwlwifi: mvm: Send non offchannel traffic via AP sta
ARM: 8813/1: Make aligned 2-byte getuser()/putuser() atomic on ARMv6+
net/mlx5: Release resource on error flow
clk: sunxi-ng: a64: Fix gate bit of DSI DPHY
dlm: fix possible call to kfree() for non-initialized pointer
extcon: max8997: Fix lack of path setting in USB device mode
net: ethernet: ti: cpts: correct debug for expired txq skb
rtc: s3c-rtc: Avoid using broken ALMYEAR register
i40e: don't restart nway if autoneg not supported
clk: rockchip: fix rk3188 sclk_smc gate data
clk: rockchip: fix rk3188 sclk_mac_lbtest parameter ordering
ARM: dts: rockchip: Fix rk3288-rock2 vcc_flash name
dlm: fix missing idr_destroy for recover_idr
MIPS: SiByte: Enable ZONE_DMA32 for LittleSur
net: dsa: mv88e6xxx: Work around mv886e6161 SERDES missing MII_PHYSID2
scsi: zfcp: drop default switch case which might paper over missing case
crypto: ecc - check for invalid values in the key verification test
crypto: bcm - fix normal/non key hash algorithm failure
pinctrl: qcom: ssbi-gpio: fix gpio-hog related boot issues
Staging: iio: adt7316: Fix i2c data reading, set the data field
mm/vmstat.c: fix NUMA statistics updates
clk: rockchip: fix I2S1 clock gate register for rk3328
clk: rockchip: fix ID of 8ch clock of I2S1 for rk3328
regulator: Fix return value of _set_load() stub
net-next/hinic:fix a bug in set mac address
iomap: sub-block dio needs to zeroout beyond EOF
MIPS: OCTEON: octeon-platform: fix typing
net/smc: use after free fix in smc_wr_tx_put_slot()
math-emu/soft-fp.h: (_FP_ROUND_ZERO) cast 0 to void to fix warning
rtc: max8997: Fix the returned value in case of error in 'max8997_rtc_read_alarm()'
rtc: dt-binding: abx80x: fix resistance scale
ARM: dts: exynos: Use Samsung SoC specific compatible for DWC2 module
media: pulse8-cec: return 0 when invalidating the logical address
media: cec: report Vendor ID after initialization
dmaengine: coh901318: Fix a double-lock bug
dmaengine: coh901318: Remove unused variable
dmaengine: dw-dmac: implement dma protection control setting
usb: dwc3: debugfs: Properly print/set link state for HS
usb: dwc3: don't log probe deferrals; but do log other error codes
ACPI: fix acpi_find_child_device() invocation in acpi_preset_companion()
f2fs: fix count of seg_freed to make sec_freed correct
f2fs: change segment to section in f2fs_ioc_gc_range
ARM: dts: rockchip: Fix the PMU interrupt number for rv1108
ARM: dts: rockchip: Assign the proper GPIO clocks for rv1108
f2fs: fix to allow node segment for GC by ioctl path
sparc: Correct ctx->saw_frame_pointer logic.
dma-mapping: fix return type of dma_set_max_seg_size()
altera-stapl: check for a null key before strcasecmp'ing it
serial: imx: fix error handling in console_setup
i2c: imx: don't print error message on probe defer
lockd: fix decoding of TEST results
ASoC: rsnd: tidyup registering method for rsnd_kctrl_new()
ARM: dts: sun5i: a10s: Fix HDMI output DTC warning
ARM: dts: sun8i: v3s: Change pinctrl nodes to avoid warning
dlm: NULL check before kmem_cache_destroy is not needed
ARM: debug: enable UART1 for socfpga Cyclone5
nfsd: fix a warning in __cld_pipe_upcall()
ASoC: au8540: use 64-bit arithmetic instead of 32-bit
ARM: OMAP1/2: fix SoC name printing
arm64: dts: meson-gxl-libretech-cc: fix GPIO lines names
arm64: dts: meson-gxbb-nanopi-k2: fix GPIO lines names
arm64: dts: meson-gxbb-odroidc2: fix GPIO lines names
arm64: dts: meson-gxl-khadas-vim: fix GPIO lines names
net/x25: fix called/calling length calculation in x25_parse_address_block
net/x25: fix null_x25_address handling
ARM: dts: mmp2: fix the gpio interrupt cell number
ARM: dts: realview-pbx: Fix duplicate regulator nodes
tcp: fix off-by-one bug on aborting window-probing socket
tcp: fix SNMP under-estimation on failed retransmission
tcp: fix SNMP TCP timeout under-estimation
modpost: skip ELF local symbols during section mismatch check
kbuild: fix single target build for external module
mtd: fix mtd_oobavail() incoherent returned value
ARM: dts: pxa: clean up USB controller nodes
clk: sunxi-ng: h3/h5: Fix CSI_MCLK parent
ARM: dts: realview: Fix some more duplicate regulator nodes
dlm: fix invalid cluster name warning
net/mlx4_core: Fix return codes of unsupported operations
pstore/ram: Avoid NULL deref in ftrace merging failure path
powerpc/math-emu: Update macros from GCC
clk: renesas: r8a77995: Correct parent clock of DU
MIPS: OCTEON: cvmx_pko_mem_debug8: use oldest forward compatible definition
nfsd: Return EPERM, not EACCES, in some SETATTR cases
tty: Don't block on IO when ldisc change is pending
media: stkwebcam: Bugfix for wrong return values
firmware: qcom: scm: fix compilation error when disabled
mlxsw: spectrum_router: Relax GRE decap matching check
IB/hfi1: Ignore LNI errors before DC8051 transitions to Polling state
IB/hfi1: Close VNIC sdma_progress sleep window
mlx4: Use snprintf instead of complicated strcpy
usb: mtu3: fix dbginfo in qmu_tx_zlp_error_handler
ARM: dts: sunxi: Fix PMU compatible strings
media: vimc: fix start stream when link is disabled
net: aquantia: fix RSS table and key sizes
tcp: exit if nothing to retransmit on RTO timeout
sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision
fuse: verify nlink
fuse: verify attributes
ALSA: hda/realtek - Dell headphone has noise on unmute for ALC236
ALSA: pcm: oss: Avoid potential buffer overflows
ALSA: hda - Add mute led support for HP ProBook 645 G4
Input: synaptics - switch another X1 Carbon 6 to RMI/SMbus
Input: synaptics-rmi4 - re-enable IRQs in f34v7_do_reflash
Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers
Input: goodix - add upside-down quirk for Teclast X89 tablet
coresight: etm4x: Fix input validation for sysfs.
Input: Fix memory leak in psxpad_spi_probe
x86/PCI: Avoid AMD FCH XHCI USB PME# from D0 defect
CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
CIFS: Fix SMB2 oplock break processing
tty: vt: keyboard: reject invalid keycodes
can: slcan: Fix use-after-free Read in slcan_open
kernfs: fix ino wrap-around detection
jbd2: Fix possible overflow in jbd2_log_space_left()
drm/i810: Prevent underflow in ioctl
KVM: arm/arm64: vgic: Don't rely on the wrong pending table
KVM: x86: do not modify masked bits of shared MSRs
KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
crypto: crypto4xx - fix double-free in crypto4xx_destroy_sdr
crypto: af_alg - cast ki_complete ternary op to int
crypto: ccp - fix uninitialized list head
crypto: ecdh - fix big endian bug in ECC library
crypto: user - fix memory leak in crypto_report
spi: atmel: Fix CS high support
RDMA/qib: Validate ->show()/store() callbacks before calling them
iomap: Fix pipe page leakage during splicing
thermal: Fix deadlock in thermal thermal_zone_device_check
binder: Handle start==NULL in binder_update_page_range()
ASoC: rsnd: fixup MIX kctrl registration
KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332)
appletalk: Fix potential NULL pointer dereference in unregister_snap_client
appletalk: Set error code if register_snap_client failed
usb: gadget: configfs: Fix missing spin_lock_init()
usb: gadget: pch_udc: fix use after free
scsi: qla2xxx: Fix driver unload hang
media: venus: remove invalid compat_ioctl32 handler
USB: uas: honor flag to avoid CAPACITY16
USB: uas: heed CAPACITY_HEURISTICS
USB: documentation: flags on usb-storage versus UAS
usb: Allow USB device to be warm reset in suspended state
staging: rtl8188eu: fix interface sanity check
staging: rtl8712: fix interface sanity check
staging: gigaset: fix general protection fault on probe
staging: gigaset: fix illegal free on probe errors
staging: gigaset: add endpoint-type sanity check
usb: xhci: only set D3hot for pci device
xhci: Increase STS_HALT timeout in xhci_suspend()
xhci: handle some XHCI_TRUST_TX_LENGTH quirks cases as default behaviour.
ARM: dts: pandora-common: define wl1251 as child node of mmc3
iio: humidity: hdc100x: fix IIO_HUMIDITYRELATIVE channel reporting
USB: atm: ueagle-atm: add missing endpoint check
USB: idmouse: fix interface sanity checks
USB: serial: io_edgeport: fix epic endpoint lookup
USB: adutux: fix interface sanity check
usb: core: urb: fix URB structure initialization function
usb: mon: Fix a deadlock in usbmon between mmap and read
tpm: add check after commands attribs tab allocation
mtd: spear_smi: Fix Write Burst mode
virtio-balloon: fix managed page counts when migrating pages between zones
usb: dwc3: ep0: Clear started flag on completion
btrfs: check page->mapping when loading free space cache
btrfs: use refcount_inc_not_zero in kill_all_nodes
Btrfs: fix negative subv_writers counter and data space leak after buffered write
btrfs: Remove btrfs_bio::flags member
Btrfs: send, skip backreference walking for extents with many references
btrfs: record all roots for rename exchange on a subvol
rtlwifi: rtl8192de: Fix missing code to retrieve RX buffer address
rtlwifi: rtl8192de: Fix missing callback that tests for hw release of buffer
rtlwifi: rtl8192de: Fix missing enable interrupt flag
lib: raid6: fix awk build warnings
ovl: relax WARN_ON() on rename to self
ALSA: hda - Fix pending unsol events at shutdown
md/raid0: Fix an error message in raid0_make_request()
watchdog: aspeed: Fix clock behaviour for ast2600
hwrng: omap - Fix RNG wait loop timeout
dm zoned: reduce overhead of backing device checks
workqueue: Fix spurious sanity check failures in destroy_workqueue()
workqueue: Fix pwq ref leak in rescuer_thread()
ASoC: Jack: Fix NULL pointer dereference in snd_soc_jack_report
blk-mq: avoid sysfs buffer overflow with too many CPU cores
cgroup: pids: use atomic64_t for pids->limit
ar5523: check NULL before memcpy() in ar5523_cmd()
s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported
media: bdisp: fix memleak on release
media: radio: wl1273: fix interrupt masking on release
media: cec.h: CEC_OP_REC_FLAG_ values were swapped
cpuidle: Do not unset the driver if it is there already
intel_th: Fix a double put_device() in error path
intel_th: pci: Add Ice Lake CPU support
intel_th: pci: Add Tiger Lake CPU support
PM / devfreq: Lock devfreq in trans_stat_show
cpufreq: powernv: fix stack bloat and hard limit on number of CPUs
ACPI: OSL: only free map once in osl.c
ACPI: bus: Fix NULL pointer check in acpi_bus_get_private_data()
ACPI: PM: Avoid attaching ACPI PM domain to certain devices
pinctrl: samsung: Add of_node_put() before return in error path
pinctrl: samsung: Fix device node refcount leaks in S3C24xx wakeup controller init
pinctrl: samsung: Fix device node refcount leaks in init code
pinctrl: samsung: Fix device node refcount leaks in S3C64xx wakeup controller init
mmc: host: omap_hsmmc: add code for special init of wl1251 to get rid of pandora_wl1251_init_card
ARM: dts: omap3-tao3530: Fix incorrect MMC card detection GPIO polarity
ppdev: fix PPGETTIME/PPSETTIME ioctls
powerpc: Allow 64bit VDSO __kernel_sync_dicache to work across ranges >4GB
powerpc/xive: Prevent page fault issues in the machine crash handler
powerpc: Allow flush_icache_range to work across ranges >4GB
powerpc/xive: Skip ioremap() of ESB pages for LSI interrupts
video/hdmi: Fix AVI bar unpack
quota: Check that quota is not dirty before release
ext2: check err when partial != NULL
quota: fix livelock in dquot_writeback_dquots
ext4: Fix credit estimate for final inode freeing
reiserfs: fix extended attributes on the root directory
block: fix single range discard merge
scsi: zfcp: trace channel log even for FCP command responses
scsi: qla2xxx: Fix DMA unmap leak
scsi: qla2xxx: Fix session lookup in qlt_abort_work()
scsi: qla2xxx: Fix qla24xx_process_bidir_cmd()
scsi: qla2xxx: Always check the qla2x00_wait_for_hba_online() return value
scsi: qla2xxx: Fix message indicating vectors used by driver
xhci: Fix memory leak in xhci_add_in_port()
xhci: make sure interrupts are restored to correct state
iio: adis16480: Add debugfs_reg_access entry
phy: renesas: rcar-gen3-usb2: Fix sysfs interface of "role"
omap: pdata-quirks: remove openpandora quirks for mmc3 and wl1251
scsi: lpfc: Cap NPIV vports to 256
scsi: lpfc: Correct code setting non existent bits in sli4 ABORT WQE
drbd: Change drbd_request_detach_interruptible's return type to int
e100: Fix passing zero to 'PTR_ERR' warning in e100_load_ucode_wait
x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
power: supply: cpcap-battery: Fix signed counter sample register
mlxsw: spectrum_router: Refresh nexthop neighbour when it becomes dead
media: vimc: fix component match compare
ath10k: fix fw crash by moving chip reset after napi disabled
powerpc: Avoid clang warnings around setjmp and longjmp
powerpc: Fix vDSO clock_getres()
ext4: work around deleting a file with i_nlink == 0 safely
firmware: qcom: scm: Ensure 'a0' status code is treated as signed
mm/shmem.c: cast the type of unmap_start to u64
ext4: fix a bug in ext4_wait_for_tail_page_commit
mfd: rk808: Fix RK818 ID template
blk-mq: make sure that line break can be printed
workqueue: Fix missing kfree(rescuer) in destroy_workqueue()
sunrpc: fix crash when cache_head become valid before update
net/mlx5e: Fix SFF 8472 eeprom length
gfs2: fix glock reference problem in gfs2_trans_remove_revoke
kernel/module.c: wakeup processes in module_wq on module unload
gpiolib: acpi: Add Terra Pad 1061 to the run_edge_events_on_boot_blacklist
raid5: need to set STRIPE_HANDLE for batch head
of: unittest: fix memory leak in attach_node_and_children
Linux 4.14.159
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit e66b39af00f426b3356b96433d620cb3367ba1ff upstream.
008847f66c ("workqueue: allow rescuer thread to do more work.") made
the rescuer worker requeue the pwq immediately if there may be more
work items which need rescuing instead of waiting for the next mayday
timer expiration. Unfortunately, it doesn't check whether the pwq is
already on the mayday list and unconditionally gets the ref and moves
it onto the list. This doesn't corrupt the list but creates an
additional reference to the pwq. It got queued twice but will only be
removed once.
This leak later can trigger pwq refcnt warning on workqueue
destruction and prevent freeing of the workqueue.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Williams, Gerald S" <gerald.s.williams@intel.com>
Cc: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit def98c84b6cdf2eeea19ec5736e90e316df5206b upstream.
Before actually destrying a workqueue, destroy_workqueue() checks
whether it's actually idle. If it isn't, it prints out a bunch of
warning messages and leaves the workqueue dangling. It unfortunately
has a couple issues.
* Mayday list queueing increments pwq's refcnts which gets detected as
busy and fails the sanity checks. However, because mayday list
queueing is asynchronous, this condition can happen without any
actual work items left in the workqueue.
* Sanity check failure leaves the sysfs interface behind too which can
lead to init failure of newer instances of the workqueue.
This patch fixes the above two by
* If a workqueue has a rescuer, disable and kill the rescuer before
sanity checks. Disabling and killing is guaranteed to flush the
existing mayday list.
* Remove sysfs interface before sanity checks.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Marcin Pawlowski <mpawlowski@fb.com>
Reported-by: "Williams, Gerald S" <gerald.s.williams@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
psi has provisions to shut off the periodic aggregation worker when
there is a period of no task activity - and thus no data that needs
aggregating. However, while developing psi monitoring, Suren noticed
that the aggregation clock currently won't stay shut off for good.
Debugging this revealed a flaw in the idle design: an aggregation run
will see no task activity and decide to go to sleep; shortly thereafter,
the kworker thread that executed the aggregation will go idle and cause
a scheduling change, during which the psi callback will kick the
!pending worker again. This will ping-pong forever, and is equivalent
to having no shut-off logic at all (but with more code!)
Fix this by exempting aggregation workers from psi's clock waking logic
when the state change is them going to sleep. To do this, tag workers
with the last work function they executed, and if in psi we see a worker
going to sleep after aggregating psi data, we will not reschedule the
aggregation work item.
What if the worker is also executing other items before or after?
Any psi state times that were incurred by work items preceding the
aggregation work will have been collected from the per-cpu buckets
during the aggregation itself. If there are work items following the
aggregation work, the worker's last_func tag will be overwritten and the
aggregator will be kept alive to process this genuine new activity.
If the aggregation work is the last thing the worker does, and we decide
to go idle, the brief period of non-idle time incurred between the
aggregation run and the kworker's dequeue will be stranded in the
per-cpu buckets until the clock is woken by later activity. But that
should not be a problem. The buckets can hold 4s worth of time, and
future activity will wake the clock with a 2s delay, giving us 2s worth
of data we can leave behind when disabling aggregation. If it takes a
worker more than two seconds to go idle after it finishes its last work
item, we likely have bigger problems in the system, and won't notice one
sample that was averaged with a bogus per-CPU weight.
Link: http://lkml.kernel.org/r/20190116193501.1910-1-hannes@cmpxchg.org
Fixes: eb414681d5a0 ("psi: pressure stall information for CPU, memory, and IO")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 1b69ac6b40ebd85eed73e4dbccde2a36961ab990)
Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I2877fec3d381b1006b8bd1261895fdfd68bd21db
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
commit cb9d7fd51d9fbb329d182423bd7b92d0f8cb0e01 upstream.
Some architectures need to use stop_machine() to patch functions for
ftrace, and the assumption is that the stopped CPUs do not make function
calls to traceable functions when they are in the stopped state.
Commit ce4f06dcbb ("stop_machine: Touch_nmi_watchdog() after
MULTI_STOP_PREPARE") added calls to the watchdog touch functions from
the stopped CPUs and those functions lack notrace annotations. This
leads to crashes when enabling/disabling ftrace on ARM kernels built
with the Thumb-2 instruction set.
Fix it by adding the necessary notrace annotations.
Fixes: ce4f06dcbb ("stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: oleg@redhat.com
Cc: tj@kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180821152507.18313-1-vincent.whitchurch@axis.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 537f4146c53c95aac977852b371bafb9c6755ee1 ]
Never directly free @dev after calling device_register(), even
if it returned an error! Always use put_device() to give up the
reference initialized in this function instead.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 62635ea8c18f0f62df4cc58379e4f1d33afd5801 upstream.
show_workqueue_state() can print out a lot of messages while being in
atomic context, e.g. sysrq-t -> show_workqueue_state(). If the console
device is slow it may end up triggering NMI hard lockup watchdog.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by
lockdep:
[ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted
[ 1270.473240] -----------------------------------------------------
[ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 1270.474239] (&(&lock->wait_lock)->rlock){+.+.}, at: [<ffffffff8da253d2>] __mutex_unlock_slowpath+0xa2/0x280
[ 1270.474994]
[ 1270.474994] and this task is already holding:
[ 1270.475440] (&pool->lock/1){-.-.}, at: [<ffffffff8d2992f6>] worker_thread+0x366/0x3c0
[ 1270.476046] which would create a new lock dependency:
[ 1270.476436] (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.}
[ 1270.476949]
[ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock:
[ 1270.477553] (&pool->lock/1){-.-.}
...
[ 1270.488900] to a HARDIRQ-irq-unsafe lock:
[ 1270.489327] (&(&lock->wait_lock)->rlock){+.+.}
...
[ 1270.494735] Possible interrupt unsafe locking scenario:
[ 1270.494735]
[ 1270.495250] CPU0 CPU1
[ 1270.495600] ---- ----
[ 1270.495947] lock(&(&lock->wait_lock)->rlock);
[ 1270.496295] local_irq_disable();
[ 1270.496753] lock(&pool->lock/1);
[ 1270.497205] lock(&(&lock->wait_lock)->rlock);
[ 1270.497744] <Interrupt>
[ 1270.497948] lock(&pool->lock/1);
, which will cause a irq inversion deadlock if the above lock scenario
happens.
The root cause of this safe -> unsafe lock order is the
mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock
held.
Unlocking mutex while holding an irq spinlock was never safe and this
problem has been around forever but it never got noticed because the
only time the mutex is usually trylocked while holding irqlock making
actual failures very unlikely and lockdep annotation missed the
condition until the recent b9c16a0e1f ("locking/mutex: Fix
lockdep_assert_held() fail").
Using mutex for pool->manager_arb has always been a bit of stretch.
It primarily is an mechanism to arbitrate managership between workers
which can easily be done with a pool flag. The only reason it became
a mutex is that pool destruction path wants to exclude parallel
managing operations.
This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE
and make the destruction path wait for the current manager on a wait
queue.
v2: Drop unnecessary flag clearing before pool destruction as
suggested by Boqun.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: stable@vger.kernel.org
Pull workqueue updates from Tejun Heo:
"Nothing major. I introduced a flag collsion bug during v4.13 cycle
which is fixed in this pull request. Fortunately, the flag is for
debugging / verification and the bug isn't critical"
* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Fix flag collision
workqueue: Use TASK_IDLE
workqueue: fix path to documentation
workqueue: doc change for ST behavior on NUMA systems
The new completion/crossrelease annotations interact unfavourable with
the extant flush_work()/flush_workqueue() annotations.
The problem is that when a single work class does:
wait_for_completion(&C)
and
complete(&C)
in different executions, we'll build dependencies like:
lock_map_acquire(W)
complete_acquire(C)
and
lock_map_acquire(W)
complete_release(C)
which results in the dependency chain: W->C->W, which lockdep thinks
spells deadlock, even though there is no deadlock potential since
works are ran concurrently.
One possibility would be to change the work 'lock' to recursive-read,
but that would mean hitting a lockdep limitation on recursive locks.
Also, unconditinoally switching to recursive-read here would fail to
detect the actual deadlock on single-threaded workqueues, which do
have a problem with this.
For now, forcefully disregard these locks for crossrelease.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: byungchul.park@lge.com
Cc: david@fromorbit.com
Cc: johannes@sipsolutions.net
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The flush_work() annotation as introduced by commit:
e159489baa ("workqueue: relax lockdep annotation on flush_work()")
hits on the lockdep problem with recursive read locks.
The situation as described is:
Work W1: Work W2: Task:
ARR(Q) ARR(Q) flush_workqueue(Q)
A(W1) A(W2) A(Q)
flush_work(W2) R(Q)
A(W2)
R(W2)
if (special)
A(Q)
else
ARR(Q)
R(Q)
where: A - acquire, ARR - acquire-read-recursive, R - release.
Where under 'special' conditions we want to trigger a lock recursion
deadlock, but otherwise allow the flush_work(). The allowing is done
by using recursive read locks (ARR), but lockdep is broken for
recursive stuff.
However, there appears to be no need to acquire the lock if we're not
'special', so if we remove the 'else' clause things become much
simpler and no longer need the recursion thing at all.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: byungchul.park@lge.com
Cc: david@fromorbit.com
Cc: johannes@sipsolutions.net
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Workqueues don't use signals, it (ab)uses TASK_INTERRUPTIBLE to avoid
increasing the loadavg numbers. We've 'recently' introduced TASK_IDLE
for this case:
80ed87c8a9 ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
use it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
With the new lockdep crossrelease feature, which checks completions usage,
a false positive is reported in the workqueue code:
> Worker A : acquired of wfc.work -> wait for cpu_hotplug_lock to be released
> Task B : acquired of cpu_hotplug_lock -> wait for lock#3 to be released
> Task C : acquired of lock#3 -> wait for completion of barr->done
> (Task C is in lru_add_drain_all_cpuslocked())
> Worker D : wait for wfc.work to be released -> will complete barr->done
Such a dead lock can not happen because Task C's barr->done and Worker D's
barr->done can not be the same instance.
The reason of this false positive is we initialize all wq_barrier::done
at insert_wq_barrier() via init_completion(), which makes them belong to
the same lock class, therefore, impossible circles are reported.
To fix this, explicitly initialize the lockdep map for wq_barrier::done
in insert_wq_barrier(), so that the lock class key of wq_barrier::done
is a subkey of the corresponding work_struct, as a result we won't build
a dependency between a wq_barrier with a unrelated work, and we can
differ wq barriers based on the related works, so the false positive
above is avoided.
Also define the empty lockdep_init_map_crosslock() for !CROSSRELEASE
to make the code simple and away from unnecessary #ifdefs.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170817094622.12915-1-boqun.feng@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is an underlying assumption/trade-off in many layers of the Linux
system that CPU <-> node mapping is static. This is despite the presence
of features like NUMA and 'hotplug' that support the dynamic addition/
removal of fundamental system resources like CPUs and memory. PowerPC
systems, however, do provide extensive features for the dynamic change
of resources available to a system.
Currently, there is little or no synchronization protection around the
updating of the CPU <-> node mapping, and the export/update of this
information for other layers / modules. In systems which can change
this mapping during 'hotplug', like PowerPC, the information is changing
underneath all layers that might reference it.
This patch attempts to ensure that a valid, usable cpumask attribute
is used by the workqueue infrastructure when setting up new resource
pools. It prevents a crash that has been observed when an 'empty'
cpumask is passed along to the worker/task scheduling code. It is
intended as a temporary workaround until a more fundamental review and
correction of the issue can be done.
[With additions to the patch provided by Tejun Hao <tj@kernel.org>]
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
5c0338c687 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered") automatically enabled ordered attribute for unbound
workqueues w/ max_active == 1. Because ordered workqueues reject
max_active and some attribute changes, this implicit ordered mode
broke cases where the user creates an unbound workqueue w/ max_active
== 1 and later explicitly changes the related attributes.
This patch distinguishes explicit and implicit ordered setting and
overrides from attribute changes if implict.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 5c0338c687 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
The combination of WQ_UNBOUND and max_active == 1 used to imply
ordered execution. After NUMA affinity 4c16bd327c ("workqueue:
implement NUMA affinity for unbound workqueues"), this is no longer
true due to per-node worker pools.
While the right way to create an ordered workqueue is
alloc_ordered_workqueue(), the documentation has been misleading for a
long time and people do use WQ_UNBOUND and max_active == 1 for ordered
workqueues which can lead to subtle bugs which are very difficult to
trigger.
It's unlikely that we'd see noticeable performance impact by enforcing
ordering on WQ_UNBOUND / max_active == 1 workqueues. Let's
automatically set __WQ_ORDERED for those workqueues.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reported-by: Alexei Potashnik <alexei@purestorage.com>
Fixes: 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues")
Cc: stable@vger.kernel.org # v3.10+
Rename:
wait_queue_t => wait_queue_entry_t
'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.
Start sorting this out by renaming it to 'wait_queue_entry_t'.
This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull workqueue update from Tejun Heo:
"One trivial patch to use setup_deferrable_timer() instead of
open-coding the initialization"
* 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: use setup_deferrable_timer
Use setup_deferrable_timer() instead of init_timer_deferrable() to
simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
If queue_delayed_work() gets called with NULL @wq, the kernel will
oops asynchronuosly on timer expiration which isn't too helpful in
tracking down the offender. This actually happened with smc.
__queue_delayed_work() already does several input sanity checks
synchronously. Add NULL @wq check.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Link: http://lkml.kernel.org/r/20170227171439.jshx3qplflyrgcv7@codemonkey.org.uk
Signed-off-by: Tejun Heo <tj@kernel.org>
While splitting up workqueue initialization into two parts,
ac8f73400782 ("workqueue: make workqueue available early during boot")
put wq_numa_init() into workqueue_init_early(). Unfortunately, on
some archs including power and arm64, cpu to node mapping isn't yet
established by the time the early init is called leading to incorrect
NUMA initialization and subsequently the following oops due to zero
cpumask on node-specific unbound pools.
Unable to handle kernel paging request for data at address 0x00000038
Faulting instruction address: 0xc0000000000fc0cc
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-compiler_gcc-6.2.0-next-20161005 #94
task: c0000007f5400000 task.stack: c000001ffc084000
NIP: c0000000000fc0cc LR: c0000000000ed928 CTR: c0000000000fbfd0
REGS: c000001ffc087780 TRAP: 0300 Not tainted (4.8.0-compiler_gcc-6.2.0-next-20161005)
MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 48000424 XER: 00000000
CFAR: c0000000000089dc DAR: 0000000000000038 DSISR: 40000000 SOFTE: 0
GPR00: c0000000000ed928 c000001ffc087a00 c000000000e63200 c000000010d6d600
GPR04: c0000007f5409200 0000000000000021 000000000748e08c 000000000000001f
GPR08: 0000000000000000 0000000000000021 000000000748f1f8 0000000000000000
GPR12: 0000000028000422 c00000000fb80000 c00000000000e0c8 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000021 0000000000000001
GPR20: ffffffffafb50401 0000000000000000 c000000010d6d600 000000000000ba7e
GPR24: 000000000000ba7e c000000000d8bc58 afb504000afb5041 0000000000000001
GPR28: 0000000000000000 0000000000000004 c0000007f5409280 0000000000000000
NIP [c0000000000fc0cc] enqueue_task_fair+0xfc/0x18b0
LR [c0000000000ed928] activate_task+0x78/0xe0
Call Trace:
[c000001ffc087a00] [c0000007f5409200] 0xc0000007f5409200 (unreliable)
[c000001ffc087b10] [c0000000000ed928] activate_task+0x78/0xe0
[c000001ffc087b50] [c0000000000ede58] ttwu_do_activate+0x68/0xc0
[c000001ffc087b90] [c0000000000ef1b8] try_to_wake_up+0x208/0x4f0
[c000001ffc087c10] [c0000000000d3484] create_worker+0x144/0x250
[c000001ffc087cb0] [c000000000cd72d0] workqueue_init+0x124/0x150
[c000001ffc087d00] [c000000000cc0e74] kernel_init_freeable+0x158/0x360
[c000001ffc087dc0] [c00000000000e0e4] kernel_init+0x24/0x160
[c000001ffc087e30] [c00000000000bfa0] ret_from_kernel_thread+0x5c/0xbc
Instruction dump:
62940401 3b800000 3aa00000 7f17c378 3a600001 3b600001 60000000 60000000
60420000 72490021 ebfe0150 2f890001 <ebbf0038> 419e0de0 7fbee840 419e0e58
---[ end trace 0000000000000000 ]---
Fix it by moving wq_numa_init() to workqueue_init(). As this means
that the early intialization may not have full NUMA info for per-cpu
pools and ignores NUMA affinity for unbound pools, fix them up from
workqueue_init() after wq_numa_init().
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Link: http://lkml.kernel.org/r/87twck5wqo.fsf@concordia.ellerman.id.au
Fixes: ac8f73400782 ("workqueue: make workqueue available early during boot")
Signed-off-by: Tejun Heo <tj@kernel.org>
Workqueue is currently initialized in an early init call; however,
there are cases where early boot code has to be split and reordered to
come after workqueue initialization or the same code path which makes
use of workqueues is used both before workqueue initailization and
after. The latter cases have to gate workqueue usages with
keventd_up() tests, which is nasty and easy to get wrong.
Workqueue usages have become widespread and it'd be a lot more
convenient if it can be used very early from boot. This patch splits
workqueue initialization into two steps. workqueue_init_early() which
sets up the basic data structures so that workqueues can be created
and work items queued, and workqueue_init() which actually brings up
workqueues online and starts executing queued work items. The former
step can be done very early during boot once memory allocation,
cpumasks and idr are initialized. The latter right after kthreads
become available.
This allows work item queueing and canceling from very early boot
which is what most of these use cases want.
* As systemd_wq being initialized doesn't indicate that workqueue is
fully online anymore, update keventd_up() to test wq_online instead.
The follow-up patches will get rid of all its usages and the
function itself.
* Flushing doesn't make sense before workqueue is fully initialized.
The flush functions trigger WARN and return immediately before fully
online.
* Work items are never in-flight before fully online. Canceling can
always succeed by skipping the flush step.
* Some code paths can no longer assume to be called with irq enabled
as irq is disabled during early boot. Use irqsave/restore
operations instead.
v2: Watchdog init, which requires timer to be running, moved from
workqueue_init_early() to workqueue_init().
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com
Pull smp hotplug updates from Thomas Gleixner:
"This is the next part of the hotplug rework.
- Convert all notifiers with a priority assigned
- Convert all CPU_STARTING/DYING notifiers
The final removal of the STARTING/DYING infrastructure will happen
when the merge window closes.
Another 700 hundred line of unpenetrable maze gone :)"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
timers/core: Correct callback order during CPU hot plug
leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
powerpc/numa: Convert to hotplug state machine
arm/perf: Fix hotplug state machine conversion
irqchip/armada: Avoid unused function warnings
ARC/time: Convert to hotplug state machine
clocksource/atlas7: Convert to hotplug state machine
clocksource/armada-370-xp: Convert to hotplug state machine
clocksource/exynos_mct: Convert to hotplug state machine
clocksource/arm_global_timer: Convert to hotplug state machine
rcu: Convert rcutree to hotplug state machine
KVM/arm/arm64/vgic-new: Convert to hotplug state machine
smp/cfd: Convert core to hotplug state machine
x86/x2apic: Convert to CPU hotplug state machine
profile: Convert to hotplug state machine
timers/core: Convert to hotplug state machine
hrtimer: Convert to hotplug state machine
x86/tboot: Convert to hotplug state machine
arm64/armv8 deprecated: Convert to hotplug state machine
hwtracing/coresight-etm4x: Convert to hotplug state machine
...
* pm-sleep:
PM / hibernate: Introduce test_resume mode for hibernation
x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
PM / hibernate: Image data protection during restoration
PM / hibernate: Add missing braces in __register_nosave_region()
PM / hibernate: Clean up comments in snapshot.c
PM / hibernate: Clean up function headers in snapshot.c
PM / hibernate: Add missing braces in hibernate_setup()
PM / hibernate: Recycle safe pages after image restoration
PM / hibernate: Simplify mark_unsafe_pages()
PM / hibernate: Do not free preallocated safe pages during image restore
PM / suspend: show workqueue state in suspend flow
PM / sleep: make PM notifiers called symmetrically
PM / sleep: Make pm_prepare_console() return void
PM / Hibernate: Don't let kasan instrument snapshot.c
* pm-tools:
PM / tools: scripts: AnalyzeSuspend v4.2
tools/turbostat: allow user to alter DESTDIR and PREFIX
With commit e9d867a67f ("sched: Allow per-cpu kernel threads to
run on online && !active"), __set_cpus_allowed_ptr() expects that only
strict per-cpu kernel threads can have affinity to an online CPU which
is not yet active.
This assumption is currently broken in the CPU_ONLINE notification
handler for the workqueues where restore_unbound_workers_cpumask()
calls set_cpus_allowed_ptr() when the first cpu in the unbound
worker's pool->attr->cpumask comes online. Since
set_cpus_allowed_ptr() is called with pool->attr->cpumask in which
only one CPU is online which is not yet active, we get the following
WARN_ON during an CPU online operation.
------------[ cut here ]------------
WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166
__set_cpus_allowed_ptr+0x228/0x2e0
Modules linked in:
CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4
<..snip..>
Call Trace:
[c000000f273ff920] [c00000000010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable)
[c000000f273ffac0] [c0000000000ed4b0] workqueue_cpu_up_callback+0x2c0/0x470
[c000000f273ffb70] [c0000000000f5c58] notifier_call_chain+0x98/0x100
[c000000f273ffbc0] [c0000000000c5ed0] __cpu_notify+0x70/0xe0
[c000000f273ffc00] [c0000000000c6028] notify_online+0x38/0x50
[c000000f273ffc30] [c0000000000c5214] cpuhp_invoke_callback+0x84/0x250
[c000000f273ffc90] [c0000000000c562c] cpuhp_up_callbacks+0x5c/0x120
[c000000f273ffce0] [c0000000000c64d4] cpuhp_thread_fun+0x184/0x1c0
[c000000f273ffd20] [c0000000000fa050] smpboot_thread_fn+0x290/0x2a0
[c000000f273ffd80] [c0000000000f45b0] kthread+0x110/0x130
[c000000f273ffe30] [c000000000009570] ret_from_kernel_thread+0x5c/0x6c
---[ end trace 00f1456578b2a3b2 ]---
This patch fixes this by limiting the mask to the intersection of
the pool affinity and online CPUs.
Changelog-cribbed-from: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
When activating a static object we need make sure that the object is
tracked in the object tracker. If it is a non-static object then the
activation is illegal.
In previous implementation, each subsystem need take care of this in
their fixup callbacks. Actually we can put it into debugobjects core.
Thus we can save duplicated code, and have *pure* fixup callbacks.
To achieve this, a new callback "is_static_object" is introduced to let
the type specific code decide whether a object is static or not. If
yes, we take it into object tracker, otherwise give warning and invoke
fixup callback.
This change has paassed debugobjects selftest, and I also do some test
with all debugobjects supports enabled.
At last, I have a concern about the fixups that can it change the object
which is in incorrect state on fixup? Because the 'addr' may not point
to any valid object if a non-static object is not tracked. Then Change
such object can overwrite someone's memory and cause unexpected
behaviour. For example, the timer_fixup_activate bind timer to function
stub_timer.
Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com
[changbin.du@intel.com: improve code comments where invoke the new is_static_object callback]
Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull workqueue fix from Tejun Heo:
"CPU hotplug callbacks can invoke DOWN_FAILED w/o preceding
DOWN_PREPARE which can trigger a WARN_ON() in workqueue.
The bug has been there for a very long time. It only triggers if CPU
down fails at a specific point and I don't think it has adverse
effects other than the warning messages. The fix is very low impact"
* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix rebind bound workers warning