98 Commits

Author SHA1 Message Date
Lucas Wei
65e475187b Merge android-4.9-q (4.9.279) into android-msm-pixel-4.9-sc-lts
Merge 4.9.279 into android-4.9-q
Linux 4.9.279
    spi: mediatek: Fix fifo transfer
    can: raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
  * Revert "Bluetooth: Shutdown controller after workqueues are flushed or cancelled"
      net/bluetooth/hci_core.c
  * net: Fix zero-copy head len calculation.
      net/core/skbuff.c
  * r8152: Fix potential PM refcount imbalance
      drivers/net/usb/r8152.c
    regulator: rt5033: Fix n_voltages settings for BUCK and LDO
    btrfs: mark compressed range uptodate only if all bio succeed
    Merge 4.9.278 into android-4.9-q
Linux 4.9.278
    sis900: Fix missing pci_disable_device() in probe and remove
    tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
    net/mlx5: Fix flow table chaining
  * net: llc: fix skb_over_panic
      include/net/llc_pdu.h
    mlx4: Fix missing error code in mlx4_load_one()
    tipc: fix sleeping in tipc accept routine
    netfilter: nft_nat: allow to specify layer 4 protocol NAT only
  * netfilter: conntrack: adjust stop timestamp to real expiry value
      net/netfilter/nf_conntrack_core.c
  * cfg80211: Fix possible memory leak in function cfg80211_bss_update
      net/wireless/scan.c
    x86/asm: Ensure asm/proto.h can be included stand-alone
    nfc: nfcsim: fix use after free during module unload
    NIU: fix incorrect error return, missed in previous revert
    can: esd_usb2: fix memory leak
    can: ems_usb: fix memory leak
    can: usb_8dev: fix memory leak
    ocfs2: issue zeroout to EOF blocks
    ocfs2: fix zero out valid data
    x86/kvm: fix vcpu-id indexed array sizes
    ARM: ensure the signal page contains defined contents
  * lib/string.c: add multibyte memset functions
      include/linux/string.h
      lib/string.c
    ARM: dts: versatile: Fix up interrupt controller node names
    hfs: add lock nesting notation to hfs_find_init
    hfs: fix high memory mapping in hfs_bnode_read
    hfs: add missing clean-up in hfs_fill_super
  * sctp: move 198 addresses from unusable to private scope
      include/net/sctp/constants.h
    net/802/garp: fix memleak in garp_request_join()
    net/802/mrp: fix memleak in mrp_request_join()
  * workqueue: fix UAF in pwq_unbound_release_workfn()
      kernel/workqueue.c
  * af_unix: fix garbage collect vs MSG_PEEK
      net/unix/af_unix.c
  * net: split out functions related to registering inflight socket files
      include/net/af_unix.h
      net/Makefile
      net/unix/Kconfig
      net/unix/Makefile
      net/unix/af_unix.c
      net/unix/garbage.c
      net/unix/scm.c
      net/unix/scm.h
    tipc: Fix backport of b77413446408fdd256599daf00d5be72b5f3e7c6
    iommu/amd: Fix backport of 140456f994195b568ecd7fc2287a34eadffef3ca
    Merge 4.9.277 into android-4.9-q
Linux 4.9.277
    btrfs: compression: don't try to compress if we don't have enough pages
    iio: accel: bma180: Fix BMA25x bandwidth register values
    iio: accel: bma180: Use explicit member assignment
    net: bcmgenet: ensure EXT_ENERGY_DET_MASK is clear
    media: ngene: Fix out-of-bounds bug in ngene_command_config_free_buf()
  * tracing: Fix bug in rb_per_cpu_empty() that might cause deadloop.
      kernel/trace/ring_buffer.c
    USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
    USB: serial: cp210x: fix comments for GE CS1000
    USB: serial: option: add support for u-blox LARA-R6 family
    usb: renesas_usbhs: Fix superfluous irqs happen after usb_pkt_pop()
    usb: max-3421: Prevent corruption of freed memory
    USB: usb-storage: Add LaCie Rugged USB3-FW to IGNORE_UAS
  * usb: hub: Disable USB 3 device initiated lpm if exit latency is too high
      drivers/usb/core/hub.c
    KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow
  * xhci: Fix lost USB 2 remote wake
      drivers/usb/host/xhci-hub.c
    ALSA: sb: Fix potential ABBA deadlock in CSP driver
    s390/ftrace: fix ftrace_update_ftrace_func implementation
    Revert "MIPS: add PMD table accounting into MIPS'pmd_alloc_one"
  * proc: Avoid mixing integer types in mem_rw()
      fs/proc/base.c
  * Revert "USB: quirks: ignore remote wake-up on Fibocom L850-GL LTE modem"
      drivers/usb/core/quirks.c
    scsi: target: Fix protect handling in WRITE SAME(32)
    scsi: iscsi: Fix iface sysfs attr detection
    netrom: Decrease sock refcount when sock timers expire
    net: decnet: Fix sleeping inside in af_decnet
    net: fix uninit-value in caif_seqpkt_sendmsg
    s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1]
    spi: mediatek: fix fifo rx mode
    perf probe-file: Delete namelist in del_events() on the error path
    perf test bpf: Free obj_buf
    perf lzma: Close lzma stream on exit
    igb: Check if num of q_vectors is smaller than max before array access
    iavf: Fix an error handling path in 'iavf_probe()'
    e1000e: Fix an error handling path in 'e1000_probe()'
    fm10k: Fix an error handling path in 'fm10k_probe()'
    igb: Fix an error handling path in 'igb_probe()'
    ixgbe: Fix an error handling path in 'ixgbe_probe()'
  * ipv6: tcp: drop silly ICMPv6 packet too big messages
      net/ipv4/tcp_output.c
      net/ipv6/tcp_ipv6.c
  * tcp: annotate data races around tp->mtu_info
      net/ipv4/tcp_ipv4.c
      net/ipv6/tcp_ipv6.c
  * net: validate lwtstate->data before returning from skb_tunnel_info()
      include/net/dst_metadata.h
    net: ti: fix UAF in tlan_remove_one
    net: qcom/emac: fix UAF in emac_remove
    net: moxa: fix UAF in moxart_mac_probe
    net: bcmgenet: Ensure all TX/RX queues DMAs are disabled
  * net: bridge: sync fdb to new unicast-filtering ports
      net/bridge/br_if.c
  * net: ipv6: fix return value of ip6_skb_dst_mtu
      include/net/ip6_route.h
      net/ipv6/xfrm6_output.c
  * sched/fair: Fix CFS bandwidth hrtimer expiry type
      kernel/sched/fair.c
    scsi: aic7xxx: Fix unintentional sign extension issue on left shift of u8
    rtc: max77686: Do not enforce (incorrect) interrupt trigger type
  * kbuild: mkcompile_h: consider timestamp if KBUILD_BUILD_TIMESTAMP is set
      scripts/mkcompile_h
  * thermal/core: Correct function name thermal_zone_device_unregister()
      drivers/thermal/thermal_core.c
    arm64: dts: juno: Update SCPI nodes as per the YAML schema
    ARM: dts: stm32: fix RCC node name on stm32f429 MCU
    ARM: imx: pm-imx5: Fix references to imx5_cpu_suspend_info
    ARM: dts: imx6: phyFLEX: Fix UART hardware flow control
    ARM: dts: BCM63xx: Fix NAND nodes names
    ARM: brcmstb: dts: fix NAND nodes names
    reset: ti-syscon: fix to_ti_syscon_reset_data macro
    ARM: dts: rockchip: Fix power-controller node names for rk3288
    ARM: dts: rockchip: fix pinctrl sleep nodename for rk3036-kylin and rk3288
  * ANDROID: selinux: modify RTM_GETNEIGH{TBL}
      security/selinux/include/classmap.h
      security/selinux/include/security.h
      security/selinux/nlmsgtab.c
      security/selinux/ss/policydb.c
      security/selinux/ss/policydb.h
      security/selinux/ss/services.c
    Merge 4.9.276 into android-4.9-q
Linux 4.9.276
  * seq_file: disallow extremely large seq buffer allocations
      fs/seq_file.c
    MIPS: vdso: Invalid GIC access through VDSO
    mips: disable branch profiling in boot/decompress.o
    mips: always link byteswap helpers into decompressor
    scsi: be2iscsi: Fix an error handling path in beiscsi_dev_probe()
    ARM: dts: am335x: align ti,pindir-d0-out-d1-in property with dt-shema
    memory: fsl_ifc: fix leak of private memory on probe failure
    memory: fsl_ifc: fix leak of IO mapping on probe failure
  * reset: bail if try_module_get() fails
      drivers/reset/core.c
    ARM: dts: r8a7779, marzen: Fix DU clock names
  * rtc: fix snprintf() checking in is_rtc_hctosys()
      drivers/rtc/rtc-proc.c
    ARM: dts: exynos: fix PWM LED max brightness on Odroid XU4
    ARM: dts: exynos: fix PWM LED max brightness on Odroid XU/XU3
    hexagon: use common DISCARDS macro
    ALSA: isa: Fix error return code in snd_cmi8330_probe()
    x86/fpu: Limit xstate copy size in xstateregs_set()
    ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inode
    nfs: fix acl memory leak of posix_acl_create()
    watchdog: aspeed: fix hardware timeout calculation
    um: fix error return code in winch_tramp()
    um: fix error return code in slip_open()
  * power: supply: rt5033_battery: Fix device tree enumeration
      drivers/power/supply/Kconfig
    PCI/sysfs: Fix dsm_label_utf16s_to_utf8s() buffer overrun
    virtio_console: Assure used length from device is limited
    virtio-blk: Fix memory leak among suspend/resume procedure
    ACPI: AMBA: Fix resource name in /proc/iomem
    pwm: tegra: Don't modify HW state in .remove callback
    power: supply: ab8500: add missing MODULE_DEVICE_TABLE
    power: supply: charger-manager: add missing MODULE_DEVICE_TABLE
    ceph: remove bogus checks and WARN_ONs from ceph_set_page_dirty
    orangefs: fix orangefs df output.
    x86/fpu: Return proper error codes from user access functions
    watchdog: Fix possible use-after-free by calling del_timer_sync()
    watchdog: sc520_wdt: Fix possible use-after-free in wdt_turnoff()
    watchdog: Fix possible use-after-free in wdt_startup()
    ARM: 9087/1: kprobes: test-thumb: fix for LLVM_IAS=1
    power: reset: gpio-poweroff: add missing MODULE_DEVICE_TABLE
    power: supply: ab8500: Avoid NULL pointers
    pwm: spear: Don't modify HW state in .remove callback
    lib/decompress_unlz4.c: correctly handle zero-padding around initrds.
  * i2c: core: Disable client irq on reboot/shutdown
      drivers/i2c/i2c-core.c
    ALSA: hda: Add IRQ check for platform_get_irq()
    backlight: lm3630a: Fix return code of .update_status() callback
    powerpc/boot: Fixup device-tree on little endian
    usb: gadget: hid: fix error return code in hid_bind()
  * usb: gadget: f_hid: fix endianness issue with descriptors
      drivers/usb/gadget/function/f_hid.c
  * ALSA: bebob: add support for ToneWeal FW66
      sound/firewire/Kconfig
  * ASoC: soc-core: Fix the error return code in snd_soc_of_parse_audio_routing()
      sound/soc/soc-core.c
    selftests/powerpc: Fix "no_handler" EBB selftest
    ALSA: ppc: fix error return code in snd_pmac_probe()
    gpio: zynq: Check return value of pm_runtime_get_sync
    powerpc/ps3: Add dma_mask to ps3_dma_region
    ALSA: sb: Fix potential double-free of CSP mixer elements
    s390/sclp_vt220: fix console name to match device
    mfd: da9052/stmpe: Add and modify MODULE_DEVICE_TABLE
    scsi: iscsi: Add iscsi_cls_conn refcount helpers
    fs/jfs: Fix missing error code in lmLogInit()
    tty: serial: 8250: serial_cs: Fix a memory leak in error handling path
    scsi: lpfc: Fix "Unexpected timeout" error in direct attach topology
  * Revert "ALSA: bebob/oxfw: fix Kconfig entry for Mackie d.2 Pro"
      sound/firewire/Kconfig
    misc/libmasm/module: Fix two use after free in ibmasm_init_one
    tty: serial: fsl_lpuart: fix the potential risk of division or modulo by zero
  * fscrypt: don't ignore minor_hash when hash is 0
      fs/crypto/fname.c
    tracing: Do not reference char * as a string in histograms
  * scsi: core: Fix bad pointer dereference when ehandler kthread is invalid
      drivers/scsi/hosts.c
    KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run()
    KVM: x86: Use guest MAXPHYADDR from CPUID.0x8000_0008 iff TDP is enabled
  * smackfs: restrict bytes count in smk_set_cipso()
      security/smack/smackfs.c
    jfs: fix GPF in diFree
    media: uvcvideo: Fix pixel format change for Elgato Cam Link 4K
    media: gspca/sunplus: fix zero-length control requests
    media: gspca/sq905: fix control-request direction
    media: zr364xx: fix memory leak in zr364xx_start_readpipe
    media: dtv5100: fix control-request directions
    dm btree remove: assign new_root only when removal succeeds
    ipack/carriers/tpci200: Fix a double free in tpci200_pci_probe
  * seq_buf: Fix overflow in seq_buf_putmem_hex()
      lib/seq_buf.c
    power: supply: ab8500: Fix an old bug
    ipmi/watchdog: Stop watchdog timer when the current action is 'none'
    qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute
    ASoC: tegra: Set driver_name=tegra for all machine drivers
    ata: ahci_sunxi: Disable DIPM
  * mmc: core: clear flags before allowing to retune
      drivers/mmc/core/core.c
  * mmc: sdhci: Fix warning message when accessing RPMB in HS400 mode
      drivers/mmc/host/sdhci.c
      drivers/mmc/host/sdhci.h
    pinctrl/amd: Add device HID for new AMD GPIO controller
    powerpc/barrier: Avoid collision with clang's __lwsync macro
    mac80211: fix memory corruption in EAPOL handling
    can: bcm: delay release of struct bcm_op after synchronize_rcu()
    can: gw: synchronize rcu operations before removing gw job entry
  * fuse: reject internal errno
      fs/fuse/dev.c
    sctp: add size validation when walking chunks
    Bluetooth: btusb: fix bt fiwmare downloading failure issue for qca btsoc.
  * Bluetooth: Shutdown controller after workqueues are flushed or cancelled
      net/bluetooth/hci_core.c
  * Bluetooth: Fix the HCI to MGMT status conversion table
      net/bluetooth/mgmt.c
    RDMA/cma: Fix rdma_resolve_route() memory leak
  * wireless: wext-spy: Fix out-of-bounds warning
      net/wireless/wext-spy.c
    sfc: error code if SRIOV cannot be disabled
    sfc: avoid double pci_remove of VFs
    RDMA/rxe: Don't overwrite errno from ib_umem_get()
    atm: nicstar: register the interrupt handler in the right place
    atm: nicstar: use 'dma_free_coherent' instead of 'kfree'
    MIPS: add PMD table accounting into MIPS'pmd_alloc_one
    cw1200: add missing MODULE_DEVICE_TABLE
    wl1251: Fix possible buffer overflow in wl1251_cmd_scan
    wlcore/wl12xx: Fix wl12xx get_mac error if device is in ELP
  * xfrm: Fix error reporting in xfrm_state_construct.
      net/xfrm/xfrm_user.c
  * selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC
      security/selinux/avc.c
    fjes: check return value after calling platform_get_resource()
    net: micrel: check return value after calling platform_get_resource()
    dm space maps: don't reset space map allocation cursor when committing
    RDMA/cxgb4: Fix missing error code in create_qp()
  * ipv6: use prandom_u32() for ID generation
      net/ipv6/output_core.c
    clk: tegra: Ensure that PLLU configuration is applied properly
    e100: handle eeprom as little endian
    udf: Fix NULL pointer dereference in udf_symlink function
    drm/virtio: Fix double free on probe failure
    reiserfs: add check for invalid 1st journal block
  * net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT
      net/core/dev.c
    atm: nicstar: Fix possible use-after-free in nicstar_cleanup()
    mISDN: fix possible use-after-free in HFC_cleanup()
    atm: iphase: fix possible use-after-free in ia_module_exit()
    hugetlb: clear huge pte during flush function on mips platform
    net: pch_gbe: Use proper accessors to BE data in pch_ptp_match()
  * scsi: core: Retry I/O for Notify (Enable Spinup) Required error
      drivers/scsi/scsi_lib.c
    mmc: vub3000: fix control-request direction
    selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
    mm/huge_memory.c: don't discard hugepage if other processes are mapping it
    leds: ktd2692: Fix an error handling path
  * configfs: fix memleak in configfs_release_bin_file
      fs/configfs/file.c
    extcon: max8997: Add missing modalias string
    extcon: sm5502: Drop invalid register write in sm5502_reg_data
    phy: ti: dm816x: Fix the error handling path in 'dm816x_usb_phy_probe()
    scsi: mpt3sas: Fix error return value in _scsih_expander_add()
  * of: Fix truncation of memory sizes on 32-bit platforms
      drivers/of/fdt.c
      drivers/of/of_reserved_mem.c
    staging: gdm724x: check for overflow in gdm_lte_netif_rx()
    staging: gdm724x: check for buffer overflow in gdm_lte_multi_sdu_pkt()
    s390: appldata depends on PROC_SYSCTL
    scsi: FlashPoint: Rename si_flags field
    tty: nozomi: Fix the error handling path of 'nozomi_card_init()'
    char: pcmcia: error out if 'num_bytes_read' is greater than 4 in set_protocol()
    Input: hil_kbd - fix error return code in hil_dev_connect()
    iio: light: tcs3414: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
    iio: light: isl29125: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
    iio: prox: pulsed-light: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
    iio: humidity: am2315: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
    iio: gyro: bmg160: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
    iio: adc: vf610: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
    iio: adc: ti-ads1015: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
    iio: accel: stk8ba50: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
    iio: accel: stk8312: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
    iio: accel: kxcjk-1013: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
    iio: accel: bma220: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
    iio: accel: bma180: Fix buffer alignment in iio_push_to_buffers_with_timestamp()
    iio: adis_buffer: do not return ints in irq handlers
    tty: nozomi: Fix a resource leak in an error handling function
    net: sched: fix warning in tcindex_alloc_perfect_hash
  * writeback: fix obtain a reference to a freeing memcg css
      fs/fs-writeback.c
  * Bluetooth: mgmt: Fix slab-out-of-bounds in tlv_data_is_valid
      net/bluetooth/mgmt.c
    i40e: Fix error handling in i40e_vsi_open
    vxlan: add missing rcu_read_lock() in neigh_reduce()
    net: ethernet: ezchip: fix error handling
    net: ethernet: ezchip: fix UAF in nps_enet_remove
    net: ethernet: aeroflex: fix UAF in greth_of_remove
    netfilter: nft_exthdr: check for IPv6 packet before further processing
  * netlabel: Fix memory leak in netlbl_mgmt_add_common
      net/netlabel/netlabel_mgmt.c
    ath10k: Fix an error code in ath10k_add_interface()
    brcmsmac: mac80211_if: Fix a resource leak in an error handling path
  * wireless: carl9170: fix LEDS build errors & warnings
      drivers/net/wireless/ath/carl9170/Kconfig
    drm: qxl: ensure surf.data is ininitialized
    RDMA/rxe: Fix failure during driver load
    ehea: fix error return code in ehea_restart_qps()
    net: pch_gbe: Propagate error from devm_gpio_request_one()
    ocfs2: fix snprintf() checking
    ACPI: sysfs: Fix a buffer overrun problem with description_show()
    crypto: nx - Fix RCU warning in nx842_OF_upd_status
    spi: spi-sun6i: Fix chipselect/clock bug
    hwmon: (max31790) Fix fan speed reporting for fan7..12
    hwmon: (max31722) Remove non-standard ACPI device IDs
    media: s5p-g2d: Fix a memory leak on ctx->fh.m2m_ctx
    mmc: usdhi6rol0: fix error return code in usdhi6_probe()
    media: siano: Fix out-of-bounds warnings in smscore_load_firmware_family2()
    media: tc358743: Fix error return code in tc358743_probe_of()
    pata_ep93xx: fix deferred probing
    pata_octeon_cf: avoid WARN_ON() in ata_host_activate()
    media: I2C: change 'RST' to "RSET" to fix multiple build errors
    pata_rb532_cf: fix deferred probing
    sata_highbank: fix deferred probing
    crypto: ux500 - Fix error return code in hash_hw_final()
    crypto: ixp4xx - dma_unmap the correct address
    media: s5p_cec: decrement usage count if disabled
    ia64: mca_drv: fix incorrect array size calculation
    ACPI: tables: Add custom DSDT file as makefile prerequisite
    platform/x86: toshiba_acpi: Fix missing error code in toshiba_acpi_setup_keyboard()
    ACPI: bus: Call kobject_put() in acpi_init() error path
    fs: dlm: fix memory leak when fenced
  * random32: Fix implicit truncation warning in prandom_seed_state()
      include/linux/prandom.h
    fs: dlm: cancel work sync othercon
  * block_dump: remove block_dump feature in mark_inode_dirty()
      fs/fs-writeback.c
    ACPI: processor idle: Fix up C-state latency if not ordered
    regulator: da9052: Ensure enough delay time for .set_voltage_time_sel
  * btrfs: disable build on platforms having page size 256K
      fs/btrfs/Kconfig
    btrfs: abort transaction if we fail to update the delayed inode
    media: siano: fix device register error path
  * media: dvb_net: avoid speculation from net slot
      drivers/media/dvb-core/dvb_net.c
  * crypto: shash - avoid comparing pointers to exported functions under CFI
      crypto/shash.c
      include/crypto/internal/hash.h
    mmc: via-sdmmc: add a check against NULL pointer dereference
    media: st-hva: Fix potential NULL pointer dereferences
    media: bt8xx: Fix a missing check bug in bt878_probe
  * media: v4l2-core: Avoid the dangling pointer in v4l2_fh_release
      drivers/media/v4l2-core/v4l2-fh.c
    crypto: qat - remove unused macro in FW loader
    crypto: qat - check return code of qat_hal_rd_rel_reg()
    media: pvrusb2: fix warning in pvr2_i2c_core_done
    media: cobalt: fix race condition in setting HPD
    media: cpia2: fix memory leak in cpia2_usb_probe
    crypto: nx - add missing MODULE_DEVICE_TABLE
    spi: omap-100k: Fix the length judgment problem
    spi: spi-topcliff-pch: Fix potential double free in pch_spi_process_messages()
    spi: spi-loopback-test: Fix 'tx_buf' might be 'rx_buf'
  * fuse: check connected before queueing on fpq->io
      fs/fuse/dev.c
  * seq_buf: Make trace_seq_putmem_hex() support data longer than 8
      lib/seq_buf.c
    ssb: sdio: Don't overwrite const buffer if block_write fails
    ath9k: Fix kernel NULL pointer dereference during ath_reset_internal()
    serial_cs: remove wrong GLOBETROTTER.cis entry
    serial_cs: Add Option International GSM-Ready 56K/ISDN modem
    serial: sh-sci: Stop dmaengine transfer in sci_stop_tx()
    iio: ltr501: ltr501_read_ps(): add missing endianness conversion
    iio: ltr501: ltr559: fix initialization of LTR501_ALS_CONTR
    iio: ltr501: mark register holding upper 8 bits of ALS_DATA{0,1} and PS_DATA as volatile, too
    s390/cio: dont call css_wait_for_slow_path() inside a lock
    SUNRPC: Should wake up the privileged task firstly.
    SUNRPC: Fix the batch tasks count wraparound.
  * ext4: fix avefreec in find_group_orlov
      fs/ext4/ialloc.c
  * ext4: remove check for zero nr_to_scan in ext4_es_scan()
      fs/ext4/extents_status.c
  * ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
      fs/ext4/extents_status.c
  * ext4: fix kernel infoleak via ext4_extent_header
      fs/ext4/extents.c
    btrfs: clear defrag status of a root if starting transaction fails
    ARM: dts: at91: sama5d4: fix pinctrl muxing
    Input: joydev - prevent use of not validated data in JSIOCSBTNMAP ioctl
  * iov_iter_fault_in_readable() should do nothing in xarray case
      lib/iov_iter.c
    ntfs: fix validity check for file name attribute
    USB: cdc-acm: blacklist Heimann USB Appset device
    usb: gadget: eem: fix echo command packet response issue
    net: can: ems_usb: fix use-after-free in ems_usb_disconnect()
    Input: usbtouchscreen - fix control-request directions
    media: dvb-usb: fix wrong definition
  * ALSA: usb-audio: fix rate on Ozone Z90 USB headset
      sound/usb/format.c
    Merge 4.9.275 into android-4.9-q
Linux 4.9.275
    xen/events: reset active flag for lateeoi events later
  * kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
      kernel/kthread.c
  * kthread_worker: split code for canceling the delayed work timer
      kernel/kthread.c
    drm/nouveau: fix dma_address check for CPU/GPU sync
    scsi: sr: Return appropriate error code when disk is ejected
  * mm, futex: fix shared futex pgoff on shmem huge page
      include/linux/hugetlb.h
      include/linux/pagemap.h
      kernel/futex.c
    mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
  * mm: add VM_WARN_ON_ONCE_PAGE() macro
      include/linux/mmdebug.h
  * include/linux/mmdebug.h: make VM_WARN* non-rvals
      include/linux/mmdebug.h

Bug: 196282886
Change-Id: I727851b06571f0e9d7751d10a59b1edae838882c
Signed-off-by: Lucas Wei <lucaswei@google.com>
2021-08-18 20:51:10 +08:00
Petr Mladek
5d27e1503b kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
commit 5fa54346caf67b4b1b10b1f390316ae466da4d53 upstream.

The system might hang with the following backtrace:

	schedule+0x80/0x100
	schedule_timeout+0x48/0x138
	wait_for_common+0xa4/0x134
	wait_for_completion+0x1c/0x2c
	kthread_flush_work+0x114/0x1cc
	kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
	kthread_cancel_delayed_work_sync+0x18/0x2c
	xxxx_pm_notify+0xb0/0xd8
	blocking_notifier_call_chain_robust+0x80/0x194
	pm_notifier_call_chain_robust+0x28/0x4c
	suspend_prepare+0x40/0x260
	enter_state+0x80/0x3f4
	pm_suspend+0x60/0xdc
	state_store+0x108/0x144
	kobj_attr_store+0x38/0x88
	sysfs_kf_write+0x64/0xc0
	kernfs_fop_write_iter+0x108/0x1d0
	vfs_write+0x2f4/0x368
	ksys_write+0x7c/0xec

It is caused by the following race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync():

CPU0				CPU1

Context: Thread A		Context: Thread B

kthread_mod_delayed_work()
  spin_lock()
  __kthread_cancel_work()
     spin_unlock()
     del_timer_sync()
				kthread_cancel_delayed_work_sync()
				  spin_lock()
				  __kthread_cancel_work()
				    spin_unlock()
				    del_timer_sync()
				    spin_lock()

				  work->canceling++
				  spin_unlock
     spin_lock()
   queue_delayed_work()
     // dwork is put into the worker->delayed_work_list

   spin_unlock()

				  kthread_flush_work()
     // flush_work is put at the tail of the dwork

				    wait_for_completion()

Context: IRQ

  kthread_delayed_work_timer_fn()
    spin_lock()
    list_del_init(&work->node);
    spin_unlock()

BANG: flush_work is no longer linked and will never get processed.

The problem is that kthread_mod_delayed_work() checks work->canceling
flag before canceling the timer.

A simple solution is to (re)check work->canceling after
__kthread_cancel_work().  But then it is not clear what should be
returned when __kthread_cancel_work() removed the work from the queue
(list) and it can't queue it again with the new @delay.

The return value might be used for reference counting.  The caller has
to know whether a new work has been queued or an existing one was
replaced.

The proper solution is that kthread_mod_delayed_work() will remove the
work from the queue (list) _only_ when work->canceling is not set.  The
flag must be checked after the timer is stopped and the remaining
operations can be done under worker->lock.

Note that kthread_mod_delayed_work() could remove the timer and then
bail out.  It is fine.  The other canceling caller needs to cancel the
timer as well.  The important thing is that the queue (list)
manipulation is done atomically under worker->lock.
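
The ordering described above can be sketched as a small single-threaded
userspace model. This is only an illustration, not the kernel code: the
struct, the enum, the model_* function names and the "race hook" that
stands in for another CPU are all hypothetical, and the lock is a plain
flag rather than a real spinlock. The three-valued result is also an
assumption of the sketch; it does not say what the real function returns
when it bails out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for the model; not the kernel's return value. */
enum model_result {
    MODEL_QUEUED_NEW,   /* nothing was pending, new work queued */
    MODEL_REPLACED,     /* pending work was unlinked and requeued */
    MODEL_BAILED_OUT    /* another caller is canceling; list untouched */
};

/* Hypothetical model of one delayed work item; not a kernel structure. */
struct model_work {
    bool lock_held;     /* stands in for worker->lock */
    bool timer_armed;   /* delayed-work timer is pending */
    bool on_list;       /* work is linked into the queue (list) */
    int canceling;      /* > 0 while a cancel_sync caller is in progress */
    unsigned long delay;
};

/* Runs while the lock is dropped, simulating what another CPU might do. */
typedef void (*model_race_hook)(struct model_work *w);

/* Stands in for Thread B marking the work as being canceled. */
static void model_concurrent_cancel(struct model_work *w)
{
    w->canceling++;
}

/* Stop the timer; the lock is released while waiting for it. */
static void model_cancel_timer(struct model_work *w, model_race_hook hook)
{
    assert(w->lock_held);
    w->lock_held = false;       /* spin_unlock() */
    w->timer_armed = false;     /* del_timer_sync() */
    if (hook)
        hook(w);                /* the other context may run here */
    w->lock_held = true;        /* spin_lock() */
}

/* Unlink the work from the list; must be called with the lock held. */
static bool model_unlink(struct model_work *w)
{
    bool was_pending = w->on_list;

    assert(w->lock_held);
    w->on_list = false;
    return was_pending;
}

static enum model_result model_mod_delayed_work(struct model_work *w,
                                                unsigned long delay,
                                                model_race_hook hook)
{
    enum model_result res;

    w->lock_held = true;            /* spin_lock() */

    /* Timer first, because stopping it drops the lock. */
    model_cancel_timer(w, hook);

    /* Check the flag only after the timer is stopped. */
    if (w->canceling) {
        res = MODEL_BAILED_OUT;     /* do not touch the list */
        goto out;
    }

    /* From here on the lock is held without interruption. */
    res = model_unlink(w) ? MODEL_REPLACED : MODEL_QUEUED_NEW;
    w->on_list = true;
    w->timer_armed = true;
    w->delay = delay;
out:
    w->lock_held = false;           /* spin_unlock() */
    return res;
}
```

In this model, calling model_mod_delayed_work() on a pending work with no
hook requeues it with the new delay. Passing model_concurrent_cancel as
the hook makes the call stop the timer and bail out, leaving the work on
the list for the canceling caller to remove.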

Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com
Fixes: 9a6b06c8d9 ("kthread: allow to modify delayed kthread work")
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reported-by: Martin Liu <liumartin@google.com>
Cc: <jenhaochen@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-11 12:46:41 +02:00
Petr Mladek
392cfdd660 kthread_worker: split code for canceling the delayed work timer
commit 34b3d5344719d14fd2185b2d9459b3abcb8cf9d8 upstream.

Patch series "kthread_worker: Fix race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync()".

This patchset fixes the race between kthread_mod_delayed_work() and
kthread_cancel_delayed_work_sync() including proper return value
handling.

This patch (of 2):

Simple code refactoring as a preparation step for fixing a race between
kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync().

It does not modify the existing behavior.

Link: https://lkml.kernel.org/r/20210610133051.15337-2-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: <jenhaochen@google.com>
Cc: Martin Liu <liumartin@google.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-11 12:46:41 +02:00
lucaswei
f09d91fe02 Merge android-4.9-q (4.9.248) into android-msm-pixel-4.9-lts
Merge 4.9.248 into android-4.9-q
Linux 4.9.248
    x86/uprobes: Do not use prefixes.nbytes when looping over prefixes.bytes
    Input: i8042 - fix error return code in i8042_setup_aux()
    i2c: qup: Fix error return code in qup_i2c_bam_schedule_desc()
    gfs2: check for empty rgrp tree in gfs2_ri_update
  * tracing: Fix userstacktrace option for instances
      kernel/trace/trace.c
      kernel/trace/trace.h
    spi: bcm2835: Release the DMA channel if probe fails after dma_init
    spi: bcm2835: Fix use-after-free on unbind
    spi: bcm-qspi: Fix use-after-free on unbind
  * spi: Introduce device-managed SPI controller allocation
      drivers/spi/spi.c
      include/linux/spi/spi.h
    iommu/amd: Set DTE[IntTabLen] to represent 512 IRTEs
    i2c: imx: Check for I2SR_IAL after every byte
    i2c: imx: Fix reset of I2SR_IAL flag
    cifs: fix potential use-after-free in cifs_echo_request()
    ftrace: Fix updating FTRACE_FL_TRAMP
  * tty: Fix ->session locking
      drivers/tty/tty_io.c
      include/linux/tty.h
    ALSA: hda/generic: Add option to enforce preferred_dacs pairs
    ALSA: hda/realtek - Add new codec supported for ALC897
  * tty: Fix ->pgrp locking in tiocspgrp()
      drivers/tty/tty_io.c
    USB: serial: option: add support for Thales Cinterion EXS82
    USB: serial: option: add Fibocom NL668 variants
    USB: serial: ch341: sort device-id entries
    USB: serial: ch341: add new Product ID for CH341A
    USB: serial: kl5kusb105: fix memleak on open
  * usb: gadget: f_fs: Use local copy of descriptors for userspace copy
      drivers/usb/gadget/function/f_fs.c
  * vlan: consolidate VLAN parsing code and limit max parsing depth
      include/linux/if_vlan.h
      include/net/inet_ecn.h
    pinctrl: baytrail: Fix pin being driven low for a while on gpiod_get(..., GPIOD_OUT_HIGH)
    pinctrl: baytrail: Replace WARN with dev_info_once when setting direct-irq pin to output
    btrfs: sysfs: init devices outside of the chunk_mutex
    RDMA/i40iw: Address an mmap handler exploit in i40iw
  * spi: Fix controller unregister order harder
      drivers/spi/spi.c
    Input: i8042 - add ByteSpeed touchpad to noloop table
  * Input: xpad - support Ardwiino Controllers
      drivers/input/joystick/xpad.c
    dt-bindings: net: correct interrupt flags in examples
    net/mlx5: Fix wrong address reclaim when command interface is down
    net: pasemi: fix error return code in pasemi_mac_open()
    cxgb3: fix error return code in t3_sge_alloc_qset()
    net/x25: prevent a couple of overflows
    ibmvnic: Fix TX completion error handling
    ibmvnic: Ensure that SCRQ entry reads are correctly ordered
    netfilter: bridge: reset skb->pkt_type after NF_INET_POST_ROUTING traversal
  * bonding: wait for sysfs kobject destruction before freeing struct slave
      drivers/net/bonding/bond_main.c
      drivers/net/bonding/bond_sysfs_slave.c
      include/net/bonding.h
    usbnet: ipheth: fix connectivity with iOS 14
    rose: Fix Null pointer dereference in rose_send_frame()
    net/af_iucv: set correct sk_protocol for child sockets
    ANDROID: cuttlefish_defconfig: Disable CONFIG_KSM
    Merge 4.9.247 into android-4.9-q
Linux 4.9.247
  * USB: core: Fix regression in Hercules audio card
      drivers/usb/core/quirks.c
  * USB: core: add endpoint-blacklist quirk
      drivers/usb/core/config.c
      drivers/usb/core/quirks.c
      drivers/usb/core/usb.h
      include/linux/usb/quirks.h
  * regulator: workaround self-referent regulators
      drivers/regulator/core.c
  * regulator: avoid resolve_supply() infinite recursion
      drivers/regulator/core.c
    x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb
    usb: gadget: Fix memleak in gadgetfs_fill_super
  * usb: gadget: f_midi: Fix memleak in f_midi_alloc
      drivers/usb/gadget/function/f_midi.c
  * USB: core: Change %pK for __user pointers to %px
      drivers/usb/core/devio.c
    perf probe: Fix to die_entrypc() returns error correctly
    platform/x86: toshiba_acpi: Fix the wrong variable assignment
    can: gs_usb: fix endianess problem with candleLight firmware
    efivarfs: revert "fix memory leak in efivarfs_create()"
    ibmvnic: fix NULL pointer dereference in ibmvic_reset_crq
    net: ena: set initial DMA width to avoid intel iommu issue
    nfc: s3fwrn5: use signed integer for parsing GPIO numbers
    IB/mthca: fix return value of error branch in mthca_init_cq()
    bnxt_en: Release PCI regions when DMA mask setup fails during probe.
    video: hyperv_fb: Fix the cache type when mapping the VRAM
    bnxt_en: fix error return code in bnxt_init_board()
  * scsi: ufs: Fix race between shutdown and runtime resume flow
      drivers/scsi/ufs/ufshcd.c
    batman-adv: set .owner to THIS_MODULE
    phy: tegra: xusb: Fix dangling pointer on probe failure
    perf/x86: fix sysfs type mismatches
    scsi: target: iscsi: Fix cmd abort fabric stop race
    scsi: libiscsi: Fix NOP race condition
    dmaengine: pl330: _prep_dma_memcpy: Fix wrong burst size
  * proc: don't allow async path resolution of /proc/self components
      fs/proc/self.c
    x86/xen: don't unbind uninitialized lock_kicker_irq
    dmaengine: xilinx_dma: use readl_poll_timeout_atomic variant
    HID: hid-sensor-hub: Fix issue with devices with no report ID
    Input: i8042 - allow insmod to succeed on devices without an i8042 controller
  * HID: cypress: Support Varmilo Keyboards' media hotkeys
      drivers/hid/hid-ids.h
    ALSA: hda/hdmi: fix incorrect locking in hdmi_pcm_close
    ALSA: hda/hdmi: Use single mutex unlock in error paths
  * arm64: pgtable: Fix pte_accessible()
      arch/arm64/include/asm/pgtable.h
    btrfs: inode: Verify inode mode to avoid NULL pointer dereference
    btrfs: tree-checker: Enhance chunk checker to validate chunk profile
  * PCI: Add device even if driver attach failed
      drivers/pci/bus.c
    btrfs: fix lockdep splat when reading qgroup config on mount
    mm/userfaultfd: do not access vma->vm_mm after calling handle_userfault()
    perf event: Check ref_reloc_sym before using it
  * BACKPORT: arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm
      arch/arm64/include/asm/efi.h
      arch/arm64/include/asm/mmu_context.h
    Merge 4.9.246 into android-4.9-q
Linux 4.9.246
    x86/microcode/intel: Check patch signature before saving microcode for early loading
    s390/cpum_sf.c: fix file permission for cpum_sfb_size
    mac80211: free sta in sta_info_insert_finish() on errors
    mac80211: minstrel: fix tx status processing corner case
    mac80211: minstrel: remove deferred sampling code
    xtensa: disable preemption around cache alias management calls
  * regulator: fix memory leak with repeated set_machine_constraints()
      drivers/regulator/core.c
    iio: accel: kxcjk1013: Replace is_smo8500_device with an acpi_type enum
  * ext4: fix bogus warning in ext4_update_dx_flag()
      fs/ext4/ext4.h
    efivarfs: fix memory leak in efivarfs_create()
    tty: serial: imx: keep console clocks always on
    ALSA: mixart: Fix mutex deadlock
  * ALSA: ctl: fix error path at adding user-defined element set
      sound/core/control.c
    powerpc/uaccess-flush: fix missing includes in kup-radix.h
  * libfs: fix error cast of negative value in simple_attr_write()
      fs/libfs.c
    xfs: revert "xfs: fix rmap key and record comparison functions"
    regulator: ti-abb: Fix array out of bound read access on the first transition
    MIPS: Alchemy: Fix memleak in alchemy_clk_setup_cpu
    can: m_can: m_can_handle_state_change(): fix state change
    can: peak_usb: fix potential integer overflow on shift of a int
    can: dev: can_restart(): post buffer from the right context
    perf lock: Don't free "lock_seq_stat" if read_count isn't zero
    ARM: dts: imx50-evk: Fix the chip select 1 IOMUX
    arm: dts: imx6qdl-udoo: fix rgmii phy-mode for ksz9031 phy
    MIPS: export has_transparent_hugepage() for modules
    Input: adxl34x - clean up a data type in adxl34x_probe()
  * vfs: remove lockdep bogosity in __sb_start_write
      fs/super.c
  * arm64: psci: Avoid printing in cpu_psci_cpu_die()
      arch/arm64/kernel/psci.c
    pinctrl: rockchip: enable gpio pclk for rockchip_gpio_to_irq
    mlxsw: core: Use variable timeout for EMAD retries
    net: ftgmac100: Fix crash when removing driver
    tcp: only postpone PROBE_RTT if RTT is < current min_rtt estimate
    net: usb: qmi_wwan: Set DTR quirk for MR400
    sctp: change to hold/put transport for proto_unreach_timer
    qlcnic: fix error return code in qlcnic_83xx_restart_hw()
    net: x25: Increase refcnt of "struct x25_neigh" in x25_rx_call_request
    net/mlx4_core: Fix init_hca fields offset
  * netlabel: fix an uninitialized warning in netlbl_unlabel_staticlist()
      net/netlabel/netlabel_unlabeled.c
  * netlabel: fix our progress tracking in netlbl_unlabel_staticlist()
      net/netlabel/netlabel_unlabeled.c
    net: Have netpoll bring-up DSA management interface
  * net: bridge: add missing counters to ndo_get_stats64 callback
      net/bridge/br_device.c
    net: b44: fix error return code in b44_init_one()
  * inet_diag: Fix error path to cancel the meseage in inet_req_diag_fill()
      net/ipv4/inet_diag.c
    devlink: Add missing genlmsg_cancel() in devlink_nl_sb_port_pool_fill()
    bnxt_en: read EEPROM A2h address using page 0
    atm: nicstar: Unmap DMA on send error
  * ah6: fix error return code in ah6_input()
      net/ipv6/ah6.c
    Merge 4.9.245 into android-4.9-q
Linux 4.9.245
    ACPI: GED: fix -Wformat
    KVM: x86: clflushopt should be treated as a no-op by emulation
    mac80211: always wind down STA state
    Input: sunkbd - avoid use-after-free in teardown paths
    powerpc/8xx: Always fault when _PAGE_ACCESSED is not set
    i2c: mux: pca954x: Add missing pca9546 definition to chip_desc
    i2c: imx: Fix external abort on interrupt in exit paths
    i2c: imx: use clk notifier for rate changes
    powerpc/64s: flush L1D after user accesses
    powerpc/uaccess: Evaluate macro arguments once, before user access is allowed
    powerpc: Fix __clear_user() with KUAP enabled
    powerpc: Implement user_access_begin and friends
    powerpc: Add a framework for user access tracking
    powerpc/64s: flush L1D on kernel entry
    powerpc/64s: move some exception handlers out of line
    powerpc/64s: Define MASKABLE_RELON_EXCEPTION_PSERIES_OOL
Linux 4.9.244
    Convert trailing spaces and periods in path components
  * ext4: fix leaking sysfs kobject after failed mount
      fs/ext4/super.c
  * reboot: fix overflow parsing reboot cpu number
      kernel/reboot.c
  * Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
      kernel/reboot.c
  * perf/core: Fix race in the perf_mmap_close() function
      kernel/events/core.c
    xen/events: block rogue events for some time
    xen/events: defer eoi in case of excessive number of events
    xen/events: use a common cpu hotplug hook for event channels
    xen/events: switch user event channels to lateeoi model
    xen/pciback: use lateeoi irq binding
    xen/scsiback: use lateeoi irq binding
    xen/netback: use lateeoi irq binding
    xen/blkback: use lateeoi irq binding
    xen/events: add a new "late EOI" evtchn framework
    xen/events: fix race in evtchn_fifo_unmask()
    xen/events: add a proper barrier to 2-level uevent unmasking
    xen/events: avoid removing an event channel while handling it
  * perf/core: Fix a memory leak in perf_event_parse_addr_filter()
      kernel/events/core.c
  * perf/core: Fix crash when using HW tracing kernel filters
      kernel/events/core.c
  * perf/core: Fix bad use of igrab()
      include/linux/perf_event.h
      kernel/events/core.c
    x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
  * random32: make prandom_u32() output unpredictable
      drivers/char/random.c
      include/linux/prandom.h
      kernel/time/timer.c
      lib/random32.c
    net: Update window_clamp if SOCK_RCVBUF is set
    net/x25: Fix null-ptr-deref in x25_connect
    net/af_iucv: fix null pointer dereference on shutdown
  * IPv6: Set SIT tunnel hard_header_len to zero
      net/ipv6/sit.c
  * swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"
      lib/swiotlb.c
    pinctrl: amd: fix incorrect way to disable debounce filter
    pinctrl: amd: use higher precision for 512 RtcClk
    drm/gma500: Fix out-of-bounds access to struct drm_device.vblank[]
  * don't dump the threads that had been already exiting when zapped.
      kernel/exit.c
    ocfs2: initialize ip_next_orphan
    mei: protect mei_cl_mtu from null dereference
    usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode
  * ext4: unlock xattr_sem properly in ext4_inline_data_truncate()
      fs/ext4/inline.c
  * ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA
      fs/ext4/super.c
  * perf: Fix get_recursion_context()
      kernel/events/internal.h
    cosa: Add missing kfree in error path of cosa_write
  * of/address: Fix of_node memory leak in of_dma_is_coherent
      drivers/of/address.c
    xfs: fix a missing unlock on error in xfs_fs_map_blocks
    xfs: fix rmap key and record comparison functions
    xfs: fix flags argument to rmap lookup when converting shared file rmaps
    pinctrl: aspeed: Fix GPI only function problem.
    iommu/amd: Increase interrupt remapping table limit to 512 entries
    scsi: scsi_dh_alua: Avoid crash during alua_bus_detach()
  * cfg80211: regulatory: Fix inconsistent format argument
      net/wireless/reg.c
    mac80211: fix use of skb payload instead of header
    drm/amdgpu: perform srbm soft reset always on SDMA resume
    scsi: hpsa: Fix memory leak in hpsa_init_one()
    gfs2: check for live vs. read-only file system in gfs2_fitrim
    gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free
    usb: gadget: goku_udc: fix potential crashes in probe
    ath9k_htc: Use appropriate rs_datalen type
    geneve: add transport ports in route lookup for geneve
    i40e: Memory leak in i40e_config_iwarp_qvlist
    i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c
    i40e: Wrong truncation from u16 to u8
    i40e: add num_vectors checker in iwarp handler
    i40e: Fix a potential NULL pointer dereference
  * pinctrl: devicetree: Avoid taking direct reference to device name string
      drivers/pinctrl/devicetree.c
    Btrfs: fix missing error return if writeback for extent buffer never started
    xfs: flush new eof page on truncate to avoid post-eof corruption
    can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
    can: peak_usb: add range checking in decode operations
    can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
    can: dev: __can_get_echo_skb(): fix real payload length return value for RTR frames
    can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context
    ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link()
    perf tools: Add missing swap for ino_generation
  * net: xfrm: fix a race condition during allocing spi
      net/xfrm/xfrm_state.c
  * genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
      kernel/irq/Kconfig
    btrfs: reschedule when cloning lots of extents
  * time: Prevent undefined behaviour in timespec64_to_ns()
      include/linux/time64.h
    mm: mempolicy: fix potential pte_unmap_unlock pte error
    gfs2: Wake up when sd_glock_disposal becomes zero
  * ring-buffer: Fix recursion protection transitions between interrupt context
      kernel/trace/ring_buffer.c
  * regulator: defer probe when trying to get voltage from unresolved supply
      drivers/regulator/core.c
    UPSTREAM: thermal/drivers/hisi: Remove bogus const from function return type
  * UPSTREAM: net/ipv6: don't reinitialize ndev->cnf.addr_gen_mode on new inet6_dev
      net/ipv6/addrconf.c
    UPSTREAM: tee: shm: fix use-after-free via temporarily dropped reference
    UPSTREAM: Documentation: ip-sysctl.txt: document addr_gen_mode
    UPSTREAM: net: crypto set sk to NULL when af_alg_release.
  * UPSTREAM: ipv6: don't auto-add link-local address to lag ports
      net/ipv6/addrconf.c
  * UPSTREAM: ipv6: ndisc: RFC-ietf-6man-ra-pref64-09 is now published as RFC8781
      include/net/ndisc.h
  * UPSTREAM: binder: fix incorrect cmd to binder_stat_br
      drivers/android/binder.c
  * UPSTREAM: arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb
      arch/arm64/include/asm/mmu_context.h
    UPSTREAM: staging: android: vsoc: fix copy_from_user overrun
    Merge 4.9.243 into android-4.9-q
Linux 4.9.243
    powercap: restrict energy meter to root access
    Merge 4.9.242 into android-4.9-q
Linux 4.9.242
    Revert "ARC: entry: fix potential EFA clobber when TIF_SYSCALL_TRACE"
    ARC: stack unwinding: avoid indefinite looping
  * USB: Add NO_LPM quirk for Kingston flash drive
      drivers/usb/core/quirks.c
    USB: serial: option: add Telit FN980 composition 0x1055
    USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231
    USB: serial: cyberjack: fix write-URB completion race
    serial: txx9: add missing platform_driver_unregister() on error in serial_txx9_init
    serial: 8250_mtk: Fix uart_get_baud_rate warning
  * fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent
      kernel/fork.c
  * vt: Disable KD_FONT_OP_COPY
      drivers/tty/vt/vt.c
    ACPI: NFIT: Fix comparison to '-ENXIO'
    vsock: use ns_capable_noaudit() on socket create
  * scsi: core: Don't start concurrent async scan on same host
      drivers/scsi/scsi_scan.c
  * of: Fix reserved-memory overlap detection
      drivers/of/of_reserved_mem.c
    x86/kexec: Use up-to-dated screen_info copy to fill boot params
    ARM: dts: sun4i-a10: fix cpu_alert temperature
  * tracing: Fix out of bounds write in get_trace_buf
      kernel/trace/trace.c
  * ftrace: Handle tracing when switching between context
      kernel/trace/trace.h
  * ftrace: Fix recursion check for NMI test
      kernel/trace/trace.h
  * kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
      kernel/kthread.c
  * ALSA: usb-audio: Add implicit feedback quirk for Qu-16
      sound/usb/pcm.c
    Fonts: Replace discarded const qualifier
    gianfar: Account for Tx PTP timestamp in the skb headroom
    gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP
    tipc: fix use-after-free in tipc_bcast_get_mode
    xen/events: don't use chip_data for legacy IRQs
    staging: octeon: Drop on uncorrectable alignment or FCS error
    staging: octeon: repair "fixed-link" support
    staging: comedi: cb_pcidas: Allow 2-channel commands for AO subdevice
  * KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR
      arch/arm64/include/asm/kvm_host.h
  * device property: Don't clear secondary pointer for shared primary firmware node
      drivers/base/core.c
  * device property: Keep secondary firmware node secondary by type
      drivers/base/core.c
    ARM: s3c24xx: fix missing system reset
    ARM: samsung: fix PM debug build with DEBUG_LL but !MMU
    hil/parisc: Disable HIL driver when it gets stuck
    cachefiles: Handle readpage error correctly
  * arm64: berlin: Select DW_APB_TIMER_OF
      arch/arm64/Kconfig.platforms
  * tty: make FONTX ioctl use the tty pointer they were actually passed
      drivers/tty/vt/vt_ioctl.c
    rtc: rx8010: don't modify the global rtc ops
    vringh: fix __vringh_iov() when riov and wiov are different
  * ring-buffer: Return 0 on success from ring_buffer_resize()
      kernel/trace/ring_buffer.c
    9P: Cast to loff_t before multiplying
    libceph: clear con->out_msg on Policy::stateful_server faults
    ceph: promote to unsigned long long before shifting
    ia64: fix build error with !COREDUMP
    ubi: check kthread_should_stop() after the setting of task state
    ubifs: dent: Fix some potential memory leaks while iterating entries
    powerpc/powernv/elog: Fix race while processing OPAL error log event.
    powerpc: Warn about use of smt_snooze_delay
    iio:gyro:itg3200: Fix timestamp alignment and prevent data leak.
    iio:adc:ti-adc12138 Fix alignment issue with timestamp
    iio:light:si1145: Fix timestamp alignment and prevent data leak.
    dmaengine: dma-jz4780: Fix race in jz4780_dma_tx_status
  * vt: keyboard, extend func_buf_lock to readers
      drivers/tty/vt/keyboard.c
  * vt: keyboard, simplify vt_kdgkbsent
      drivers/tty/vt/keyboard.c
    usb: host: fsl-mph-dr-of: check return of dma_set_mask()
  * usb: dwc3: core: don't trigger runtime pm when remove driver
      drivers/usb/dwc3/core.c
  * usb: dwc3: core: add phy cleanup for probe error handling
      drivers/usb/dwc3/core.c
    btrfs: fix use-after-free on readahead extent after failure to create it
    btrfs: cleanup cow block on error
    btrfs: reschedule if necessary when logging directory items
    scsi: mptfusion: Fix null pointer dereferences in mptscsih_remove()
    w1: mxc_w1: Fix timeout resolution problem leading to bus error
    acpi-cpufreq: Honor _PSD table setting on new AMD CPUs
    ACPI: debug: don't allow debugging when ACPI is disabled
    ACPI: video: use ACPI backlight for HP 635 Notebook
    ACPI / extlog: Check for RDMSR failure
    NFS: fix nfs_path in case of a rename retry
  * fs: Don't invalidate page buffers in block_write_full_page()
      fs/buffer.c
    leds: bcm6328, bcm6358: use devres LED registering function
    perf/x86/amd/ibs: Fix raw sample data accumulation
    perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()
    md/raid5: fix oops during stripe resizing
    ARM: dts: s5pv210: remove dedicated 'audio-subsystem' node
    ARM: dts: s5pv210: move PMU node out of clock controller
    ARM: dts: s5pv210: remove DMA controller bus node name to fix dtschema warnings
    memory: emif: Remove bogus debugfs error handling
    gfs2: add validation checks for size of superblock
  * ext4: Detect already used quota file early
      fs/ext4/super.c
    drivers: watchdog: rdc321x_wdt: Fix race condition bugs
    net: 9p: initialize sun_server.sun_path to have addr's value only when addr is valid
    clk: ti: clockdomain: fix static checker warning
    md/bitmap: md_bitmap_get_counter returns wrong blocks
    power: supply: test_power: add missing newlines when printing parameters by sysfs
    bus/fsl_mc: Do not rely on caller to provide non NULL mc_io
    drivers/net/wan/hdlc_fr: Correctly handle special skb->protocol values
  * arm64/mm: return cpu_all_mask when node is NUMA_NO_NODE
      arch/arm64/include/asm/numa.h
    USB: adutux: fix debugging
    cpufreq: sti-cpufreq: add stih418 support
  * kgdb: Make "kgdbcon" work properly with "kgdb_earlycon"
      kernel/debug/debug_core.c
  * printk: reduce LOG_BUF_SHIFT range for H8300
      init/Kconfig
    mmc: via-sdmmc: Fix data race bug
    media: tw5864: check status of tw5864_frameinterval_get
    ath10k: fix VHT NSS calculation when STBC is enabled
    video: fbdev: pvr2fb: initialize variables
    xfs: fix realtime bitmap/summary file truncation when growing rt volume
    ARM: 8997/2: hw_breakpoint: Handle inexact watchpoint addresses
    um: change sigio_spinlock to a mutex
  * f2fs: fix to check segment boundary during SIT page readahead
      fs/f2fs/checkpoint.c
  * f2fs: add trace exit in exception path
      fs/f2fs/checkpoint.c
    sparc64: remove mm_cpumask clearing to fix kthread_use_mm race
    powerpc/powernv/smp: Fix spurious DBG() warning
    mlxsw: core: Fix use-after-free in mlxsw_emad_trans_finish()
  * fscrypt: use EEXIST when file already uses different policy
      fs/crypto/policy.c
  * fscrypto: move ioctl processing more fully into common code
      fs/crypto/policy.c
      fs/ext4/ext4.h
      fs/ext4/ioctl.c
      fs/f2fs/f2fs.h
      fs/f2fs/file.c
  * fscrypt: return -EXDEV for incompatible rename or link into encrypted dir
      fs/crypto/policy.c
      fs/ext4/namei.c
      fs/f2fs/namei.c
    ata: sata_rcar: Fix DMA boundary mask
    mtd: lpddr: Fix bad logic in print_drs_error
    p54: avoid accessing the data mapped to streaming DMA
  * fuse: fix page dereference after free
      fs/fuse/dev.c
    arch/x86/amd/ibs: Fix re-arming IBS Fetch
    tipc: fix memory leak caused by tipc_buf_append()
    ravb: Fix bit fields checking in ravb_hwtstamp_get()
    efivarfs: Replace invalid slashes with exclamation marks in dentries.
    powerpc/powernv/opal-dump : Use IRQ_HANDLED instead of numbers in interrupt handler
  * scripts/setlocalversion: make git describe output more reliable
      scripts/setlocalversion
    SUNRPC: ECONNREFUSED should cause a rebind.
  * ANDROID: Temporarily disable XFRM_USER_COMPAT filtering
      net/xfrm/xfrm_state.c
      net/xfrm/xfrm_user.c
  * BACKPORT: xfrm/compat: Translate 32-bit user_policy from sockptr
      include/net/xfrm.h
      net/xfrm/xfrm_state.c
  * BACKPORT: xfrm/compat: Add 32=>64-bit messages translator
      include/net/xfrm.h
      net/xfrm/Kconfig
      net/xfrm/xfrm_user.c
  * UPSTREAM: xfrm/compat: Attach xfrm dumps to 64=>32 bit translator
      net/xfrm/xfrm_user.c
  * BACKPORT: xfrm/compat: Add 64=>32-bit messages translator
      include/net/xfrm.h
      net/xfrm/xfrm_user.c
  * BACKPORT: xfrm: Provide API to register translator module
      include/net/xfrm.h
      net/xfrm/Kconfig
      net/xfrm/Makefile
      net/xfrm/xfrm_state.c
  * UPSTREAM: mm/sl[uo]b: export __kmalloc_track(_node)_caller
      mm/slub.c
    ANDROID: Publish uncompressed Image on aarch64
  * ANDROID: Makefile: append BUILD_NUMBER to version string when defined
      Makefile

Change-Id: I345c9bde484cf008679253982f61b2a833527c3e
Signed-off-by: Lucas Wei <lucaswei@google.com>
2021-01-25 15:50:07 +08:00
Zqiang
a1ffa0673d kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled
commit 6993d0fdbee0eb38bfac350aa016f65ad11ed3b1 upstream.

There is a small race window when a delayed work is being canceled and
the work still might be queued from the timer_fn:

	CPU0						CPU1
kthread_cancel_delayed_work_sync()
   __kthread_cancel_work_sync()
     __kthread_cancel_work()
        work->canceling++;
					      kthread_delayed_work_timer_fn()
						   kthread_insert_work();

BUG: kthread_insert_work() should not get called when work->canceling is
set.
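
The race above can be illustrated with a minimal user-space sketch. It is
not the kernel code: the sketch_* names are hypothetical, the worker lock
is omitted, and the two CPUs are modelled as sequential calls. The guard in
sketch_timer_fn() is an assumption about the shape of the fix (skip the
insert while a cancel is in progress); the actual diff is not part of this
log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct kthread_work. */
struct sketch_work {
	int canceling;	/* non-zero while a cancel is in progress */
	bool queued;	/* true once the work is on the worker's list */
};

/*
 * Models kthread_insert_work(). The assert encodes the invariant from the
 * commit message: it must not be called while work->canceling is set.
 */
static void sketch_insert_work(struct sketch_work *work)
{
	assert(!work->canceling);
	work->queued = true;
}

/* Models the CPU0 side: __kthread_cancel_work() marks the work. */
static void sketch_cancel_begin(struct sketch_work *work)
{
	work->canceling++;
}

/*
 * Models the CPU1 side: the delayed-work timer callback. The check is the
 * assumed fix; without it the insert would run during a cancel.
 */
static void sketch_timer_fn(struct sketch_work *work)
{
	if (!work->canceling)
		sketch_insert_work(work);
}
```

Calling sketch_cancel_begin() and then sketch_timer_fn() on the same work
leaves it unqueued, while sketch_timer_fn() alone queues it. In the kernel
both paths are expected to run under the worker lock, which the sketch
leaves out.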

Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201014083030.16895-1-qiang.zhang@windriver.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-10 10:24:02 +01:00
Petri Gynther
a7d848ab42 Merge 4.9.117 into android-msm-bluecross-4.9-lts
Linux 4.9.117
    net: dsa: qca8k: Allow overwriting CPU port setting
    net: dsa: qca8k: Add QCA8334 binding documentation
    net: dsa: qca8k: Enable RXMAC when bringing up a port
    net: dsa: qca8k: Force CPU port to its highest bandwidth
    RDMA/uverbs: Protect from attempts to create flows on unsupported QP
  * ext4: check for allocation block validity with block group locked
      fs/ext4/balloc.c
      fs/ext4/ialloc.c
  * ext4: fix inline data updates with checksums enabled
      fs/ext4/inline.c
      fs/ext4/inode.c
  * squashfs: be more careful about metadata corruption
      fs/squashfs/squashfs_fs.h
  * random: mix rdrand with entropy sent in from userspace
      drivers/char/random.c
  * drm: Add DP PSR2 sink enable bit
      include/drm/drm_dp_helper.h
    media: si470x: fix __be16 annotations
    scsi: megaraid_sas: Increase timeout by 1 sec for non-RAID fastpath IOs
    scsi: scsi_dh: replace too broad "TP9" string with the exact models
    media: omap3isp: fix unbalanced dma_iommu_mapping
  * crypto: authenc - don't leak pointers to authenc keys
      crypto/authenc.c
  * crypto: authencesn - don't leak pointers to authenc keys
      crypto/authencesn.c
  * usb: hub: Don't wait for connect state at resume for powered-off ports
      drivers/usb/core/hub.c
    microblaze: Fix simpleImage format generation
  * serial: core: Make sure compiler barfs for 16-byte earlycon names
      include/linux/serial_core.h
    staging: lustre: ldlm: free resource when ldlm_lock_create() fails.
    staging: lustre: llite: correct removexattr detection
  * audit: allow not equal op for audit by executable
      kernel/auditfilter.c
    rsi: Fix 'invalid vdd' warning in mmc
  * ipconfig: Correctly initialise ic_nameservers
      net/ipv4/ipconfig.c
    drm/gma500: fix psb_intel_lvds_mode_valid()'s return type
    arm64: defconfig: Enable Rockchip io-domain driver
    memory: tegra: Apply interrupts mask per SoC
    memory: tegra: Do not handle spurious interrupts
  * stop_machine: Use raw spinlocks
      kernel/stop_machine.c
    dt-bindings: net: meson-dwmac: new compatible name for AXG SoC
    dt-bindings: pinctrl: meson: add support for the Meson8m2 SoC
    mmc: pwrseq: Use kmalloc_array instead of stack VLA
    mmc: dw_mmc: update actual clock for mmc debugfs
    ALSA: hda/ca0132: fix build failure when a local macro is defined
  * drm/atomic: Handling the case when setting old crtc for plane
      drivers/gpu/drm/drm_atomic.c
    media: siano: get rid of __le32/__le16 cast warnings
  * bpf: fix references to free_bpf_prog_info() in comments
      kernel/bpf/verifier.c
    thermal: exynos: fix setting rising_threshold for Exynos5433
    staging: lustre: o2iblnd: fix race at kiblnd_connect_peer
    scsi: megaraid: silence a static checker bug
    scsi: 3w-xxxx: fix a missing-check bug
    scsi: 3w-9xxx: fix a missing-check bug
    bnxt_en: Check unsupported speeds in bnxt_update_link() on PF only.
    perf: fix invalid bit in diagnostic entry
    s390/cpum_sf: Add data entry sizes to sampling trailer entry
    brcmfmac: Add support for bcm43364 wireless chipset
    mtd: rawnand: fsl_ifc: fix FSL NAND driver to read all ONFI parameter pages
    media: saa7164: Fix driver name in debug output
  * media: media-device: fix ioctl function types
      drivers/media/media-device.c
    libata: Fix command retry decision
    media: rcar_jpu: Add missing clk_disable_unprepare() on error in jpu_open()
  * dma-iommu: Fix compilation when !CONFIG_IOMMU_DMA
      include/linux/dma-iommu.h
  * tty: Fix data race in tty_insert_flip_string_fixed_flag
      drivers/tty/pty.c
    nvmem: properly handle returned value nvmem_reg_read
    ARM: dts: sh73a0: Add missing interrupt-affinity to PMU node
    ARM: dts: emev2: Add missing interrupt-affinity to PMU node
    EDAC, altera: Fix ARM64 build warning
    HID: i2c-hid: check if device is there before really probing
    powerpc/embedded6xx/hlwd-pic: Prevent interrupts from being handled by Starlet
    drm/radeon: fix mode_valid's return type
  * HID: hid-plantronics: Re-resend Update to map button for PTT products
      drivers/hid/hid-plantronics.c
  * arm64: cmpwait: Clear event register before arming exclusive monitor
      arch/arm64/include/asm/cmpxchg.h
  * ALSA: usb-audio: Apply rate limit to warning messages in URB complete callback
      sound/usb/pcm.c
    net: ethernet: ti: cpsw-phy-sel: check bus_find_device() ret value
    media: smiapp: fix timeout checking in smiapp_read_nvm
    ixgbevf: fix MAC address changes through ixgbevf_set_mac()
    md: fix NULL dereference of mddev->pers in remove_and_add_spares()
    regulator: pfuze100: add .is_enable() for pfuze100_swb_regulator_ops
    ALSA: emu10k1: Rate-limit error messages about page errors
  * scsi: ufs: fix exception event handling
      drivers/scsi/ufs/ufshcd.c
  * fscrypt: use unbound workqueue for decryption
      fs/crypto/crypto.c
    drivers/perf: arm-ccn: don't log to dmesg in event_init
    ima: based on policy verify firmware signatures (pre-allocated buffer)
    mwifiex: correct histogram data with appropriate index
    net: dsa: qca8k: Add support for QCA8334 switch
    PCI: pciehp: Request control of native hotplug only if supported
    bpf: powerpc64: pad function address loads with NOPs
    pinctrl: at91-pio4: add missing of_node_put
    powerpc/8xx: fix invalid register expression in head_8xx.S
    powerpc/powermac: Mark variable x as unused
    powerpc/powermac: Add missing prototype for note_bootable_part()
    powerpc/chrp/time: Make some functions static, add missing header include
    powerpc/32: Add a missing include header
    ath: Add regulatory mapping for Bahamas
    ath: Add regulatory mapping for Bermuda
    ath: Add regulatory mapping for Serbia
    ath: Add regulatory mapping for Tanzania
    ath: Add regulatory mapping for Uganda
    ath: Add regulatory mapping for APL2_FCCA
    ath: Add regulatory mapping for APL13_WORLD
    ath: Add regulatory mapping for ETSI8_WORLD
    ath: Add regulatory mapping for FCC3_ETSIC
  * PCI: Prevent sysfs disable of device while driver is attached
      drivers/pci/pci-sysfs.c
    btrfs: qgroup: Finish rescan when hit the last leaf of extent tree
    btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups
  * media: videobuf2-core: don't call memop 'finish' when queueing
      drivers/media/v4l2-core/videobuf2-core.c
    media: tw686x: Fix incorrect vb2_mem_ops GFP flags
    wlcore: sdio: check for valid platform device data before suspend
    mwifiex: handle race during mwifiex_usb_disconnect
    mfd: cros_ec: Fail early if we cannot identify the EC
  * ASoC: dpcm: fix BE dai not hw_free and shutdown
      sound/soc/soc-pcm.c
    Bluetooth: btusb: Add a new Realtek 8723DE ID 2ff8:b011
    Bluetooth: hci_qca: Fix "Sleep inside atomic section" warning
    iwlwifi: pcie: fix race in Rx buffer allocator
    selftests/intel_pstate: Improve test, minor fixes
    perf/x86/intel/uncore: Correct fixed counter index check for NHM
    perf/x86/intel/uncore: Correct fixed counter index check in generic code
    usbip: usbip_detach: Fix memory, udev context and udev leak
  * f2fs: fix race in between GC and atomic open
      fs/f2fs/file.c
  * f2fs: Fix deadlock in shutdown ioctl
      fs/f2fs/file.c
  * f2fs: fix to wait page writeback during revoking atomic write
      fs/f2fs/segment.c
  * f2fs: fix to don't trigger writeback during recovery
      fs/f2fs/segment.c
  * f2fs: fix error path of move_data_page
      fs/f2fs/gc.c
  * disable loading f2fs module on PAGE_SIZE > 4KB
      fs/f2fs/super.c
    pnfs: Don't release the sequence slot until we've processed layoutget on open
    netfilter: nf_tables: check msg_type before nft_trans_set(trans)
    RDMA/mad: Convert BUG_ONs to error flows
    powerpc/64s: Fix compiler store ordering to SLB shadow area
    hvc_opal: don't set tb_ticks_per_usec in udbg_init_opal_common()
    powerpc/eeh: Fix use-after-release of EEH driver
    infiniband: fix a possible use-after-free bug
    netfilter: ipset: List timing out entries with "timeout 1" instead of zero
    perf tools: Fix pmu events parsing rule
  * rtc: ensure rtc_set_alarm fails when alarms are not supported
      drivers/rtc/interface.c
  * mm/slub.c: add __printf verification to slab_err()
      mm/slub.c
  * mm: vmalloc: avoid racy handling of debugobjects in vunmap
      mm/vmalloc.c
    vfio: platform: Fix reset module leak in error path
    nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo
    ALSA: fm801: add error handling for snd_ctl_add
    ALSA: emu10k1: add error handling for snd_ctl_add
    xen/netfront: raise max number of slots in xennet_get_responses()
    kcov: ensure irq code sees a valid area
    usb: dwc2: Fix DMA alignment to start at allocated boundary
  * arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups
      arch/arm64/mm/init.c
  * tracing: Quiet gcc warning about maybe unused link variable
      kernel/trace/trace_kprobe.c
  * tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
      kernel/trace/trace_kprobe.c
  * kthread, tracing: Don't expose half-written comm when creating kthreads
      kernel/kthread.c
  * tracing: Fix possible double free in event_enable_trigger_func()
      kernel/trace/trace_events_trigger.c
  * tracing: Fix double free of event_trigger_data
      kernel/trace/trace_events_trigger.c
    kvm, mm: account shadow page tables to kmemcg
    Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST
    Input: i8042 - add Lenovo LaVie Z to the i8042 reset list
    Input: elan_i2c - add ACPI ID for lenovo ideapad 330

Change-Id: Ibdefd19225c51396172426223364ca861da5f5a0
Signed-off-by: Petri Gynther <pgynther@google.com>
2018-09-20 18:54:56 -07:00
Snild Dolkow
b38f8292f0 kthread, tracing: Don't expose half-written comm when creating kthreads
commit 3e536e222f2930534c252c1cc7ae799c725c5ff9 upstream.

There is a window for racing when printing directly to task->comm,
allowing other threads to see a non-terminated string. The vsnprintf
function fills the buffer, counts the truncated chars, then finally
writes the \0 at the end.

	creator                     other
	vsnprintf:
	  fill (not terminated)
	  count the rest            trace_sched_waking(p):
	  ...                         memcpy(comm, p->comm, TASK_COMM_LEN)
	  write \0

The consequences depend on how 'other' uses the string. In our case,
it was copied into the tracing system's saved cmdlines, a buffer of
adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):

	crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
	0xffffffd5b3818640:     "irq/497-pwr_evenkworker/u16:12"

...and a strcpy out of there would cause stack corruption:

	[224761.522292] Kernel panic - not syncing: stack-protector:
	    Kernel stack is corrupted in: ffffff9bf9783c78

	crash-arm64> kbt | grep 'comm\|trace_print_context'
	#6  0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
	      comm (char [16]) =  "irq/497-pwr_even"

	crash-arm64> rd 0xffffffd4d0e17d14 8
	ffffffd4d0e17d14:  2f71726900000000 5f7277702d373934   ....irq/497-pwr_
	ffffffd4d0e17d24:  726f776b6e657665 3a3631752f72656b   evenkworker/u16:
	ffffffd4d0e17d34:  f9780248ff003231 cede60e0ffffff9b   12..H.x......`..
	ffffffd4d0e17d44:  cede60c8ffffffd4 00000fffffffffd4   .....`..........

The workaround in e09e28671 (use strlcpy in __trace_find_cmdline) was
likely needed because of this same bug.

Solved by calling vsnprintf() on a local buffer, then using set_task_comm().
This way, there won't be a window where comm is not terminated.
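
The fix described above can be modelled in user space. This is only a
sketch, not the kernel code: name_task() and set_comm() are hypothetical
stand-ins for the kthread naming path and set_task_comm(). The name is
formatted into a local buffer first, and the shared buffer is then
updated with a bounded copy that never writes a non-NUL byte into its
last slot, so a reader never sees an unterminated string.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16

/*
 * Hypothetical stand-in for set_task_comm(): copy at most
 * TASK_COMM_LEN - 1 bytes and terminate. The last byte of comm is
 * only ever written with '\0', so a zero-initialised comm stays
 * terminated at every point during the update.
 */
static void set_comm(char *comm, const char *name)
{
	size_t n = strlen(name);

	if (n > TASK_COMM_LEN - 1)
		n = TASK_COMM_LEN - 1;
	memcpy(comm, name, n);
	comm[n] = '\0';
}

/*
 * Hypothetical stand-in for the naming step: format into a private
 * buffer that no other thread can observe, then publish the result.
 */
static void name_task(char *comm, const char *fmt, ...)
{
	char name[TASK_COMM_LEN];
	va_list args;

	va_start(args, fmt);
	vsnprintf(name, sizeof(name), fmt, args);
	va_end(args);

	set_comm(comm, name);
}
```

Calling name_task(comm, "irq/%d-pwr_event", 497) on a zeroed 16-byte
buffer leaves the truncated but terminated string "irq/497-pwr_eve",
rather than the 16 unterminated bytes seen in the crash dump.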

Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com

Cc: stable@vger.kernel.org
Fixes: bc0c38d139 ("ftrace: latency tracer infrastructure")
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Snild Dolkow <snild@sony.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03 07:55:12 +02:00
Channagoud Kadabi
ef546abcaf Revert "kthread: Ensure task isn't preempted before dequeue in kthread_parkme"
The right fix to address the issue has been pulled from upstream commit
"<9cd4f1a4e7a8> smp/hotplug: Move unparking of percpu threads to the
control CPU"

Change-Id: I3721ae1b7c8718c35fca1b7c7ef53ce80044d0cf
Signed-off-by: Channagoud Kadabi <ckadabi@codeaurora.org>
2017-07-12 16:20:39 -07:00
Vikram Mulukutla
97955bf5ad kthread: Ensure task isn't preempted before dequeue in kthread_parkme
kthread_park waits for the target thread to park itself with
kthread_parkme using a completion variable. kthread_parkme -
which is invoked by the target thread - sets the completion
variable before calling schedule to get itself off of the
runqueue.

This causes an interesting race in the hotplug path. takedown_cpu
invoked for CPU X attempts to park the cpuhp/X thread before
running the stopper thread on CPU X. There is a guarantee that
the task state of cpuhp/X is set to TASK_PARKED, but there is no
guarantee that it's actually off of the runqueue when kthread_park
returns. takedown_cpu proceeds to run the stopper thread on CPU_X,
which promptly migrates the still-on-rq cpuhp/X thread to another
CPU, CPU_Y.

All of this is actually OK - cpuhp/X may finally get itself off
of CPU_Y's runqueue at some later point. However, let's assume
CPU_Y has a rather long running RT task, and cpuhp/X doesn't
actually get to run. Now for whatever reason CPU_X is brought online
again, and an attempt is made to unpark cpuhp/X in cpuhp_online_idle
with preemption disabled. kthread_unpark calls kthread_bind_mask,
which finds that the task is still active, leading to a schedule()
call in wait_task_inactive, causing a "scheduling while atomic"
BUG.

Now we could force the hotplug thread to actually wait for smpboot
threads to get off of the runqueue - but this sort of defeats the
lightweight nature of parking for everyone else. Let's simply
ensure that setting the completion variable and calling schedule()
happen atomically. This completely fixes the hotplug versus
kthread_parkme race.

Change-Id: Ia624b07119462911a9d4d367100408f4426cb6f6
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-06-25 01:13:01 -07:00
Kees Cook
a967be8dec time: Remove CONFIG_TIMER_STATS
Currently CONFIG_TIMER_STATS exposes process information across namespaces:

kernel/time/timer_list.c print_timer():

        SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);

/proc/timer_list:

 #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570

Given that the tracer can give the same information, this patch entirely
removes CONFIG_TIMER_STATS.

Change-Id: I66e06ae2d6e32c309824310d3d9bf54d1047eab1
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Xing Gao <xgao01@email.wm.edu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jessica Frazelle <me@jessfraz.com>
Cc: kernel-hardening@lists.openwall.com
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Git-commit: dfb4357da6ddbdf57d583ba64361c9d792b0e0b1
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[ohaugan@codeaurora.org: Fixed merge conflicts]
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2017-05-11 13:26:42 -07:00
Tejun Heo
f44236a1b0 cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups
commit 77f88796cee819b9c4562b0b6b44691b3b7755b1 upstream.

Creation of a kthread goes through a couple interlocked stages between
the kthread itself and its creator.  Once the new kthread starts
running, it initializes itself and wakes up the creator.  The creator
then can further configure the kthread and then let it start doing its
job by waking it up.

In this configuration-by-creator stage, the creator is the only one
that can wake it up but the kthread is visible to userland.  When
altering the kthread's attributes from userland is allowed, this is
fine; however, for cases where CPU affinity is critical,
kthread_bind() is used to first disable affinity changes from userland
and then set the affinity.  This also prevents the kthread from being
migrated into non-root cgroups as that can affect the CPU affinity and
many other things.

Unfortunately, the cgroup side of protection is racy.  While the
PF_NO_SETAFFINITY flag prevents further migrations, userland can win
the race before the creator sets the flag with kthread_bind() and put
the kthread in a non-root cgroup, which can lead to all sorts of
problems including incorrect CPU affinity and starvation.

This bug got triggered by userland which periodically tries to migrate
all processes in the root cpuset cgroup to a non-root one.  Per-cpu
workqueue workers got caught while being created and ended up with
incorrect CPU affinity, breaking concurrency management and sometimes
stalling workqueue execution.

This patch adds task->no_cgroup_migration which prevents the task from
being migrated by userland.  kthreadd starts with the flag set making
every child kthread start in the root cgroup with migration
disallowed.  The flag is cleared after the kthread finishes
initialization by which time PF_NO_SETAFFINITY is set if the kthread
should stay in the root cgroup.
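
The flag's lifetime can be illustrated with a small user-space model.
It is a sketch under assumptions, not the patch itself: struct task,
kthread_spawn(), kthread_bind_model(), kthread_init_done() and
userland_migrate() are hypothetical names, and returning -EINVAL for a
refused migration is an assumption of this model.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the task state relevant to this patch. */
struct task {
	bool no_cgroup_migration; /* set while the kthread is being set up */
	bool no_setaffinity;      /* models PF_NO_SETAFFINITY */
	int cgroup;               /* 0 is the root cgroup in this model */
};

/* A new kthread inherits the flag from kthreadd: migration is refused. */
static struct task kthread_spawn(void)
{
	struct task t = { .no_cgroup_migration = true };

	return t;
}

/* The creator pins the affinity during the configuration stage. */
static void kthread_bind_model(struct task *t)
{
	t->no_setaffinity = true;
}

/* Initialization is finished: the temporary protection is dropped. */
static void kthread_init_done(struct task *t)
{
	t->no_cgroup_migration = false;
}

/* A userland attempt to move the task into another cgroup. */
static int userland_migrate(struct task *t, int cgroup)
{
	if (t->no_cgroup_migration || t->no_setaffinity)
		return -EINVAL; /* error code is an assumption of this model */
	t->cgroup = cgroup;
	return 0;
}
```

In this model a migration attempt made between kthread_spawn() and
kthread_init_done() always fails, so there is no window in which a
kthread that is about to be bound can be moved out of the root cgroup.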

It'd be better to wait for the initialization instead of failing but I
couldn't think of a way of implementing that without adding either a
new PF flag, or sleeping and retrying from waiting side.  Even if
userland depends on changing cgroup membership of a kthread, it either
has to be synchronized with kthread_create() or periodically repeat,
so it's unlikely that this would break anything.

v2: Switch to a simpler implementation using a new task_struct bit
    field suggested by Oleg.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-and-debugged-by: Chris Mason <clm@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-21 09:31:18 +02:00
Petr Mladek
dbf52682cb kthread: better support freezable kthread workers
This patch allows a kthread worker to be made freezable via a new @flags
parameter. This makes it possible to avoid an init work in some kthreads.

It currently does not affect the function of kthread_worker_fn()
but it might help to do some optimization or fixes eventually.

I currently do not know about any other use for the @flags
parameter but I believe that we will want more flags
in the future.

Finally, I hope that it will not cause confusion with @flags member
in struct kthread. Well, I guess that we will want to rework the
basic kthreads implementation once all kthreads are converted into
kthread workers or workqueues. It is possible that we will merge
the two structures.

Link: http://lkml.kernel.org/r/1470754545-17632-12-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek
9a6b06c8d9 kthread: allow to modify delayed kthread work
There are situations when we need to modify the delay of a delayed kthread
work. For example, when the work depends on an event and the initial delay
means a timeout. Then we want to queue the work immediately when the event
happens.

This patch implements kthread_mod_delayed_work(), inspired by workqueues.
It cancels the timer, removes the work from any worker list and queues it
again with the given timeout.

A very special case is when the work is being canceled at the same time.
It might happen because of the regular kthread_cancel_delayed_work_sync()
or by another kthread_mod_delayed_work(). In this case, we do nothing and
let the other operation win. This should not normally happen as the caller
is supposed to synchronize these operations in a reasonable way.
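
The behaviour described above can be summarised in a single-threaded
user-space model. This is a sketch only: struct dwork and mod_delayed()
are hypothetical, the worker lock and the real timer are left out, and
the return value convention (true when an already pending work was
re-armed) is an assumption of the model.

```c
#include <stdbool.h>

/* Hypothetical model of a delayed work item. */
struct dwork {
	bool pending;          /* timer armed or work sitting on a list */
	unsigned long expires; /* absolute time at which the work is due */
	int canceling;         /* cancel operations currently in flight */
};

/*
 * Re-arm the work so that it becomes due at now + delay.
 * Returns true if a pending work was modified and false if the work
 * was idle or if a concurrent cancel was allowed to win.
 */
static bool mod_delayed(struct dwork *dw, unsigned long now,
			unsigned long delay)
{
	bool was_pending;

	if (dw->canceling)
		return false;      /* do nothing, let the cancel win */

	was_pending = dw->pending; /* "cancel timer, remove from list" */
	dw->pending = true;        /* "queue it again" */
	dw->expires = now + delay;
	return was_pending;
}
```

With a delay of zero the work becomes due immediately, which matches
the case where the awaited event arrives before the timeout.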

Link: http://lkml.kernel.org/r/1470754545-17632-11-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek
37be45d49d kthread: allow to cancel kthread work
We are going to use kthread workers more widely and sometimes we will need
to make sure that the work is neither pending nor running.

This patch implements cancel_*_sync() operations as inspired by
workqueues.  Well, we are synchronized against the other operations via
the worker lock, we use del_timer_sync() and a counter to count parallel
cancel operations.  Therefore the implementation might be easier.

First, we check if a worker is assigned.  If not, the work has never been
queued after it was initialized.

Second, we take the worker lock.  It must be the right one.  The work must
not be assigned to another worker unless it is initialized in between.

Third, we try to cancel the timer when it exists.  The timer is deleted
synchronously to make sure that the timer callback is not running.  We
need to temporarily release the worker->lock to avoid a possible deadlock
with the callback.  In the meantime, we set work->canceling counter to
avoid any queuing.

Fourth, we try to remove the work from a worker list. It might be
the list of either normal or delayed works.

Fifth, if the work is running, we call kthread_flush_work().  It might
take an arbitrary time.  We need to release the worker->lock again.  In the
meantime, we again block any queuing by the canceling counter.

As already mentioned, the check for a pending kthread work is done under a
lock.  In comparison with workqueues, we do not need to fight for a single
PENDING bit to block other operations.  Therefore we do not suffer from
the thundering herd problem and all parallel canceling jobs might use
kthread_flush_work().  Any queuing is blocked until the counter drops to zero.
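
The role of the canceling counter can be shown with a single-threaded
user-space model. It is a sketch under assumptions: struct work,
work_queue(), cancel_begin() and cancel_end() are hypothetical names,
and the worker lock, the timer and the flush step are left out.

```c
#include <stdbool.h>

/* Hypothetical model of a kthread work item. */
struct work {
	bool pending;  /* queued and not yet run */
	int canceling; /* number of cancel operations in flight */
};

/* Queuing is refused while any cancel operation is in progress. */
static bool work_queue(struct work *wk)
{
	if (wk->canceling || wk->pending)
		return false;
	wk->pending = true;
	return true;
}

/*
 * Start a cancel: bump the counter so that nobody can queue the work
 * while the lock would be dropped, then remove the work if pending.
 * Returns true if the work was pending.
 */
static bool cancel_begin(struct work *wk)
{
	bool was_pending = wk->pending;

	wk->canceling++;
	wk->pending = false;
	return was_pending;
}

/* Finish a cancel: queuing is allowed again once the count hits zero. */
static void cancel_end(struct work *wk)
{
	wk->canceling--;
}
```

Because the counter is incremented rather than set, two overlapping
cancel operations both block queuing, and the work can only be queued
again after the last of them has finished.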

Link: http://lkml.kernel.org/r/1470754545-17632-10-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek
22597dc3d9 kthread: initial support for delayed kthread work
We are going to use kthread_worker more widely and delayed works
will be pretty useful.

The implementation is inspired by workqueues.  It uses a timer to queue
the work after the requested delay.  If the delay is zero, the work is
queued immediately.

In contrast to workqueues, each work is associated with a single worker
(kthread).  Therefore the implementation could be much easier.  In
particular, we use the worker->lock to synchronize all the operations with
the work.  We do not need any atomic operation with a flags variable.

In fact, we do not need any state variable at all.  Instead, we add a list
of delayed works into the worker.  Then the pending work is listed either
in the list of queued or delayed works.  And the existing check of pending
works is the same even for the delayed ones.

A work must not be assigned to another worker unless reinitialized.
Therefore the timer handler might expect that dwork->work->worker is valid
and it could simply take the lock.  We just add some sanity checks to help
with debugging a potential misuse.
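
The "no state variable" design can be modelled in user space with two
lists per worker. This is a sketch only: the list helpers and the
names worker_init(), work_init(), work_pending(), queue_delayed() and
delay_expired() are hypothetical, and the worker lock and the real
timer are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of the kernel's. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }

static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct worker {
	struct list_node work_list;         /* ready to run */
	struct list_node delayed_work_list; /* waiting for a timer */
};

struct work {
	struct list_node node;
	struct worker *worker;
};

static void worker_init(struct worker *w)
{
	list_init(&w->work_list);
	list_init(&w->delayed_work_list);
}

static void work_init(struct work *wk)
{
	list_init(&wk->node);
	wk->worker = NULL;
}

/* Pending means "linked into either list"; there is no extra flag. */
static bool work_pending(const struct work *wk)
{
	return !list_is_empty(&wk->node);
}

/* A zero delay queues immediately, otherwise the work waits. */
static bool queue_delayed(struct worker *w, struct work *wk,
			  unsigned long delay)
{
	if (work_pending(wk))
		return false;
	wk->worker = w;
	list_add_tail(&wk->node,
		      delay ? &w->delayed_work_list : &w->work_list);
	return true;
}

/* What the timer handler would do once the delay has passed. */
static void delay_expired(struct work *wk)
{
	struct worker *w = wk->worker; /* valid unless reinitialized */

	list_del_init(&wk->node);
	list_add_tail(&wk->node, &w->work_list);
}
```

The same work_pending() check covers both lists, which is why delayed
works need no separate state in this model.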

Link: http://lkml.kernel.org/r/1470754545-17632-9-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek
8197b3d43b kthread: detect when a kthread work is used by more workers
Nothing currently prevents a work from queuing for a kthread worker when
it is already running on another one.  This means that the work might run
in parallel on more than one worker.  Also some operations are not
reliable, e.g.  flush.

This problem will be even more visible after we add kthread_cancel_work()
function.  It will only have "work" as the parameter and will use
worker->lock to synchronize with others.

Well, normally this is not a problem because the API users are sane.
But bugs might happen and users also might be crazy.

This patch adds a warning when we try to insert the work for another
worker.  It does not fully prevent the misuse because it would make the
code much more complicated without a big benefit.

It adds the same warning also into kthread_flush_work() instead of the
repeated attempts to get the right lock.

A side effect is that one needs to explicitly reinitialize the work if it
must be queued into another worker.  This is needed, for example, when the
worker is stopped and started again.  It is a bit inconvenient.  But it
looks like a good compromise between the stability and complexity.

I have double checked all existing users of the kthread worker API and
they all seem to initialize the work after the worker gets started.

Just for completeness, the patch adds a check that the work is not already
in a queue.

The patch also puts all the checks into a separate function.  It will be
reused when implementing delayed works.
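
The two checks can be sketched in user space as follows. This is a
model, not the kernel function: struct worker, struct work, work_init()
and sanity_warnings() are hypothetical names, and the warnings are
printed to stderr and counted instead of going through WARN_ON_ONCE().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model types. */
struct worker { const char *name; };

struct work {
	struct worker *worker; /* last worker this work was queued on */
	bool queued;           /* currently linked into a list */
};

/* (Re)initializing the work forgets the previous worker. */
static void work_init(struct work *wk)
{
	wk->worker = NULL;
	wk->queued = false;
}

/*
 * Check a work before inserting it into @w. Returns the number of
 * warnings raised; like the patch, it reports misuse but does not
 * try to fully prevent it.
 */
static int sanity_warnings(const struct worker *w, const struct work *wk)
{
	int warnings = 0;

	if (wk->queued) {
		fprintf(stderr, "warning: work is already queued\n");
		warnings++;
	}
	if (wk->worker && wk->worker != w) {
		fprintf(stderr, "warning: work belongs to worker %s\n",
			wk->worker->name);
		warnings++;
	}
	return warnings;
}
```

Calling work_init() again is what clears the association, which is the
explicit reinitialization step mentioned above for moving a work to
another worker.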

Link: http://lkml.kernel.org/r/1470754545-17632-8-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek
35033fe9cb kthread: add kthread_destroy_worker()
The current kthread worker users call flush() and stop() explicitly.
This function does the same plus it frees the kthread_worker struct
in one call.

It is supposed to be used together with kthread_create_worker*() that
allocates struct kthread_worker.

Link: http://lkml.kernel.org/r/1470754545-17632-7-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek
fbae2d44aa kthread: add kthread_create_worker*()
Kthread workers are currently created using the classic kthread API,
namely kthread_run().  kthread_worker_fn() is passed as the @threadfn
parameter.

This patch defines kthread_create_worker() and
kthread_create_worker_on_cpu() functions that hide implementation details.

They enforce using kthread_worker_fn() for the main thread.  But I doubt
that there are any plans to create any alternative.  In fact, I think that
we do not want any alternative main thread because it would be hard to
support consistency with the rest of the kthread worker API.

The naming and function of kthread_create_worker() is inspired by the
workqueues API like the rest of the kthread worker API.

The kthread_create_worker_on_cpu() variant is motivated by the original
kthread_create_on_cpu().  Note that we need to bind per-CPU kthread
workers already when they are created.  It makes life easier.
kthread_bind() could not be used later for an already running worker.

This patch does _not_ convert existing kthread workers.  The kthread
worker API needs more improvements first, e.g.  a function to destroy the
worker.

IMPORTANT:

kthread_create_worker_on_cpu() allows any format of the worker name to
be used, unlike kthread_create_on_cpu().  The good thing is that it
is more generic.  The bad thing is that most users will need to pass the
cpu number in two parameters, e.g.  kthread_create_worker_on_cpu(cpu,
"helper/%d", cpu).

To be honest, the main motivation was to avoid the need for an empty
va_list.  The only legal way was to create a helper function that would be
called with an empty list.  Other attempts caused compilation warnings or
even errors on different architectures.

There were also other alternatives, for example, using #define or
splitting __kthread_create_worker().  The used solution looked like the
least ugly.
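
The va_list arrangement discussed above can be sketched in user space.
This is a model, not the kernel implementation: struct worker,
worker_setup_v(), worker_setup() and worker_setup_on_cpu() are
hypothetical names, and NAME_LEN and ANY_CPU are assumptions. Both
variadic front ends forward their arguments to one helper that takes a
va_list, so neither has to construct an empty argument list.

```c
#include <stdarg.h>
#include <stdio.h>

#define NAME_LEN 16
#define ANY_CPU  (-1)

/* Hypothetical model of a worker: a CPU binding and a name. */
struct worker {
	int cpu;
	char name[NAME_LEN];
};

/* Shared helper: accepts a va_list so both front ends can use it. */
static void worker_setup_v(struct worker *w, int cpu,
			   const char *fmt, va_list args)
{
	w->cpu = cpu;
	vsnprintf(w->name, sizeof(w->name), fmt, args);
}

/* Unbound variant: only the printf-style name is given. */
static void worker_setup(struct worker *w, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	worker_setup_v(w, ANY_CPU, fmt, args);
	va_end(args);
}

/* Per-CPU variant: the name format is free, so the CPU appears twice. */
static void worker_setup_on_cpu(struct worker *w, int cpu,
				const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	worker_setup_v(w, cpu, fmt, args);
	va_end(args);
}
```

A call such as worker_setup_on_cpu(&w, 3, "helper/%d", 3) shows the
trade-off named above: the CPU number is passed once for the binding
and once more for the name.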

Link: http://lkml.kernel.org/r/1470754545-17632-6-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek
255451e453 kthread: allow to call __kthread_create_on_node() with va_list args
kthread_create_on_node() implements a bunch of logic to create the
kthread.  It is already called by kthread_create_on_cpu().

We are going to extend the kthread worker API and will need to call
kthread_create_on_node() with va_list args there.

This patch is only a refactoring and does not modify the existing
behavior.

Link: http://lkml.kernel.org/r/1470754545-17632-5-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek
a65d40961d kthread/smpboot: do not park in kthread_create_on_cpu()
kthread_create_on_cpu() was added by the commit 2a1d446019
("kthread: Implement park/unpark facility").  It is currently used only
when enabling a new CPU.  For this purpose, the newly created kthread has to
be parked.

The CPU binding is a bit tricky.  The kthread is parked when the CPU has
not been allowed yet.  And the CPU is bound when the kthread is unparked.

The function would be useful for more per-CPU kthreads, e.g.
bnx2fc_thread, fcoethread.  For this purpose, the newly created kthread
should stay in the uninterruptible state.

This patch moves the parking into smpboot.  It binds the thread already
when created.  Then the function might be used universally.  Also the
behavior is consistent with kthread_create() and kthread_create_on_node().

Link: http://lkml.kernel.org/r/1470754545-17632-4-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek
3989144f86 kthread: kthread worker API cleanup
A good practice is to prefix the names of functions by the name
of the subsystem.

The kthread worker API is a mix of classic kthreads and workqueues.  Each
worker has a dedicated kthread.  It runs a generic function that processes
queued works.  It is implemented as part of the kthread subsystem.

This patch renames the existing kthread worker API to use
the corresponding name from the workqueues API prefixed by
kthread_:

__init_kthread_worker()		-> __kthread_init_worker()
init_kthread_worker()		-> kthread_init_worker()
init_kthread_work()		-> kthread_init_work()
insert_kthread_work()		-> kthread_insert_work()
queue_kthread_work()		-> kthread_queue_work()
flush_kthread_work()		-> kthread_flush_work()
flush_kthread_worker()		-> kthread_flush_worker()

Note that the names of DEFINE_KTHREAD_WORK*() macros stay
as they are. It is common that the "DEFINE_" prefix has
precedence over the subsystem names.

Note that INIT() macros and init() functions use different
naming scheme. There is no good solution. There are several
reasons for this solution:

  + "init" in the function names stands for the verb "initialize"
    aka "initialize worker". While "INIT" in the macro names
    stands for the noun "INITIALIZER" aka "worker initializer".

  + INIT() macros are used only in DEFINE() macros

  + init() functions are used close to the other kthread()
    functions. It looks much better if all the functions
    use the same scheme.

  + There will also be kthread_destroy_worker() that will
    be used close to kthread_cancel_work(). It is related
    to the init() function. Again it looks better if all
    functions use the same naming scheme.

  + there are several precedents for such init() function
    names, e.g. amd_iommu_init_device(), free_area_init_node(),
    jump_label_init_type(), regmap_init_mmio_clk().

  + It is not an argument but it was inconsistent even before.

[arnd@arndb.de: fix linux-next merge conflict]
 Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Petr Mladek
e700591ae0 kthread: rename probe_kthread_data() to kthread_probe_data()
Patch series "kthread: Kthread worker API improvements"

The intention of this patchset is to make it easier to manipulate and
maintain kthreads.  Especially, I want to replace all the custom main
cycles with a generic one.  Also I want to make the kthreads sleep in a
consistent state in a common place when there is no work.

This patch (of 11):

A good practice is to prefix the names of functions by the name of the
subsystem.

This patch fixes the name of probe_kthread_data().  The other misnamed
functions are part of the kthread worker API and will be fixed
separately.

Link: http://lkml.kernel.org/r/1470754545-17632-2-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00
Oleg Nesterov
23196f2e5f kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function
get_task_struct(tsk) no longer pins tsk->stack so all users of
to_live_kthread() should do try_get_task_stack/put_task_stack to protect
"struct kthread" which lives on kthread's stack.

TODO: Kill to_live_kthread(), perhaps we can even kill "struct kthread" too,
and rework kthread_stop(), it can use task_work_add() to sync with the exiting
kernel thread.

Message-Id: <20160629180357.GA7178@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cb9b16bbc19d4aea4507ab0552e4644c1211d130.1474003868.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-16 09:18:53 +02:00
Andrew Morton
e9f069868d kernel/kthread.c:kthread_create_on_node(): clarify documentation
- Make it clear that the `node' arg refers to memory allocations only:
  kthread_create_on_node() does not pin the new thread to that node's
  CPUs.

- Encourage the use of NUMA_NO_NODE.

[nzimmer@sgi.com: use NUMA_NO_NODE in kthread_create() also]
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Linus Torvalds
a1d8561172 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The biggest change in this cycle is the rewrite of the main SMP load
  balancing metric: the CPU load/utilization.  The main goal was to make
  the metric more precise and more representative - see the changelog of
  this commit for the gory details:

    9d89c257df ("sched/fair: Rewrite runnable load and utilization average tracking")

  It is done in a way that significantly reduces complexity of the code:

    5 files changed, 249 insertions(+), 494 deletions(-)

  and the performance testing results are encouraging.  Nevertheless we
  need to keep an eye on potential regressions, since this potentially
  affects every SMP workload in existence.

  This work comes from Yuyang Du.

  Other changes:

   - SCHED_DL updates.  (Andrea Parri)

   - Simplify architecture callbacks by removing finish_arch_switch().
     (Peter Zijlstra et al)

   - cputime accounting: guarantee stime + utime == rtime.  (Peter
     Zijlstra)

   - optimize idle CPU wakeups some more - inspired by Facebook server
     loads.  (Mike Galbraith)

   - stop_machine fixes and updates.  (Oleg Nesterov)

   - Introduce the 'trace_sched_waking' tracepoint.  (Peter Zijlstra)

   - sched/numa tweaks.  (Srikar Dronamraju)

   - misc fixes and small cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  sched/deadline: Fix comment in enqueue_task_dl()
  sched/deadline: Fix comment in push_dl_tasks()
  sched: Change the sched_class::set_cpus_allowed() calling context
  sched: Make sched_class::set_cpus_allowed() unconditional
  sched: Fix a race between __kthread_bind() and sched_setaffinity()
  sched: Ensure a task has a non-normalized vruntime when returning back to CFS
  sched/numa: Fix NUMA_DIRECT topology identification
  tile: Reorganize _switch_to()
  sched, sparc32: Update scheduler comments in copy_thread()
  sched: Remove finish_arch_switch()
  sched, tile: Remove finish_arch_switch
  sched, sh: Fold finish_arch_switch() into switch_to()
  sched, score: Remove finish_arch_switch()
  sched, avr32: Remove finish_arch_switch()
  sched, MIPS: Get rid of finish_arch_switch()
  sched, arm: Remove finish_arch_switch()
  sched/fair: Clean up load average references
  sched/fair: Provide runnable_load_avg back to cfs_rq
  sched/fair: Remove task and group entity load when they are dead
  sched/fair: Init cfs_rq's sched_entity load average
  ...
2015-08-31 20:26:22 -07:00
Peter Zijlstra
25834c73f9 sched: Fix a race between __kthread_bind() and sched_setaffinity()
Because sched_setaffinity() checks p->flags & PF_NO_SETAFFINITY
without locks, a caller might observe an old value and race with the
set_cpus_allowed_ptr() call from __kthread_bind() and effectively undo
it:

	__kthread_bind()
	  do_set_cpus_allowed()
						<SYSCALL>
						  sched_setaffinity()
						    if (p->flags & PF_NO_SETAFFINITY)
						    set_cpus_allowed_ptr()
	  p->flags |= PF_NO_SETAFFINITY

Fix the bug by putting everything under the regular scheduler locks.

This also closes a hole in the serialization of task_struct::{nr_,}cpus_allowed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dedekind1@gmail.com
Cc: juri.lelli@arm.com
Cc: mgorman@suse.de
Cc: riel@redhat.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20150515154833.545640346@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12 12:06:09 +02:00
David Kershner
18896451ea kthread: export kthread functions
The s-Par visornic driver, currently in staging, processes a queue that
is serviced by an s-Par service partition.  We can get a message that
something has happened to the service partition; when that happens, we
must not access the channel until we get a message that the service
partition is back again.

The visornic driver has a thread for processing the channel.  When we get
the message, we need to be able to park the thread and then resume it
when the problem clears.

We can do this with kthread_park() and kthread_unpark(), but they are not
exported from the kernel.  This patch exports the needed functions.

Signed-off-by: David Kershner <david.kershner@unisys.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Nishanth Aravamudan
109228389a kernel/kthread.c: partial revert of 81c98869fa ("kthread: ensure locality of task_struct allocations")
After discussions with Tejun, we don't want to spread the use of
cpu_to_mem() (and thus knowledge of allocators/NUMA topology details) into
callers, but would rather ensure the callees correctly handle memoryless
nodes.  With the previous patches ("topology: add support for
node_to_mem_node() to determine the fallback node" and "slub: fallback to
node_to_mem_node() node if allocating on memoryless node") adding and
using node_to_mem_node(), we can safely undo part of the change to the
kthread logic from 81c98869fa.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anton Blanchard <anton@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09 22:25:51 -04:00
Lai Jiangshan
ed1403ec2b kthread_work: wake up worker only when the worker is idle
If the worker is already executing a work item when another is queued,
we can safely skip the wakeup without worrying about stalling the queue,
thus avoiding waking up the busy worker spuriously.  Spurious wakeups
are harmless but still aren't nice, and avoiding them is trivial here.
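
As a rough illustration of the rule described above, here is a minimal
userspace model, not the kernel code itself.  All model_* names are
hypothetical, and a wakeup counter stands in for wake_up_process(); the
sketch assumes the worker records the item it is executing in
current_work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kthread_work / kthread_worker. */
struct model_work {
	struct model_work *next;
};

struct model_worker {
	struct model_work *head, *tail;   /* pending work list */
	struct model_work *current_work;  /* item being executed, or NULL */
	int wakeups;                      /* counts simulated wakeups */
};

/* Queue a work item; only "wake" the worker if it is currently idle. */
static inline void model_queue_work(struct model_worker *w,
				    struct model_work *work)
{
	assert(work != NULL);
	work->next = NULL;
	if (w->tail)
		w->tail->next = work;
	else
		w->head = work;
	w->tail = work;

	/* A busy worker will pick the new item up on its own. */
	if (!w->current_work)
		w->wakeups++;
}

/* Worker side: take the next item (or go idle if the list is empty). */
static inline struct model_work *model_fetch_work(struct model_worker *w)
{
	struct model_work *work = w->head;

	if (work) {
		w->head = work->next;
		if (!w->head)
			w->tail = NULL;
	}
	w->current_work = work;
	return work;
}
```

In this model, queueing onto an idle worker bumps the counter, while
queueing during execution of another item does not; the busy worker finds
the new item on its next fetch.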

tj: Updated description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-28 14:07:52 -04:00
Tetsuo Handa
8fe6929cfd kthread: fix return value of kthread_create() upon SIGKILL.
Commit 786235eeba ("kthread: make kthread_create() killable") was meant
to allow kthread_create() to abort as soon as it is killed by the
OOM-killer.  But returning -ENOMEM is wrong if it is killed by SIGKILL
from userspace.  Change kthread_create() to return -EINTR upon SIGKILL.
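
To make the caller-visible difference concrete, here is a small userspace
sketch.  It models the kernel's error-pointer convention with
hypothetical model_* helpers (the real kernel helpers are ERR_PTR(),
PTR_ERR() and IS_ERR()); model_kthread_create() and its enum are
assumptions made purely for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace imitation of the kernel's error-pointer encoding. */
static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical outcomes of a thread creation request. */
enum model_outcome {
	MODEL_CREATED,       /* thread was created */
	MODEL_ALLOC_FAILED,  /* memory allocation failed */
	MODEL_FATAL_SIGNAL,  /* requester got SIGKILL while waiting */
};

static int model_task_storage; /* stands in for a task_struct */

/* Distinguish "out of memory" from "interrupted by a fatal signal". */
static inline void *model_kthread_create(enum model_outcome outcome)
{
	switch (outcome) {
	case MODEL_ALLOC_FAILED:
		return model_err_ptr(-ENOMEM);
	case MODEL_FATAL_SIGNAL:
		return model_err_ptr(-EINTR);
	default:
		return &model_task_storage;
	}
}
```

A caller can then tell the two failure modes apart by comparing
model_ptr_err() of the result against -EINTR and -ENOMEM.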

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:51 -07:00
Nishanth Aravamudan
81c98869fa kthread: ensure locality of task_struct allocations
In the presence of memoryless nodes, numa_node_id() will return the
current CPU's NUMA node, but that may not be where we expect to allocate
memory from.  Instead, we should rely on the fallback code in the
memory allocator itself, by using NUMA_NO_NODE.  Also, when calling
kthread_create_on_node(), use the nearest node with memory to the cpu in
question, rather than the node it is running on.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:49 -07:00
Tetsuo Handa
786235eeba kthread: make kthread_create() killable
Any user process caller of wait_for_completion(), except the global init
process, might be chosen by the OOM killer while waiting for a complete()
call by some other process which does memory allocation.  See
CVE-2012-4398 "kernel: request_module() OOM local DoS" for how this can
happen.

When such users are chosen by the OOM killer while they are waiting for
completion in TASK_UNINTERRUPTIBLE, the system will be kept stressed
due to memory starvation because the OOM killer cannot kill such users.

kthread_create() is one such user, and this patch fixes the problem
for kthreadd by making kthread_create() killable - the same approach
used for fixing CVE-2012-4398.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:08:59 +09:00
Tejun Heo
cd42d559e4 kthread: implement probe_kthread_data()
One of the problems that arise when converting a dedicated custom
threadpool to workqueue is that the shared worker pool used by workqueue
anonymizes each worker, making it more difficult to identify what the
worker was doing on which target from the output of sysrq-t or debug dump
from oops, BUG() and friends.

For example, after writeback is converted to use workqueue instead of a
private thread pool, there's no easy way to tell which backing device a
writeback work item was working on at the time of task dump, which,
according to our writeback brethren, is important in tracking down issues
with a lot of mounted file systems on a lot of different devices.

This patchset implements a way for a work function to mark its execution
instance so that task dump of the worker task includes information to
indicate what the work item was doing.

An example WARN dump would look like the following.

 WARNING: at fs/fs-writeback.c:1015 bdi_writeback_workfn+0x2b4/0x3c0()
 Modules linked in:
 CPU: 0 Pid: 28 Comm: kworker/u18:0 Not tainted 3.9.0-rc1-work+ #24
 Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
 Workqueue: writeback bdi_writeback_workfn (flush-8:16)
  ffffffff820a3a98 ffff88015b927cb8 ffffffff81c61855 ffff88015b927cf8
  ffffffff8108f500 0000000000000000 ffff88007a171948 ffff88007a1716b0
  ffff88015b49df00 ffff88015b8d3940 0000000000000000 ffff88015b927d08
 Call Trace:
  [<ffffffff81c61855>] dump_stack+0x19/0x1b
  [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
  ...

This patch:

Implement probe_kthread_data() which returns kthread_data if accessible.
The function is equivalent to kthread_data() except that the specified
@task may not be a kthread or its vfork_done is already cleared rendering
struct kthread inaccessible.  In the former case, probe_kthread_data() may
return any value.  In the latter, NULL.

This will be used to safely print debug information without affecting
synchronization in the normal paths.  Workqueue debug info printing on
dump_stack() and friends will make use of it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:02 -07:00
Linus Torvalds
46d9be3e5e Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "A lot of activities on workqueue side this time.  The changes achieve
  the following.

   - WQ_UNBOUND workqueues - the workqueues which are not per-cpu - are
     updated to be able to interface with multiple backend worker pools.
     This involved a lot of churning but the end result seems actually
     neater as unbound workqueues are now a lot closer to per-cpu ones.

   - The ability to interface with multiple backend worker pools is
     used to implement unbound workqueues with custom attributes.
     Currently the supported attributes are the nice level and CPU
     affinity.  It may be expanded to include cgroup association in
     future.  The attributes can be specified either by calling
     apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if
     the workqueue in question is exported through sysfs.

     The backend worker pools are keyed by the actual attributes and
     shared by any workqueues which share the same attributes.  When
     attributes of a workqueue are changed, the workqueue binds to the
     worker pool with the specified attributes while leaving the work
     items which are already executing in its previous worker pools
     alone.

     This allows converting custom worker pool implementations which
     want worker attribute tuning to use workqueues.  The writeback pool
     is already converted in the block tree and a couple of others are
     likely to follow, including btrfs io workers.

   - WQ_UNBOUND's ability to bind to multiple worker pools is also used
     to make it NUMA-aware.  Because there's no association between work
     item issuer and the specific worker assigned to execute it, before
     this change, using unbound workqueue led to unnecessary cross-node
     bouncing and it couldn't be helped by autonuma as it requires tasks
     to have implicit node affinity and workers are assigned randomly.

     After these changes, an unbound workqueue now binds to multiple
     NUMA-affine worker pools so that queued work items are executed in
     the same node.  This is turned on by default but can be disabled
     system-wide or for individual workqueues.

     Crypto was requesting NUMA affinity as encrypting data across
     different nodes can contribute noticeable overhead and doing it
     per-cpu was too limiting for certain cases and IO throughput could
     be bottlenecked by one CPU being fully occupied while others have
     idle cycles.

  While the new features required a lot of changes including
  restructuring locking, it didn't complicate the execution paths much.
  The unbound workqueue handling is now closer to per-cpu ones and the
  new features are implemented by simply associating a workqueue with
  different sets of backend worker pools without changing queue,
  execution or flush paths.

  As such, even though the amount of change is very high, I feel
  relatively safe in that it isn't likely to cause subtle issues with
  basic correctness of work item execution and handling.  If something
  is wrong, it's likely to show up as being associated with worker pools
  with the wrong attributes or OOPS while workqueue attributes are being
  changed or during CPU hotplug.

  While this creates more backend worker pools, it doesn't add too many
  more workers unless, of course, there are many workqueues with unique
  combinations of attributes.  Assuming everything else is the same,
  NUMA awareness costs an extra worker pool per NUMA node with online
  CPUs.

  There are also a couple things which are being routed outside the
  workqueue tree.

   - block tree pulled in workqueue for-3.10 so that writeback worker
     pool can be converted to unbound workqueue with sysfs control
     exposed.  This simplifies the code, makes writeback workers
     NUMA-aware and allows tuning nice level and CPU affinity via sysfs.

   - The conversion to workqueue means that there's no 1:1 association
     between a work item and a specific worker, which makes writeback
     folks unhappy as they want to be able to tell which filesystem
     caused a problem from backtrace on systems with many filesystems
     mounted.  This is resolved by allowing work items to set a debug
     info string which is printed when the task is dumped.  As this
     change involves unifying implementations of dump_stack() and
     friends in arch code, it's being routed through Andrew's -mm
     tree."

* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits)
  workqueue: use kmem_cache_free() instead of kfree()
  workqueue: avoid false negative WARN_ON() in destroy_workqueue()
  workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
  workqueue: implement NUMA affinity for unbound workqueues
  workqueue: introduce put_pwq_unlocked()
  workqueue: introduce numa_pwq_tbl_install()
  workqueue: use NUMA-aware allocation for pool_workqueues
  workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
  workqueue: map an unbound workqueues to multiple per-node pool_workqueues
  workqueue: move hot fields of workqueue_struct to the end
  workqueue: make workqueue->name[] fixed len
  workqueue: add workqueue->unbound_attrs
  workqueue: determine NUMA node of workers accourding to the allowed cpumask
  workqueue: drop 'H' from kworker names of unbound worker pools
  workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
  workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
  workqueue: fix memory leak in apply_workqueue_attrs()
  workqueue: fix unbound workqueue attrs hashing / comparison
  workqueue: fix race condition in unbound workqueue free path
  workqueue: remove pwq_lock which is no longer used
  ...
2013-04-29 19:07:40 -07:00
Oleg Nesterov
b5c5442bb6 kthread: kill task_get_live_kthread()
task_get_live_kthread() looks confusing and unneeded.  It does
get_task_struct() but only kthread_stop() needs this, it can be called
even if the caller doesn't have a reference when we know that this
kthread can't exit until we do kthread_stop().

kthread_park() and kthread_unpark() do not need get_task_struct(), the
callers already have the reference.  And it can not help if we can race
with the exiting kthread anyway, kthread_park() can hang forever in this
case.

Change kthread_park() and kthread_unpark() to use to_live_kthread(),
change kthread_stop() to do get_task_struct() by hand and remove
task_get_live_kthread().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:25 -07:00
Oleg Nesterov
4ecdafc808 kthread: introduce to_live_kthread()
"k->vfork_done != NULL" with a barrier() after to_kthread(k) in
task_get_live_kthread(k) looks unclear, and sub-optimal because we load
->vfork_done twice.

All we need is to ensure that we do not return to_kthread(NULL).  Add a
new trivial helper which loads/checks ->vfork_done once, this also looks
more understandable.
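
The single-load idea can be sketched in userspace as follows.  This is a
simplified model under stated assumptions, not the kernel source: the
model_* types are hypothetical, and it assumes, as the description
implies, that ->vfork_done points at a completion embedded in the
per-thread structure, so the owner can be recovered with a
container_of-style computation.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_completion {
	int done;
};

struct model_kthread {
	unsigned long flags;
	void *data;
	struct model_completion exited;  /* what ->vfork_done points to */
};

struct model_task {
	struct model_completion *vfork_done;  /* NULL once the thread exits */
};

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Load ->vfork_done exactly once, then decide based on that single
 * value, so a concurrent clear cannot slip in between check and use.
 */
static inline struct model_kthread *
model_to_live_kthread(struct model_task *k)
{
	struct model_completion *vfork_done =
		*(struct model_completion *volatile *)&k->vfork_done;

	if (vfork_done)
		return model_container_of(vfork_done,
					  struct model_kthread, exited);
	return NULL;
}
```

With this shape, a NULL ->vfork_done yields NULL rather than a pointer
computed from NULL, which is the property the commit asks for.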

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:25 -07:00
Thomas Gleixner
f2530dc71c kthread: Prevent unpark race which puts threads on the wrong cpu
The smpboot threads rely on the park/unpark mechanism which binds per
cpu threads on a particular core. However, the functionality is racy:

CPU0                    CPU1                                    CPU2
unpark(T)                                                       wake_up_process(T)
  clear(SHOULD_PARK)    T runs
                        leave parkme() due to !SHOULD_PARK
  bind_to(CPU2)         BUG_ON(wrong CPU)

We cannot let the tasks move themselves to the target CPU as one of
those tasks is actually the migration thread itself, which requires
that it starts running on the target cpu right away.

The solution to this problem is to prevent wakeups in park mode which
are not from unpark(). That way we can guarantee that the association
of the task to the target cpu is working correctly.

Add a new task state (TASK_PARKED) which prevents other wakeups and
use this state explicitly for the unpark wakeup.

Peter noticed: Also, since the task state is visible to userspace and
all the parked tasks are still in the PID space, it's a good hint in ps
and friends that these tasks aren't really there for the moment.

The migration thread has another related issue.

CPU0	      	     	 CPU1
Bring up CPU2
create_thread(T)
park(T)
 wait_for_completion()
			 parkme()
			 complete()
sched_set_stop_task()
			 schedule(TASK_PARKED)

The sched_set_stop_task() call is issued while the task is on the
runqueue of CPU1 and that confuses the hell out of the stop_task class
on that cpu. So we need the same synchronization before
sched_set_stop_task().

Reported-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Dave Hansen <dave@sr71.net>
Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: dhillf@gmail.com
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-12 14:18:43 +02:00
Tejun Heo
14a40ffccd sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY
PF_THREAD_BOUND was originally used to mark kernel threads which were
bound to a specific CPU using kthread_bind() and a task with the flag
set allows cpus_allowed modifications only to itself.  Workqueue is
currently abusing it to prevent userland from meddling with
cpus_allowed of workqueue workers.

What we need is a flag to prevent userland from messing with
cpus_allowed of certain kernel tasks.  In kernel, anyone can
(incorrectly) squash the flag, and, for worker-type usages,
restricting cpus_allowed modification to the task itself doesn't
provide meaningful extra protection as other tasks can inject work
items to the task anyway.

This patch replaces PF_THREAD_BOUND with PF_NO_SETAFFINITY.
sched_setaffinity() checks the flag and returns -EINVAL if set.
set_cpus_allowed_ptr() is no longer affected by the flag.
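
The resulting split between the userland-facing and in-kernel paths can
be modelled roughly as below.  This is a userspace sketch only: the
struct, the flag value and the model_* functions are hypothetical
stand-ins, and a plain bitmask replaces the kernel cpumask.

```c
#include <errno.h>

/* Hypothetical bit value; the kernel's actual flag value is not used. */
#define MODEL_PF_NO_SETAFFINITY 0x1u

struct affinity_task {
	unsigned int flags;
	unsigned long cpus_allowed;  /* one bit per allowed CPU */
};

/* In-kernel path: does not look at the flag at all. */
static inline int model_set_cpus_allowed_ptr(struct affinity_task *p,
					     unsigned long mask)
{
	if (!mask)
		return -EINVAL;  /* an empty mask is never valid */
	p->cpus_allowed = mask;
	return 0;
}

/* Userland-facing path: refuses tasks marked with the flag. */
static inline int model_sched_setaffinity(struct affinity_task *p,
					  unsigned long mask)
{
	if (p->flags & MODEL_PF_NO_SETAFFINITY)
		return -EINVAL;
	return model_set_cpus_allowed_ptr(p, mask);
}
```

In this model a flagged task keeps its mask when the userland path is
used, while kernel code can still change it through the direct helper.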

This will allow simplifying workqueue worker CPU affinity management.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2013-03-19 13:45:20 -07:00
Lai Jiangshan
aee4faa499 kthread: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Linus Torvalds
4e21fc138b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull third pile of kernel_execve() patches from Al Viro:
 "The last bits of infrastructure for kernel_thread() et al., with
  alpha/arm/x86 use of those.  Plus sanitizing the asm glue and
  do_notify_resume() on alpha, fixing the "disabled irq while running
  task_work stuff" breakage there.

  At that point the rest of kernel_thread/kernel_execve/sys_execve work
  can be done independently for different architectures.  The only
  pending bits that do depend on having all architectures converted are
  restricted to fs/* and kernel/* - that'll obviously have to wait for
  the next cycle.

  I thought we'd have to wait for all of them done before we start
  eliminating the longjump-style insanity in kernel_execve(), but it
  turned out there's a very simple way to do that without flagday-style
  changes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  alpha: switch to saner kernel_execve() semantics
  arm: switch to saner kernel_execve() semantics
  x86, um: convert to saner kernel_execve() semantics
  infrastructure for saner ret_from_kernel_thread semantics
  make sure that kernel_thread() callbacks call do_exit() themselves
  make sure that we always have a return path from kernel_execve()
  ppc: eeh_event should just use kthread_run()
  don't bother with kernel_thread/kernel_execve for launching linuxrc
  alpha: get rid of switch_stack argument of do_work_pending()
  alpha: don't bother passing switch_stack separately from regs
  alpha: take SIGPENDING/NOTIFY_RESUME loop into signal.c
  alpha: simplify TIF_NEED_RESCHED handling
2012-10-13 10:05:52 +09:00
Al Viro
a74fb73c12 infrastructure for saner ret_from_kernel_thread semantics
* allow kernel_execve() to leave the actual return to userland to the
caller (selected by CONFIG_GENERIC_KERNEL_EXECVE).  Callers
updated accordingly.
* an architecture that does select GENERIC_KERNEL_EXECVE in its
Kconfig should have its ret_from_kernel_thread() do this:
	call schedule_tail
	call the callback left for it by copy_thread(); if it ever
returns, that's because it has just done successful kernel_execve()
	jump to return from syscall
IOW, its only difference from ret_from_fork() is that it does call the
callback.
* such an architecture should also get rid of ret_from_kernel_execve()
and __ARCH_WANT_KERNEL_EXECVE

This is the last part of infrastructure patches in that area - from
that point on work on different architectures can live independently.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12 13:35:07 -04:00
Thomas Gleixner
2a1d446019 kthread: Implement park/unpark facility
To avoid the full teardown/setup of per cpu kthreads in the case of
cpu hot(un)plug, provide a facility which allows putting the kthread
into a parked state and unparking it when the cpu comes online again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120716103948.236618824@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-13 17:01:06 +02:00
Tejun Heo
46f3d97621 kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed
kthread_worker provides minimalistic workqueue-like interface for
users which need a dedicated worker thread (e.g. for realtime
priority).  It has basic queue, flush_work, flush_worker operations
which mostly match the workqueue counterparts; however, due to the way
flush_work() is implemented, it has a noticeable difference of not
allowing work items to be freed while being executed.

While the current users of kthread_worker are okay with the current
behavior, the restriction does impede some valid use cases.  Also,
removing this difference isn't difficult and actually makes the code
easier to understand.

This patch reimplements flush_kthread_work() such that it uses a
flush_work item instead of queue/done sequence numbers.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-07-22 10:15:28 -07:00
Tejun Heo
9a2e03d8ed kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation
Make the following two non-functional changes.

* Separate out insert_kthread_work() from queue_kthread_work().

* Relocate struct kthread_flush_work and kthread_flush_work_fn()
  definitions above flush_kthread_work().

v2: Added lockdep_assert_held() in insert_kthread_work() as suggested
    by Andy Walls.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Andy Walls <awalls@md.metrocast.net>
2012-07-22 10:11:01 -07:00
Tejun Heo
34b087e483 freezer: kill unused set_freezable_with_signal()
There's no in-kernel user of set_freezable_with_signal() left.  Mixing
TIF_SIGPENDING with kernel threads can lead to nasty corner cases as
kernel threads never travel signal delivery path on their own.

e.g. the current implementation is buggy in the cancelation path of
__thaw_task().  It calls recalc_sigpending_and_wake() in an attempt to
clear TIF_SIGPENDING but the function never clears it regardless of
sigpending state.  This means that signallable freezable kthreads may
continue executing with !freezing() && stuck TIF_SIGPENDING, which can
be troublesome.

This patch removes set_freezable_with_signal() along with
PF_FREEZER_NOSIG and recalc_sigpending*() calls in freezer.  User
tasks get TIF_SIGPENDING, kernel tasks get woken up and the spurious
sigpending is dealt with in the usual signal delivery path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
2011-11-23 09:28:17 -08:00
Tejun Heo
8a32c441c1 freezer: implement and use kthread_freezable_should_stop()
Writeback and thinkpad_acpi have been using thaw_process() to prevent
deadlock between the freezer and kthread_stop(); unfortunately, this
is inherently racy - nothing prevents freezing from happening between
thaw_process() and kthread_stop().

This patch implements kthread_freezable_should_stop() which enters
refrigerator if necessary but is guaranteed to return if
kthread_stop() is invoked.  Both thaw_process() users are converted to
use the new function.

Note that this deadlock condition exists for many freezable
kthreads.  They need to be converted to use the new should_stop or
freezable workqueue.

Tested with synthetic test case.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
2011-11-21 12:32:23 -08:00
Paul Gortmaker
9984de1a5a kernel: Map most files to use export.h instead of module.h
The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else.  Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

  -#include <linux/module.h>
  +#include <linux/export.h>

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 09:20:12 -04:00
KOSAKI Motohiro
1e1b6c511d cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed
The rule is that we have to update tsk->rt.nr_cpus_allowed whenever we change
tsk->cpus_allowed. Otherwise the RT scheduler may get confused.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4DD4B3FA.5060901@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-28 17:02:57 +02:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Eric Dumazet
207205a2ba kthread: NUMA aware kthread_create_on_node()
Because all kthreads are created from a single helper task, they all use memory
from a single node for their kernel stack and task struct.

This patch suite creates kthread_create_on_node(), adding a 'cpu' parameter
to parameters already used by kthread_create().

This parameter is used to allocate memory for the new kthread on its own
memory node if possible.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22 17:44:01 -07:00