99 Commits

Author SHA1 Message Date
Peter Zijlstra
46f188051f stop_machine: Reflow cpu_stop_queue_two_works()
The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by
lifting the preempt_disable() to the top to create more natural nesting wrt
the spinlocks and make the wake_up_q() and preempt_enable() unconditional
at the end.

Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait
with preemption enabled.
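
The control flow described above can be sketched as a small userspace model. This is an illustration only, not the kernel source: preempt_disable(), preempt_enable(), lock_both(), unlock_both() and cpu_relax() are hypothetical stand-ins that merely count, busy_polls is an assumed substitute for the "another stop operation is in progress" condition, and queue_two_works() is an invented name for the modelled function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel primitives (hypothetical, illustration only). */
static int preempt_depth;            /* models the preempt count */
static int locks_held;               /* models the two stopper spinlocks */
static int busy_polls;               /* > 0 models "a conflicting stop is in progress" */
static int max_depth_while_spinning; /* highest preempt depth seen while spin-waiting */
static int woken;                    /* number of stopper threads woken so far */

static void preempt_disable(void) { preempt_depth++; }
static void preempt_enable(void)  { preempt_depth--; }
static void lock_both(void)       { locks_held += 2; }
static void unlock_both(void)     { locks_held -= 2; }

static void cpu_relax(void)
{
	/* Record whether we are spinning with preemption disabled. */
	if (preempt_depth > max_depth_while_spinning)
		max_depth_while_spinning = preempt_depth;
	busy_polls--;                /* the conflicting operation eventually ends */
}

static int queue_two_works(bool stoppers_enabled)
{
	int queued;
	int err;

retry:
	queued = 0;
	preempt_disable();           /* lifted to the top: nests outside the locks */
	lock_both();

	if (!stoppers_enabled) {
		err = -ENOENT;
		goto unlock;
	}
	if (busy_polls > 0) {
		err = -EDEADLK;
		goto unlock;
	}
	err = 0;
	queued = 2;                  /* both works queued, both stoppers on the wake queue */

unlock:
	unlock_both();

	if (err == -EDEADLK) {
		preempt_enable();    /* spin-wait with preemption enabled */
		while (busy_polls > 0)
			cpu_relax();
		goto retry;
	}

	woken += queued;             /* wake_up_q() stand-in: unconditional, empty queue wakes nobody */
	preempt_enable();            /* unconditional at the end */
	return err;
}
```

In this model, setting busy_polls to a positive value before calling queue_two_works(true) exercises the -EDEADLK retry path; max_depth_while_spinning then shows whether any spin iteration ran with preemption disabled.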

Change-Id: Id5f803b3223c7e0a37cf11f613f997d644ac375a
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: isaacm@codeaurora.org
Cc: matt@codeblueprint.co.uk
Cc: psodagud@codeaurora.org
Cc: gregkh@linuxfoundation.org
Cc: pkondeti@codeaurora.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180730112140.GH2494@hirez.programming.kicks-ass.net
Git-commit: b80a2bfce85e1051056d98d04ecb2d0b55cbbc1c
Git-repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2018-09-10 09:32:36 -07:00
Isaac J. Manjarres
b2c8463039 Merge android-4.14-p.61 (b7e55e8) into msm-4.14
* remotes/origin/tmp-b7e55e8:
  Linux 4.14.61
  scsi: sg: fix minor memory leak in error path
  drm/vc4: Reset ->{x, y}_scaling[1] when dealing with uniplanar formats
  crypto: padlock-aes - Fix Nano workaround data corruption
  RDMA/uverbs: Expand primary and alt AV port checks
  iwlwifi: add more card IDs for 9000 series
  userfaultfd: remove uffd flags from vma->vm_flags if UFFD_EVENT_FORK fails
  audit: fix potential null dereference 'context->module.name'
  kvm: x86: vmx: fix vpid leak
  x86/entry/64: Remove %ebx handling from error_entry/exit
  x86/apic: Future-proof the TSC_DEADLINE quirk for SKX
  virtio_balloon: fix another race between migration and ballooning
  net: socket: fix potential spectre v1 gadget in socketcall
  can: ems_usb: Fix memory leak on ems_usb_disconnect()
  squashfs: more metadata hardenings
  squashfs: more metadata hardening
  net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager
  rxrpc: Fix user call ID check in rxrpc_service_prealloc_one
  net: stmmac: Fix WoL for PCI-based setups
  netlink: Fix spectre v1 gadget in netlink_create()
  net: dsa: Do not suspend/resume closed slave_dev
  ipv4: frags: handle possible skb truesize change
  inet: frag: enforce memory limits earlier
  bonding: avoid lockdep confusion in bond_get_stats()
  Linux 4.14.60
  tcp: add one more quick ack after after ECN events
  tcp: refactor tcp_ecn_check_ce to remove sk type cast
  tcp: do not aggressively quick ack after ECN events
  tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode
  tcp: do not force quickack when receiving out-of-order packets
  netlink: Don't shift with UB on nlk->ngroups
  netlink: Do not subscribe to non-existent groups
  xen-netfront: wait xenbus state change when load module manually
  tcp_bbr: fix bw probing to raise in-flight data for very small BDPs
  NET: stmmac: align DMA stuff to largest cache line length
  net: mdio-mux: bcm-iproc: fix wrong getter and setter pair
  net: lan78xx: fix rx handling before first packet is send
  net: fix amd-xgbe flow-control issue
  net: ena: Fix use of uninitialized DMA address bits field
  ipv4: remove BUG_ON() from fib_compute_spec_dst
  net: dsa: qca8k: Allow overwriting CPU port setting
  net: dsa: qca8k: Add QCA8334 binding documentation
  net: dsa: qca8k: Enable RXMAC when bringing up a port
  net: dsa: qca8k: Force CPU port to its highest bandwidth
  RDMA/uverbs: Protect from attempts to create flows on unsupported QP
  usb: gadget: udc: renesas_usb3: should remove debugfs
  ovl: Sync upper dirty data when syncing overlayfs
  PCI: xgene: Remove leftover pci_scan_child_bus() call
  PCI: pciehp: Assume NoCompl+ for Thunderbolt ports
  ext4: fix check to prevent initializing reserved inodes
  ext4: check for allocation block validity with block group locked
  ext4: fix inline data updates with checksums enabled
  squashfs: be more careful about metadata corruption
  random: mix rdrand with entropy sent in from userspace
  block: reset bi_iter.bi_done after splitting bio
  blkdev: __blkdev_direct_IO_simple: fix leak in error case
  block: bio_iov_iter_get_pages: fix size of last iovec
  drm/dp/mst: Fix off-by-one typo when dump payload table
  drm/atomic-helper: Drop plane->fb references only for drm_atomic_helper_shutdown()
  drm: Add DP PSR2 sink enable bit
  ASoC: topology: Add missing clock gating parameter when parsing hw_configs
  ASoC: topology: Fix bclk and fsync inversion in set_link_hw_format()
  media: si470x: fix __be16 annotations
  media: atomisp: compat32: fix __user annotations
  scsi: cxlflash: Avoid clobbering context control register value
  scsi: cxlflash: Synchronize reset and remove ops
  scsi: megaraid_sas: Increase timeout by 1 sec for non-RAID fastpath IOs
  scsi: scsi_dh: replace too broad "TP9" string with the exact models
  regulator: Don't return or expect -errno from of_map_mode()
  media: omap3isp: fix unbalanced dma_iommu_mapping
  crypto: authenc - don't leak pointers to authenc keys
  crypto: authencesn - don't leak pointers to authenc keys
  usb: hub: Don't wait for connect state at resume for powered-off ports
  microblaze: Fix simpleImage format generation
  soc: imx: gpcv2: Do not pass static memory as platform data
  serial: core: Make sure compiler barfs for 16-byte earlycon names
  staging: lustre: ldlm: free resource when ldlm_lock_create() fails.
  staging: lustre: llite: correct removexattr detection
  staging: vchiq_core: Fix missing semaphore release in error case
  audit: allow not equal op for audit by executable
  rsi: fix nommu_map_sg overflow kernel panic
  rsi: Fix 'invalid vdd' warning in mmc
  ipconfig: Correctly initialise ic_nameservers
  drm/gma500: fix psb_intel_lvds_mode_valid()'s return type
  igb: Fix queue selection on MAC filters on i210
  arm64: defconfig: Enable Rockchip io-domain driver
  nvme: lightnvm: add granby support
  memory: tegra: Apply interrupts mask per SoC
  memory: tegra: Do not handle spurious interrupts
  delayacct: Use raw_spinlocks
  stop_machine: Use raw spinlocks
  backlight: pwm_bl: Don't use GPIOF_* with gpiod_get_direction
  dt-bindings: net: meson-dwmac: new compatible name for AXG SoC
  net: hns3: Fixes the out of bounds access in hclge_map_tqp
  spi: meson-spicc: Fix error handling in meson_spicc_probe()
  dt-bindings: pinctrl: meson: add support for the Meson8m2 SoC
  mmc: pwrseq: Use kmalloc_array instead of stack VLA
  mmc: dw_mmc: update actual clock for mmc debugfs
  ALSA: hda/ca0132: fix build failure when a local macro is defined
  drm/atomic: Handling the case when setting old crtc for plane
  media: siano: get rid of __le32/__le16 cast warnings
  f2fs: avoid fsync() failure caused by EAGAIN in writepage()
  bpf: fix references to free_bpf_prog_info() in comments
  thermal: exynos: fix setting rising_threshold for Exynos5433
  staging: lustre: o2iblnd: Fix FastReg map/unmap for MLX5
  staging: lustre: o2iblnd: fix race at kiblnd_connect_peer
  scsi: qedf: Set the UNLOADING flag when removing a vport
  scsi: hisi_sas: config ATA de-reset as an constrained command for v3 hw
  scsi: megaraid: silence a static checker bug
  scsi: 3w-xxxx: fix a missing-check bug
  scsi: 3w-9xxx: fix a missing-check bug
  bnxt_en: Check unsupported speeds in bnxt_update_link() on PF only.
  perf: fix invalid bit in diagnostic entry
  s390/cpum_sf: Add data entry sizes to sampling trailer entry
  brcmfmac: Add support for bcm43364 wireless chipset
  mtd: rawnand: fsl_ifc: fix FSL NAND driver to read all ONFI parameter pages
  media: saa7164: Fix driver name in debug output
  media: media-device: fix ioctl function types
  ACPI / LPSS: Only call pwm_add_table() for Bay Trail PWM if PMIC HRV is 2
  libata: Fix command retry decision
  media: rcar_jpu: Add missing clk_disable_unprepare() on error in jpu_open()
  net: phy: phylink: Release link GPIO
  dma-iommu: Fix compilation when !CONFIG_IOMMU_DMA
  tty: Fix data race in tty_insert_flip_string_fixed_flag
  i40e: free the skb after clearing the bitlock
  nvmem: properly handle returned value nvmem_reg_read
  ARM: dts: sh73a0: Add missing interrupt-affinity to PMU node
  ARM: dts: emev2: Add missing interrupt-affinity to PMU node
  ARM: dts: stih407-pinctrl: Fix complain about IRQ_TYPE_NONE usage
  EDAC, altera: Fix ARM64 build warning
  HID: i2c-hid: check if device is there before really probing
  powerpc/embedded6xx/hlwd-pic: Prevent interrupts from being handled by Starlet
  drm/amdgpu: Remove VRAM from shared bo domains.
  drm/radeon: fix mode_valid's return type
  arm64: dts: renesas: salvator-common: use audio-graph-card for Sound
  HID: hid-plantronics: Re-resend Update to map button for PTT products
  arm64: cmpwait: Clear event register before arming exclusive monitor
  media: atomisp: ov2680: don't declare unused vars
  ALSA: usb-audio: Apply rate limit to warning messages in URB complete callback
  net: ethernet: ti: cpsw-phy-sel: check bus_find_device() ret value
  media: smiapp: fix timeout checking in smiapp_read_nvm
  ixgbevf: fix MAC address changes through ixgbevf_set_mac()
  md: fix NULL dereference of mddev->pers in remove_and_add_spares()
  md/raid1: add error handling of read error from FailFast device
  regulator: pfuze100: add .is_enable() for pfuze100_swb_regulator_ops
  ALSA: emu10k1: Rate-limit error messages about page errors
  rtc: tps65910: fix possible race condition
  rtc: vr41xx: fix possible race condition
  rtc: tps6586x: fix possible race condition
  Bluetooth: btusb: add ID for LiteOn 04ca:301a
  drm/nouveau/fifo/gk104-: poll for runlist update completion
  scsi: zfcp: assert that the ERP lock is held when tracing a recovery trigger
  scsi: ufs: fix exception event handling
  scsi: ufs: ufshcd: fix possible unclocked register access
  fscrypt: use unbound workqueue for decryption
  net: hns3: Fix the missing client list node initialization
  spi: Add missing pm_runtime_put_noidle() after failed get
  drivers/perf: arm-ccn: don't log to dmesg in event_init
  ima: based on policy verify firmware signatures (pre-allocated buffer)
  mwifiex: correct histogram data with appropriate index
  net: dsa: qca8k: Add support for QCA8334 switch
  PCI: pciehp: Request control of native hotplug only if supported
  bpf: powerpc64: pad function address loads with NOPs
  pinctrl: at91-pio4: add missing of_node_put
  powerpc/8xx: fix invalid register expression in head_8xx.S
  spi: sh-msiof: Fix setting SIRMDR1.SYNCAC to match SITMDR1.SYNCAC
  powerpc: Add __printf verification to prom_printf
  powerpc/powermac: Mark variable x as unused
  powerpc/powermac: Add missing prototype for note_bootable_part()
  powerpc/chrp/time: Make some functions static, add missing header include
  powerpc/32: Add a missing include header
  ath: Add regulatory mapping for Bahamas
  ath: Add regulatory mapping for Bermuda
  ath: Add regulatory mapping for Serbia
  ath: Add regulatory mapping for Tanzania
  ath: Add regulatory mapping for Uganda
  ath: Add regulatory mapping for APL2_FCCA
  ath: Add regulatory mapping for APL13_WORLD
  ath: Add regulatory mapping for ETSI8_WORLD
  ath: Add regulatory mapping for FCC3_ETSIC
  nvme-pci: Fix AER reset handling
  nvme-rdma: stop admin queue before freeing it
  PCI: Prevent sysfs disable of device while driver is attached
  PM / wakeup: Make s2idle_lock a RAW_SPINLOCK
  x86/microcode: Make the late update update_lock a raw lock for RT
  btrfs: qgroup: Finish rescan when hit the last leaf of extent tree
  btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups
  Btrfs: don't BUG_ON() in btrfs_truncate_inode_items()
  Btrfs: don't return ino to ino cache if inode item removal fails
  media: videobuf2-core: don't call memop 'finish' when queueing
  media: tw686x: Fix incorrect vb2_mem_ops GFP flags
  net: hns3: Fixes the init of the VALID BD info in the descriptor
  wlcore: sdio: check for valid platform device data before suspend
  mwifiex: handle race during mwifiex_usb_disconnect
  mfd: cros_ec: Fail early if we cannot identify the EC
  ASoC: dpcm: fix BE dai not hw_free and shutdown
  Bluetooth: btusb: Add a new Realtek 8723DE ID 2ff8:b011
  Bluetooth: hci_qca: Fix "Sleep inside atomic section" warning
  iwlwifi: pcie: fix race in Rx buffer allocator
  btrfs: balance dirty metadata pages in btrfs_finish_ordered_io
  PCI: Fix devm_pci_alloc_host_bridge() memory leak
  selftests: intel_pstate: return Kselftest Skip code for skipped tests
  selftests: memfd: return Kselftest Skip code for skipped tests
  selftests/intel_pstate: Improve test, minor fixes
  perf/x86/intel/uncore: Correct fixed counter index check for NHM
  perf/x86/intel/uncore: Correct fixed counter index check in generic code
  usbip: dynamically allocate idev by nports found in sysfs
  usbip: usbip_detach: Fix memory, udev context and udev leak
  block, bfq: remove wrong lock in bfq_requests_merged
  f2fs: fix race in between GC and atomic open
  f2fs: fix to detect failure of dquot_initialize
  f2fs: Fix deadlock in shutdown ioctl
  f2fs: fix to wait page writeback during revoking atomic write
  f2fs: fix to don't trigger writeback during recovery
  f2fs: fix error path of move_data_page
  disable loading f2fs module on PAGE_SIZE > 4KB
  pnfs: Don't release the sequence slot until we've processed layoutget on open
  netfilter: nf_tables: check msg_type before nft_trans_set(trans)
  lightnvm: pblk: warn in case of corrupted write buffer
  RDMA/mad: Convert BUG_ONs to error flows
  powerpc/64s: Fix compiler store ordering to SLB shadow area
  hvc_opal: don't set tb_ticks_per_usec in udbg_init_opal_common()
  powerpc/eeh: Fix use-after-release of EEH driver
  powerpc/64s: Add barrier_nospec
  powerpc/lib: Adjust .balign inside string functions for PPC32
  infiniband: fix a possible use-after-free bug
  e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
  ceph: fix alignment of rasize
  bpf, arm32: fix inconsistent naming about emit_a32_lsr_{r64,i64}
  printk: drop in_nmi check from printk_safe_flush_on_panic()
  watchdog: da9063: Fix updating timeout value
  irqchip/ls-scfg-msi: Map MSIs in the iommu
  netfilter: ipset: List timing out entries with "timeout 1" instead of zero
  netfilter: ipset: forbid family for hash:mac sets
  perf tools: Fix pmu events parsing rule
  rtc: ensure rtc_set_alarm fails when alarms are not supported
  mm/slub.c: add __printf verification to slab_err()
  mm: vmalloc: avoid racy handling of debugobjects in vunmap
  mm: /proc/pid/pagemap: hide swap entries from unprivileged users
  kernel/hung_task.c: show all hung tasks before panic
  vfio/type1: Fix task tracking for QEMU vCPU hotplug
  vfio/mdev: Check globally for duplicate devices
  vfio: platform: Fix reset module leak in error path
  nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo
  NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY
  ALSA: fm801: add error handling for snd_ctl_add
  ALSA: emu10k1: add error handling for snd_ctl_add
  skip LAYOUTRETURN if layout is invalid
  hv_netvsc: fix network namespace issues with VF support
  xen/netfront: raise max number of slots in xennet_get_responses()
  kcov: ensure irq code sees a valid area
  mlxsw: spectrum_switchdev: Fix port_vlan refcounting
  arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups
  tracing: Quiet gcc warning about maybe unused link variable
  tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
  kthread, tracing: Don't expose half-written comm when creating kthreads
  tracing: Fix possible double free in event_enable_trigger_func()
  tracing: Fix double free of event_trigger_data
  delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
  kvm, mm: account shadow page tables to kmemcg
  Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST
  Input: i8042 - add Lenovo LaVie Z to the i8042 reset list
  Input: elan_i2c - add ACPI ID for lenovo ideapad 330
  spi: spi-s3c64xx: Fix system resume support
  drivers/infiniband/ulp/srpt/ib_srpt.c: fix build with gcc-4.4.4
  IB/srpt: Fix an out-of-bounds stack access in srpt_zerolength_write()
  drivers/infiniband/core/verbs.c: fix build with gcc-4.4.4
  RDMA/core: Avoid that ib_drain_qp() triggers an out-of-bounds stack access
  i2c: core: decrease reference count of device node in i2c_unregister_device
  fork: unconditionally clear stack on fork
  Linux 4.14.59
  turn off -Wattribute-alias
  can: m_can.c: fix setup of CCCR register: clear CCCR NISO bit before checking can.ctrlmode
  can: peak_canfd: fix firmware < v3.3.0: limit allocation to 32-bit DMA addr only
  can: xilinx_can: fix RX overflow interrupt not being enabled
  can: xilinx_can: fix incorrect clear of non-processed interrupts
  can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
  can: xilinx_can: fix device dropping off bus on RX overrun
  can: xilinx_can: fix recovery from error states not being propagated
  can: xilinx_can: fix power management handling
  can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK
  driver core: Partially revert "driver core: correct device's shutdown order"
  usb: gadget: f_fs: Only return delayed status when len is 0
  usb: dwc2: Fix DMA alignment to start at allocated boundary
  usb: core: handle hub C_PORT_OVER_CURRENT condition
  usb: cdc_acm: Add quirk for Castles VEGA3000
  staging: speakup: fix wraparound in uaccess length check
  tcp: add tcp_ooo_try_coalesce() helper
  tcp: call tcp_drop() from tcp_data_queue_ofo()
  tcp: detect malicious patterns in tcp_collapse_ofo_queue()
  tcp: avoid collapses in tcp_prune_queue() if possible
  tcp: free batches of packets in tcp_prune_ofo_queue()
  tcp: do not delay ACK in DCTCP upon CE status change
  tcp: do not cancel delay-AcK on DCTCP special ACK
  tcp: helpers to send special DCTCP ack
  tcp: fix dctcp delayed ACK schedule
  vxlan: fix default fdb entry netlink notify ordering during netdev create
  vxlan: make netlink notify in vxlan_fdb_destroy optional
  vxlan: add new fdb alloc and create helpers
  rtnetlink: add rtnl_link_state check in rtnl_configure_link
  sock: fix sg page frag coalescing in sk_alloc_sg
  net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv
  multicast: do not restore deleted record source filter mode to new one
  net/ipv6: Fix linklocal to global address with VRF
  net/mlx5e: Fix quota counting in aRFS expire flow
  net/mlx5e: Don't allow aRFS for encapsulated packets
  net/mlx5: Adjust clock overflow work period
  net: skb_segment() should not return NULL
  net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
  ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
  ip: hash fragments consistently
  bonding: set default miimon value for non-arp modes if not set
  drm/nouveau: Set DRIVER_ATOMIC cap earlier to fix debugfs
  drm/nouveau/drm/nouveau: Fix runtime PM leak in nv50_disp_atomic_commit()
  KVM: PPC: Check if IOMMU page is contained in the pinned physical page
  xen/PVH: Set up GS segment for stack canary
  MIPS: Fix off-by-one in pci_resource_to_user()
  MIPS: ath79: fix register address in ath79_ddr_wb_flush()
  Revert "cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE setting"
  ANDROID: verity: really fix android-verity Kconfig
  tcp: add tcp_ooo_try_coalesce() helper
  tcp: call tcp_drop() from tcp_data_queue_ofo()
  tcp: detect malicious patterns in tcp_collapse_ofo_queue()
  tcp: avoid collapses in tcp_prune_queue() if possible
  tcp: free batches of packets in tcp_prune_ofo_queue()
  x86_64_cuttlefish_defconfig: Enable android-verity
  x86_64_cuttlefish_defconfig: enable verity cert
  ANDROID: android-verity: Fix broken parameter handling.
  ANDROID: android-verity: Make it work with newer kernels
  ANDROID: android-verity: Add API to verify signature with builtin keys.
  ANDROID: verity: fix android-verity Kconfig dependencies
  Linux 4.14.58
  xhci: Fix perceived dead host due to runtime suspend race with event handler
  powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle)
  cxl_getfile(): fix double-iput() on alloc_file() failures
  alpha: fix osf_wait4() breakage
  net: usb: asix: replace mii_nway_restart in resume path
  ipv6: make DAD fail with enhanced DAD when nonce length differs
  net: systemport: Fix CRC forwarding check for SYSTEMPORT Lite
  net/mlx4_en: Don't reuse RX page when XDP is set
  hv_netvsc: Fix napi reschedule while receive completion is busy
  tg3: Add higher cpu clock for 5762.
  qmi_wwan: add support for Quectel EG91
  ptp: fix missing break in switch
  net: phy: fix flag masking in __set_phy_supported
  net/ipv4: Set oif in fib_compute_spec_dst
  skbuff: Unconditionally copy pfmemalloc in __skb_clone()
  net: Don't copy pfmemalloc flag in __copy_skb_header()
  net: diag: Don't double-free TCP_NEW_SYN_RECV sockets in tcp_abort
  lib/rhashtable: consider param->min_size when setting initial table size
  ipv6: ila: select CONFIG_DST_CACHE
  ipv6: fix useless rol32 call on hash
  ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns
  gen_stats: Fix netlink stats dumping in the presence of padding
  drm/nouveau: Avoid looping through fake MST connectors
  drm/nouveau: Use drm_connector_list_iter_* for iterating connectors
  drm/i915: Fix hotplug irq ack on i965/g4x
  stop_machine: Disable preemption when waking two stopper threads
  vfio/spapr: Use IOMMU pageshift rather than pagesize
  vfio/pci: Fix potential Spectre v1
  cpufreq: intel_pstate: Register when ACPI PCCH is present
  mm/huge_memory.c: fix data loss when splitting a file pmd
  mm: memcg: fix use after free in mem_cgroup_iter()
  ARC: mm: allow mprotect to make stack mappings executable
  ARC: configs: Remove CONFIG_INITRAMFS_SOURCE from defconfigs
  ARC: Fix CONFIG_SWAP
  ARCv2: [plat-hsdk]: Save accl reg pair by default
  ALSA: hda: add mute led support for HP ProBook 455 G5
  ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk
  ALSA: rawmidi: Change resized buffers atomically
  fat: fix memory allocation failure handling of match_strdup()
  x86/MCE: Remove min interval polling limitation
  x86/events/intel/ds: Fix bts_interrupt_threshold alignment
  x86/apm: Don't access __preempt_count with zeroed fs
  KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
  scsi: sd_zbc: Fix variable type and bogus comment
  ANDROID: uid_sys_stats: Replace tasklist lock with RCU in uid_cputime_show
  Linux 4.14.57
  string: drop __must_check from strscpy() and restore strscpy() usages in cgroup
  arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
  arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests
  arm64: KVM: Add ARCH_WORKAROUND_2 support for guests
  arm64: KVM: Add HYP per-cpu accessors
  arm64: ssbd: Add prctl interface for per-thread mitigation
  arm64: ssbd: Introduce thread flag to control userspace mitigation
  arm64: ssbd: Restore mitigation status on CPU resume
  arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation
  arm64: ssbd: Add global mitigation state accessor
  arm64: Add 'ssbd' command-line option
  arm64: Add ARCH_WORKAROUND_2 probing
  arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2
  arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
  arm/arm64: smccc: Add SMCCC-specific return codes
  KVM: arm64: Avoid storing the vcpu pointer on the stack
  KVM: arm/arm64: Do not use kern_hyp_va() with kvm_vgic_global_state
  arm64: alternatives: Add dynamic patching feature
  KVM: arm64: Stop save/restoring host tpidr_el1 on VHE
  arm64: alternatives: use tpidr_el2 on VHE hosts
  KVM: arm64: Change hyp_panic()s dependency on tpidr_el2
  KVM: arm/arm64: Convert kvm_host_cpu_state to a static per-cpu allocation
  KVM: arm64: Store vcpu on the stack during __guest_enter()
  net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
  rds: avoid unenecessary cong_update in loop transport
  bdi: Fix another oops in wb_workfn()
  netfilter: ipv6: nf_defrag: drop skb dst before queueing
  nsh: set mac len based on inner packet
  autofs: fix slab out of bounds read in getname_kernel()
  tls: Stricter error checking in zerocopy sendmsg path
  KEYS: DNS: fix parsing multiple options
  reiserfs: fix buffer overflow with long warning messages
  netfilter: ebtables: reject non-bridge targets
  PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
  block: do not use interruptible wait anywhere
  mtd: rawnand: denali_dt: set clk_x_rate to 200 MHz unconditionally
  crypto: af_alg - Initialize sg_num_bytes in error code path
  clocksource: Initialize cs->wd_list
  media: rc: oops in ir_timer_keyup after device unplug
  xhci: Fix USB3 NULL pointer dereference at logical disconnect.
  net: lan78xx: Fix race in tx pending skb size calculation
  rtlwifi: rtl8821ae: fix firmware is not ready to run
  rtlwifi: Fix kernel Oops "Fw download fail!!"
  net: cxgb3_main: fix potential Spectre v1
  VSOCK: fix loopback on big-endian systems
  vhost_net: validate sock before trying to put its fd
  tcp: prevent bogus FRTO undos with non-SACK flows
  tcp: fix Fast Open key endianness
  strparser: Remove early eaten to fix full tcp receive buffer stall
  stmmac: fix DMA channel hang in half-duplex mode
  r8152: napi hangup fix after disconnect
  qmi_wwan: add support for the Dell Wireless 5821e module
  qed: Limit msix vectors in kdump kernel to the minimum required count.
  qed: Fix use of incorrect size in memcpy call.
  qed: Fix setting of incorrect eswitch mode.
  qede: Adverstise software timestamp caps when PHC is not available.
  net/tcp: Fix socket lookups with SO_BINDTODEVICE
  net: sungem: fix rx checksum support
  net_sched: blackhole: tell upper qdisc about dropped packets
  net/packet: fix use-after-free
  net: mvneta: fix the Rx desc DMA address in the Rx path
  net/mlx5: Fix wrong size allocation for QoS ETC TC regitster
  net/mlx5: Fix required capability for manipulating MPFS
  net/mlx5: Fix incorrect raw command length parsing
  net/mlx5: Fix command interface race in polling mode
  net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager
  net/mlx5e: Don't attempt to dereference the ppriv struct if not being eswitch manager
  net/mlx5e: Avoid dealing with vport representors if not being e-switch manager
  net: macb: Fix ptp time adjustment for large negative delta
  net: fix use-after-free in GRO with ESP
  net: dccp: switch rx_tstamp_last_feedback to monotonic clock
  net: dccp: avoid crash in ccid3_hc_rx_send_feedback()
  ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing
  ipvlan: fix IFLA_MTU ignored on NEWLINK
  ipv6: sr: fix passing wrong flags to crypto_alloc_shash()
  hv_netvsc: split sub-channel setup into async and sync
  atm: zatm: Fix potential Spectre v1
  atm: Preserve value of skb->truesize when accounting to vcc
  alx: take rtnl before calling __alx_open from resume
  crypto: crypto4xx - fix crypto4xx_build_pdr, crypto4xx_build_sdr leak
  crypto: crypto4xx - remove bad list_del
  PCI: exynos: Fix a potential init_clk_resources NULL pointer dereference
  bcm63xx_enet: do not write to random DMA channel on BCM6345
  bcm63xx_enet: correct clock usage
  ocfs2: ip_alloc_sem should be taken in ocfs2_get_block()
  ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent
  xprtrdma: Fix corner cases when handling device removal
  cpufreq / CPPC: Set platform specific transition_delay_us
  Btrfs: fix duplicate extents after fsync of file with prealloc extents
  x86/paravirt: Make native_save_fl() extern inline
  x86/asm: Add _ASM_ARG* constants for argument registers to <asm/asm.h>
  compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations
  ANDROID: Add hold functionality to schedtune CPU boost
  ANDROID: sched/rt: Add schedtune accounting to rt task enqueue/dequeue
  UPSTREAM: cpuidle: menu: Avoid selecting shallow states with stopped tick
  UPSTREAM: cpuidle: menu: Refine idle state selection for running tick
  UPSTREAM: sched: idle: Select idle state before stopping the tick
  BACKPORT: time: hrtimer: Introduce hrtimer_next_event_without()
  BACKPORT: time: tick-sched: Split tick_nohz_stop_sched_tick()
  UPSTREAM: cpuidle: Return nohz hint from cpuidle_select()
  UPSTREAM: jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
  UPSTREAM: sched: idle: Do not stop the tick before cpuidle_idle_call()
  BACKPORT: sched: idle: Do not stop the tick upfront in the idle loop
  BACKPORT: time: tick-sched: Reorganize idle tick management code
  ANDROID: sched/fair: fix a warning
  ANDROID: sched/walt: Fix compilation issue for x86_64
  ANDROID: mnt: Fix next_descendent
  ANDROID: sched/events: Introduce util_est trace events
  ANDROID: sched/fair: schedtune: update before schedutil
  FROMLIST: sched/fair: add support to tune PELT ramp/decay timings
  BACKPORT: sched/fair: Update util_est before updating schedutil
  BACKPORT: sched/fair: Update util_est only on util_avg updates
  BACKPORT: sched/fair: Use util_est in LB and WU paths
  BACKPORT: sched/fair: Add util_est on top of PELT
  ANDROID: sched/fair: Cleanup cpu_util{_wake}()
  ANDROID: sched: Update max cpu capacity in case of max frequency constraints
  ANDROID: arm: enable max frequency capping
  ANDROID: arm64: enable max frequency capping
  ANDROID: implement max frequency capping
  ANDROID: sched/fair: add arch scaling function for max frequency capping
  ANDROID: trace: Add WALT util signal to trace event sched_load_cfs_rq
  ANDROID: sched, trace: Remove trace event sched_load_avg_cpu
  ANDROID: Rename and move include/linux/sched_energy.h
  ANDROID: Adjust juno energy model
  ANDROID: Check equality of max cap state cap and cpu scale
  ANDROID: Move energy model init call into arch_topology driver
  ANDROID: Streamline sched_domain_energy_f functions
  ANDROID: Separate cpu_scale and energy model setup
  ANDROID: update_group_capacity for single cpu in cluster
  ANDROID: sched/fair: return idle CPU immediately for prefer_idle
  ANDROID: sched/fair: add idle state filter to prefer_idle case
  ANDROID: sched/fair: remove order from CPU selection
  ANDROID: sched/fair: unify spare capacity calculation
  ANDROID:sched/fair: prefer energy efficient CPUs for !prefer_idle tasks
  ANDROID: sched/fair: fix CPU selection for non latency sensitive tasks
  ANDROID: sched/fair: Also do misfit in overloaded groups
  ANDROID: sched/fair: Don't balance misfits if it would overload local group
  ANDROID: sched/fair: Attempt to improve throughput for asym cap systems
  FROMLIST: sched/fair: Don't move tasks to lower capacity cpus unless necessary
  FROMLIST: sched/core: Disable SD_PREFER_SIBLING on asymmetric cpu capacity domains
  FROMLIST: sched/core: Disable SD_ASYM_CPUCAPACITY for root_domains without asymmetry
  FROMLIST: sched/fair: Set rq->rd->overload when misfit
  FROMLIST: sched: Wrap rq->rd->overload accesses with READ/WRITE_ONCE
  FROMLIST: sched: Change root_domain->overload type to int
  FROMLIST: sched/fair: Change prefer_sibling type to bool
  FROMLIST: sched/fair: Consider misfit tasks when load-balancing
  FROMLIST: sched: Add sched_group per-cpu max capacity
  FROMLIST: sched/fair: Add group_misfit_task load-balance type
  FROMLIST: sched: Add static_key for asymmetric cpu capacity optimizations
  UPSTREAM: ANDROID: binder: change down_write to down_read
  UPSTREAM: ANDROID: binder: correct the cmd print for BINDER_WORK_RETURN_ERROR
  UPSTREAM: ANDROID: binder: remove 32-bit binder interface.
  UPSTREAM: android: binder: Use true and false for boolean values
  UPSTREAM: android: binder: Use octal permissions
  UPSTREAM: android: binder: Prefer __func__ to using hardcoded function name
  UPSTREAM: ANDROID: binder: make binder_alloc_new_buf_locked static and indent its arguments
  UPSTREAM: android: binder: Check for errors in binder_alloc_shrinker_init().

Conflicts:
	arch/arm64/Kconfig
	arch/arm64/include/asm/cpucaps.h
	arch/arm64/include/asm/cpufeature.h
	arch/arm64/include/asm/thread_info.h
	arch/arm64/kernel/cpu_errata.c
	arch/arm64/kernel/cpufeature.c
	arch/arm64/kernel/entry.S
	arch/arm64/kernel/ssbd.c
	drivers/base/arch_topology.c
	drivers/md/Kconfig
	drivers/scsi/ufs/ufshcd.c
	drivers/usb/gadget/function/f_fs.c
	include/trace/events/sched.h
	kernel/sched/cpufreq_schedutil.c
	kernel/sched/energy.c
	kernel/sched/fair.c
	kernel/sched/features.h
	kernel/sched/sched.h
	kernel/sched/topology.c
	kernel/sched/tune.c
	kernel/sched/walt.c
	kernel/sched/walt.h
	kernel/stop_machine.c
	kernel/time/tick-sched.c
	net/socket.c
	sound/core/rawmidi.c

Change-Id: Ia246711317930ecd55bb42565a04e6b4fdfc26d2
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2018-08-09 11:57:44 -07:00
Prasad Sodagudi
739725c32a stop_machine: Atomically queue and wake stopper threads
When cpu_stop_queue_work() releases the lock for the stopper
thread that was queued into its wake queue, preemption is
enabled, which leads to the following deadlock:

CPU0                              CPU1
sched_setaffinity(0, ...)
__set_cpus_allowed_ptr()
stop_one_cpu(0, ...)              stop_two_cpus(0, 1, ...)
cpu_stop_queue_work(0, ...)       cpu_stop_queue_two_works(0, ..., 1, ...)

-grabs lock for migration/0-
                                  -spins with preemption disabled,
                                   waiting for migration/0's lock to be
                                   released-

-adds work items for migration/0
and queues migration/0 to its
wake_q-

-releases lock for migration/0
 and preemption is enabled-

-current thread is preempted,
and __set_cpus_allowed_ptr
has changed the thread's
cpu allowed mask to CPU1 only-

                                  -acquires migration/0 and migration/1's
                                   locks-

                                  -adds work for migration/0 but does not
                                   add migration/0 to wake_q, since it is
                                   already in a wake_q-

                                  -adds work for migration/1 and adds
                                   migration/1 to its wake_q-

                                  -releases migration/0 and migration/1's
                                   locks, wakes migration/1, and enables
                                   preemption-

                                  -since migration/1 is requested to run,
                                   migration/1 begins to run and waits on
                                   migration/0, but migration/0 will never
                                   be able to run, since the thread that
                                   can wake it is affine to CPU1-

Disable preemption in cpu_stop_queue_work() before
queueing works for stopper threads, and queueing the stopper
thread in the wake queue, to ensure that the operation
of queueing the works and waking the stopper threads is atomic.
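
The ordering described above can be illustrated with a small userspace model. This is only a sketch, not the kernel patch: preempt_disable(), preempt_enable(), struct stopper_model, wake_stopper() and queue_work_and_wake() are hypothetical stand-ins defined here, and the "lock" is a plain flag. It shows the one property the fix relies on: the wake-up happens while preemption is still disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
static int preempt_depth;           /* models the preemption counter */
static bool woke_with_preempt_off;  /* preemption state seen at wake-up */

static void preempt_disable(void) { preempt_depth++; }
static void preempt_enable(void)  { preempt_depth--; }

struct stopper_model {
    bool locked;       /* models the stopper lock */
    int  queued_works; /* number of work items queued */
    bool woken;        /* set once the stopper thread is woken */
};

static void wake_stopper(struct stopper_model *s)
{
    /* Record whether the caller could still have been preempted. */
    woke_with_preempt_off = preempt_depth > 0;
    s->woken = true;
}

/* Queue a work item and wake the stopper as one non-preemptible step. */
static void queue_work_and_wake(struct stopper_model *s)
{
    preempt_disable();      /* taken before the lock, per the fix */
    s->locked = true;
    s->queued_works++;
    s->locked = false;      /* unlocking no longer re-enables preemption */
    wake_stopper(s);        /* wake-up happens outside the lock */
    preempt_enable();
}
```

In this model, dropping the lock leaves preempt_depth at 1, so the caller cannot be scheduled out between queueing the work and waking the stopper.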

Change-Id: Iac8ae8d823db2c62191cf93629876f505cb09e77
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2018-08-03 14:00:24 -07:00
Thomas Gleixner
da2b62c740 stop_machine: Use raw spinlocks
[ Upstream commit de5b55c1d4e30740009864eb35ce4ed856aac01d ]

Use raw-locks in stop_machine() to allow locking in irq-off and
preempt-disabled regions on -RT. This also documents the possible locking
context in general.

[bigeasy: update patch description.]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20180423191635.6014-1-bigeasy@linutronix.de
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-03 07:50:38 +02:00
Isaac J. Manjarres
d21fb63010 stop_machine: Disable preemption when waking two stopper threads
commit 9fb8d5dc4b649dd190e1af4ead670753e71bf907 upstream.

When cpu_stop_queue_two_works() begins to wake the stopper threads, it does
so without preemption disabled, which leads to the following race
condition:

The source CPU calls cpu_stop_queue_two_works(), with cpu1 as the source
CPU, and cpu2 as the destination CPU. When adding the stopper threads to
the wake queue used in this function, the source CPU stopper thread is
added first, and the destination CPU stopper thread is added last.

When wake_up_q() is invoked to wake the stopper threads, the threads are
woken up in the order that they are queued in, so the source CPU's stopper
thread is woken up first, and it preempts the thread running on the source
CPU.

The stopper thread will then execute on the source CPU, disable preemption,
and begin executing multi_cpu_stop(), and wait for an ack from the
destination CPU's stopper thread, with preemption still disabled. Since the
worker thread that woke up the stopper thread on the source CPU is affine
to the source CPU, and preemption is disabled on the source CPU, that
thread will never run to dequeue the destination CPU's stopper thread from
the wake queue, and thus, the destination CPU's stopper thread will never
run, causing the source CPU's stopper thread to wait forever, and stall.

Disable preemption when waking the stopper threads in
cpu_stop_queue_two_works().

Fixes: 0b26351b910f ("stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock")
Co-Developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Co-Developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: matt@codeblueprint.co.uk
Cc: bigeasy@linutronix.de
Cc: gregkh@linuxfoundation.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1530655334-4601-1-git-send-email-isaacm@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-25 11:25:08 +02:00
Isaac J. Manjarres
c286a9e83b stop_machine: Disable preemption after queueing stopper threads
After cpu_stop_queue_two_works() queues the cpu_stop works
for the stopper threads, it releases the locks held for
both threads, which enables preemption, which allows the
following race condition to occur:

On one CPU, call it CPU 3, thread 1 invokes
cpu_stop_queue_two_works(2, 3,...), and the execution is such
that thread 1 queues the works for migration/2 and migration/3,
and is preempted after releasing the locks for migration/2 and
migration/3, but before waking the threads.

Then, on CPU 2, a kworker, call it thread 2, is running,
and it invokes cpu_stop_queue_two_works(1, 2,...), such that
thread 2 queues the works for migration/1 and migration/2.
Meanwhile, on CPU 3, thread 1 resumes execution, and wakes
migration/2 and migration/3. This means that when CPU 2
releases the locks for migration/1 and migration/2, but before
it wakes those threads, it can be preempted by migration/2.

If thread 2 is preempted by migration/2, then migration/2 will
execute the first work item successfully, since migration/3
was woken up by CPU 3, but when it goes to execute the second
work item, it disables preemption, calls multi_cpu_stop(),
and thus, CPU 2 will wait forever for migration/1, which should
have been woken up by thread 2. However migration/1 cannot be
woken up by thread 2, since it is a kworker, so it is affine to
CPU 2, but CPU 2 is running migration/2 with preemption
disabled, so thread 2 will never run.

Disable preemption after queueing works for stopper threads
to ensure that the operation of queueing the works and waking
the stopper threads is atomic.

Change-Id: I6e9e6997f6e27b5a0fcf85eb472f738467bc2595
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2018-07-17 08:17:35 -07:00
Isaac J. Manjarres
bbea3fef30 Merge android-4.14.51 (a51b40c) into msm-4.14
* remotes/origin/tmp-a51b40c:
  Linux 4.14.51
  tcp: do not overshoot window_clamp in tcp_rcv_space_adjust()
  Btrfs: make raid6 rebuild retry more
  Btrfs: fix scrub to repair raid6 corruption
  Revert "Btrfs: fix scrub to repair raid6 corruption"
  ARM: kexec: fix kdump register saving on panic()
  ARM: 8758/1: decompressor: restore r1 and r2 just before jumping to the kernel
  ARM: 8753/1: decompressor: add a missing parameter to the addruart macro
  efi/libstub/arm64: Handle randomized TEXT_OFFSET
  parisc: Move setup_profiling_timer() out of init section
  sched/deadline: Make the grub_reclaim() function static
  sched/debug: Move the print_rt_rq() and print_dl_rq() declarations to kernel/sched/sched.h
  drm/dumb-buffers: Integer overflow in drm_mode_create_ioctl()
  locking/percpu-rwsem: Annotate rwsem ownership transfer by setting RWSEM_OWNER_UNKNOWN
  locking/rwsem: Add a new RWSEM_ANONYMOUSLY_OWNED flag
  clk: imx6ull: use OSC clock during AXI rate change
  ARM: davinci: board-dm646x-evm: set VPIF capture card name
  ARM: davinci: board-dm646x-evm: pass correct I2C adapter id for VPIF
  ARM: davinci: dm646x: fix timer interrupt generation
  i2c: viperboard: return message count on master_xfer success
  i2c: pmcmsp: fix error return from master_xfer
  i2c: pmcmsp: return message count on master_xfer success
  ARM: keystone: fix platform_domain_notifier array overrun
  usb: musb: fix remote wakeup racing with suspend
  afs: Fix the non-encryption of calls
  mtd: Fix comparison in map_word_andequal()
  x86/pkeys/selftests: Add a test for pkey 0
  x86/pkeys/selftests: Save off 'prot' for allocations
  x86/pkeys/selftests: Fix pointer math
  x86/pkeys/selftests: Fix pkey exhaustion test off-by-one
  x86/pkeys/selftests: Add PROT_EXEC test
  x86/pkeys/selftests: Factor out "instruction page"
  x86/pkeys/selftests: Allow faults on unknown keys
  x86/pkeys/selftests: Remove dead debugging code, fix dprint_in_signal
  x86/pkeys/selftests: Stop using assert()
  x86/pkeys/selftests: Give better unexpected fault error messages
  x86/selftests: Add mov_to_ss test
  x86/mpx/selftests: Adjust the self-test to fresh distros that export the MPX ABI
  x86/pkeys/selftests: Adjust the self-test to fresh distros that export the pkeys ABI
  objtool, kprobes/x86: Sync the latest <asm/insn.h> header with tools/objtool/arch/x86/include/asm/insn.h
  uprobes/x86: Prohibit probing on MOV SS instruction
  kprobes/x86: Prohibit probing on exception masking instructions
  ocfs2: take inode cluster lock before moving reflinked inode from orphan dir
  proc/kcore: don't bounds check against address 0
  init: fix false positives in W+X checking
  net sched actions: fix invalid pointer dereferencing if skbedit flags missing
  ixgbe: return error on unsupported SFP module when resetting
  x86: Delay skip of emulated hypercall instruction
  KVM: Extend MAX_IRQ_ROUTES to 4096 for all archs
  rxrpc: Fix the min security level for kernel calls
  rxrpc: Fix error reception on AF_INET6 sockets
  qede: Fix gfp flags sent to rdma event node allocation
  qed: Fix l2 initializations over iWARP personality
  tipc: eliminate KMSAN uninit-value in strcmp complaint
  agp: uninorth: make two functions static
  cifs: smb2ops: Fix listxattr() when there are no EAs
  arm64: Add MIDR encoding for NVIDIA CPUs
  can: dev: increase bus-off message severity
  net: aquantia: driver should correctly declare vlan_features bits
  x86/xen: Reset VCPU0 info pointer after shared_info remap
  mac80211: use timeout from the AddBA response instead of the request
  ARM: dts: cygnus: fix irq type for arm global timer
  driver core: add __printf verification to __ata_ehi_pushv_desc
  drm/omap: handle alloc failures in omap_connector
  drm/omap: check return value from soc_device_match
  drm/omap: fix possible NULL ref issue in tiler_reserve_2d
  drm/omap: fix uninitialized ret variable
  drm/omap: silence unititialized variable warning
  mac80211: Adjust SAE authentication timeout
  tee: check shm references are consistent in offset/size
  sh: fix build failure for J2 cpu with SMP disabled
  sched/core: Introduce set_special_state()
  spi: bcm2835aux: ensure interrupts are enabled for shared handler
  RDMA/cma: Do not query GID during QP state transition to RTR
  IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
  IB/hfi1 Use correct type for num_user_context
  smc: fix sendpage() call
  ARM: OMAP1: ams-delta: fix deferred_fiq handler
  nvme: Set integrity flag for user passthrough commands
  nvme: fix potential memory leak in option parsing
  iommu/vt-d: fix shift-out-of-bounds in bug checking
  arm64: tegra: Make BCM89610 PHY interrupt as active low
  kthread, sched/wait: Fix kthread_parkme() wait-loop
  stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
  parisc: drivers.c: Fix section mismatches
  bpf, x64: fix memleak when not converging after image
  scsi: vmw-pvscsi: return DID_BUS_BUSY for adapter-initated aborts
  hexagon: export csum_partial_copy_nocheck
  hexagon: add memset_io() helper
  Input: atmel_mxt_ts - fix the firmware update
  ARM: dts: logicpd-som-lv: Fix Audio Mute
  ARM: dts: logicpd-som-lv: Fix WL127x Startup Issues
  ARM: OMAP2+: powerdomain: use raw_smp_processor_id() for trace
  dt-bindings: panel: lvds: Fix path to display timing bindings
  ARM: davinci: board-dm355-evm: fix broken networking
  ARM: davinci: board-omapl138-hawk: fix GPIO numbers for MMC/SD lookup
  ARM: davinci: board-da850-evm: fix GPIO lookup for MMC/SD
  ARM: davinci: board-da830-evm: fix GPIO lookup for MMC/SD
  IB/core: Make ib_mad_client_id atomic
  <linux/stringhash.h>: fix end_name_hash() for 64bit long
  IB/rxe: avoid double kfree_skb
  IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV
  RDMA/iwpm: fix memory leak on map_info
  RDMA/cma: Fix use after destroy access to net namespace for IPoIB
  IB/uverbs: Fix validating mandatory attributes
  IB: make INFINIBAND_ADDR_TRANS configurable
  ib_srp: depend on INFINIBAND_ADDR_TRANS
  ib_srpt: depend on INFINIBAND_ADDR_TRANS
  nvmet-rdma: depend on INFINIBAND_ADDR_TRANS
  nvme: depend on INFINIBAND_ADDR_TRANS
  tipc: fix bug in function tipc_nl_node_dump_monitor
  i2c: sprd: Fix the i2c count issue
  i2c: sprd: Prevent i2c accesses after suspend is called
  bpf: fix uninitialized variable in bpf tools
  x86/cpu/intel: Add missing TLB cpuid values
  ata: ahci: mvebu: override ahci_stop_engine for mvebu AHCI
  libahci: Allow drivers to override stop_engine
  KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_mmio_read_apr()
  arm64: fix possible spectre-v1 in ptrace_hbp_get_event()
  blk-mq: fix sysfs inflight counter
  HID: intel-ish-hid: use put_device() instead of kfree()
  rpmsg: added MODULE_ALIAS for rpmsg_char
  remoteproc: qcom: Fix potential device node leaks
  perf/x86/intel: Don't enable freeze-on-smi for PerfMon V1
  rds: ib: Fix missing call to rds_ib_dev_put in rds_ib_setup_qp
  selftests: ftrace: Add a testcase for multiple actions on trigger
  HID: wacom: Release device resource data obtained by devres_alloc()
  HID: lenovo: Add support for IBM/Lenovo Scrollpoint mice
  arm64: ptrace: remove addr_limit manipulation
  net: ethtool: Add missing kernel doc for FEC parameters
  thermal: int3403_thermal: Fix NULL pointer deref on module load / probe
  drm/amdkfd: fix clock counter retrieval for node without GPU
  ACPI / watchdog: Prefer iTCO_wdt on Lenovo Z50-70
  ARM: dts: da850: fix W=1 warnings with pinmux node
  net: phy: marvell: clear wol event before setting it
  powerpc/powernv/memtrace: Let the arch hotunplug code flush cache
  dt-bindings: meson-uart: DT fix s/clocks-names/clock-names/
  ACPI / PM: Blacklist Low Power S0 Idle _DSM for ThinkPad X1 Tablet(2016)
  usb: typec: ucsi: fix tracepoint related build error
  mm: memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create()
  kexec_file: do not add extra alignment to efi memmap
  proc: revalidate kernel thread inodes to root:root
  mm, pagemap: fix swap offset value for PMD migration entry
  scsi: isci: Fix infinite loop in while loop
  scsi: storvsc: Set up correct queue depth values for IDE devices
  parisc: time: Convert read_persistent_clock() to read_persistent_clock64()
  vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversion
  net: hns: Avoid action name truncation
  blkcg: init root blkcg_gq under lock
  drm/msm: don't deref error pointer in the msm_fbdev_create error path
  drm/msm/dsi: use correct enum in dsi_get_cmd_fmt
  drm/msm: Fix possible null dereference on failure of get_pages()
  ASoC: msm8916-wcd-analog: use threaded context for mbhc events
  netfilter: nf_tables: fix out-of-bounds in nft_chain_commit_update
  netfilter: nf_tables: NAT chain and extensions require NF_TABLES
  scsi: target: fix crash with iscsi target and dvd
  scsi: megaraid_sas: Do not log an error if FW successfully initializes.
  scsi: iscsi: respond to netlink with unicast when appropriate
  tipc: fix infinite loop when dumping link monitor summary
  blkcg: don't hold blkcg lock when deactivating policy
  spi: cadence: Add usleep_range() for cdns_spi_fill_tx_fifo()
  ASoC: topology: Check widget kcontrols before deref.
  xen: xenbus_dev_frontend: Really return response string
  ASoC: topology: Fix bugs of freeing soc topology
  PCI: kirin: Fix reset gpio name
  soc: bcm2835: Make !RASPBERRYPI_FIRMWARE dummies return failure
  soc: bcm: raspberrypi-power: Fix use of __packed
  eCryptfs: don't pass up plaintext names when using filename encryption
  ASoC: rt5514: Add the missing register in the readable table
  clk: honor CLK_MUX_ROUND_CLOSEST in generic clk mux
  dt-bindings: dmaengine: rcar-dmac: document R8A77965 support
  dt-bindings: serial: sh-sci: Add support for r8a77965 (H)SCIF
  dt-bindings: pinctrl: sunxi: Fix reference to driver
  doc: Add vendor prefix for Kieback & Peter GmbH
  spi: sh-msiof: Fix bit field overflow writes to TSCR/RSCR
  MIPS: dts: Boston: Fix PCI bus dtc warnings:
  isofs: fix potential memory leak in mount option parsing
  s390/smsgiucv: disable SMSG on module unload
  MIPS: io: Add barrier after register read in readX()
  fsnotify: fix ignore mask logic in send_to_group()
  perf report: Fix switching to another perf.data file
  nfp: ignore signals when communicating with management FW
  MIPS: io: Prevent compiler reordering writeX()
  x86: Add check for APIC access address for vmentry of L2 guests
  KVM: X86: fix incorrect reference of trace_kvm_pi_irte_update
  Input: synaptics-rmi4 - fix an unchecked out of memory error path
  clocksource/drivers/imx-tpm: Correct some registers operation flow

  stop_machine: Disable preemption when waking two stopper threads

  When cpu_stop_queue_two_works() begins to wake the stopper
  threads, it does so without preemption disabled, which leads
  to the following race condition:

  The source CPU calls cpu_stop_queue_two_works(), with cpu1
  as the source CPU, and cpu2 as the destination CPU. When
  adding the stopper threads to the wake queue used in this
  function, the source CPU stopper thread is added first,
  and the destination CPU stopper thread is added last.

  When wake_up_q() is invoked to wake the stopper threads, the
  threads are woken up in the order that they are queued in,
  so the source CPU's stopper thread is woken up first, and
  it preempts the thread running on the source CPU.

  The stopper thread will then execute on the source CPU,
  disable preemption, and begin executing multi_cpu_stop()
  and wait for an ack from the destination CPU's stopper thread,
  with preemption still disabled. Since the worker thread that
  woke up the stopper thread on the source CPU is affine to the
  source CPU, and preemption is disabled on the source CPU, that
  thread will never run to dequeue the destination CPU's stopper
  thread from the wake queue, and thus, the destination CPU's
  stopper thread will never run, causing the source CPU's stopper
  thread to wait forever, and stall.

  Disable preemption when waking the stopper threads in
  cpu_stop_queue_two_works() to ensure that the worker thread
  that is waking up the stopper threads isn't preempted
  by the source CPU's stopper thread, and permanently
  scheduled out, leaving the remaining stopper thread asleep
  in the wake queue.

Conflicts:
	drivers/gpu/drm/msm/msm_gem.c
	include/linux/sched.h
	kernel/kthread.c

Change-Id: I177cb8516cdfe50d61cb948ed342d330e61376a1
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2018-06-28 09:30:40 -07:00
Peter Zijlstra
e7a65e899d stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
[ Upstream commit 0b26351b910fb8fe6a056f8a1bbccabe50c0e19f ]

Matt reported the following deadlock:

CPU0					CPU1

schedule(.prev=migrate/0)		<fault>
  pick_next_task()			  ...
    idle_balance()			    migrate_swap()
      active_balance()			      stop_two_cpus()
						spin_lock(stopper0->lock)
						spin_lock(stopper1->lock)
						ttwu(migrate/0)
						  smp_cond_load_acquire() -- waits for schedule()
        stop_one_cpu(1)
	  spin_lock(stopper1->lock) -- waits for stopper lock

Fix this deadlock by taking the wakeups out from under stopper->lock.
This allows the active_balance() to queue the stop work and finish the
context switch, which in turn allows the wakeup from migrate_swap() to
observe the context and complete the wakeup.
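
The idea of taking the wakeups out from under the lock can be sketched in userspace as follows. This is an illustrative model under stated assumptions, not the upstream change: struct task_model, struct wake_q_model, struct lock_model and the *_model helpers are hypothetical names, and the lock is a plain flag. Tasks are only recorded while the lock is held, and the wake-ups are issued after it is released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WAKE_Q_MAX 4

/* Hypothetical stand-ins for a task, a wake queue and a stopper lock. */
struct task_model   { bool awake; bool woken_under_lock; };
struct wake_q_model { struct task_model *tasks[WAKE_Q_MAX]; size_t n; };
struct lock_model   { bool held; };

/* Remember a task to wake later; nothing is woken here. */
static void wake_q_add_model(struct wake_q_model *q, struct task_model *t)
{
    if (q->n < WAKE_Q_MAX)
        q->tasks[q->n++] = t;
}

/* Wake every recorded task, noting whether the lock was held at the time. */
static void wake_up_q_model(struct wake_q_model *q,
                            const struct lock_model *lock)
{
    for (size_t i = 0; i < q->n; i++) {
        q->tasks[i]->woken_under_lock = lock->held;
        q->tasks[i]->awake = true;
    }
    q->n = 0;
}

/* Queue under the lock, wake only after the lock has been dropped. */
static void queue_then_wake(struct lock_model *lock,
                            struct task_model *stopper)
{
    struct wake_q_model wakeq = { .n = 0 };

    lock->held = true;
    wake_q_add_model(&wakeq, stopper);  /* deferred, not woken yet */
    lock->held = false;

    wake_up_q_model(&wakeq, lock);      /* lock is free during the wake-up */
}
```

Because the wake-up runs with the lock released, another CPU that needs the same lock to queue its own work is not blocked behind a wake-up that is itself waiting on that CPU.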

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180420095005.GH4064@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-21 04:02:53 +09:00
Sebastian Andrzej Siewior
fe5595c074 stop_machine: Provide stop_machine_cpuslocked()
Some call sites of stop_machine() are within a get_online_cpus() protected
region.

stop_machine() calls get_online_cpus() as well, which is possible in the
current implementation but prevents converting the hotplug locking to a
percpu rwsem.

Provide stop_machine_cpuslocked() to avoid nested calls to get_online_cpus().
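
A minimal userspace sketch of the split, assuming the plain variant becomes a thin wrapper that takes the hotplug lock around the "cpuslocked" variant. All names ending in _model, plus bump() and caller_already_locked(), are hypothetical and exist only for this illustration; a depth counter stands in for the hotplug lock.

```c
#include <assert.h>

/* Hypothetical model of the hotplug lock: only nesting depth is tracked. */
static int hotplug_lock_depth;
static int max_hotplug_lock_depth;

static void get_online_cpus_model(void)
{
    hotplug_lock_depth++;
    if (hotplug_lock_depth > max_hotplug_lock_depth)
        max_hotplug_lock_depth = hotplug_lock_depth;
}

static void put_online_cpus_model(void) { hotplug_lock_depth--; }

typedef int (*stop_fn_t)(void *data);

/* Variant for callers that already hold the hotplug lock. */
static int stop_machine_cpuslocked_model(stop_fn_t fn, void *data)
{
    return fn(data);
}

/* Plain variant: takes the lock itself, then defers to the one above. */
static int stop_machine_model(stop_fn_t fn, void *data)
{
    int ret;

    get_online_cpus_model();
    ret = stop_machine_cpuslocked_model(fn, data);
    put_online_cpus_model();
    return ret;
}

/* Example callback: increments the integer it is handed. */
static int bump(void *data)
{
    ++*(int *)data;
    return 0;
}

/* A call site inside a locked region uses the cpuslocked variant. */
static int caller_already_locked(int *counter)
{
    int ret;

    get_online_cpus_model();
    ret = stop_machine_cpuslocked_model(bump, counter);
    put_online_cpus_model();
    return ret;
}
```

With this shape, neither entry point ever takes the lock while it is already held, which is the property a percpu rwsem conversion would need.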

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170524081547.400700852@linutronix.de
2017-05-26 10:10:36 +02:00
Christian Borntraeger
bf0d31c054 locking/core, stop_machine: Yield the CPU during stop machine()
Some time ago the following commit:

  57f2ffe14f ("s390: remove diag 44 calls from cpu_relax()")

... stopped cpu_relax() on s390 yielding to the hypervisor.

As it turns out this made stop_machine() run really slow on virtualized
overcommitted systems. For example, the kprobes test during bootup took
several seconds instead of just running unnoticed with large guests.

Therefore, yielding was reintroduced with commit:

  4d92f50249 ("s390: reintroduce diag 44 calls for cpu_relax()")

... but in fact the stop machine code seems to be the only place where
this yielding was really necessary. This place is probably the most
important one, as it makes all but one of the guest CPUs wait for one guest CPU.

As we now have cpu_relax_yield(), we can use this in multi_cpu_stop().
For now let's only add it here. We can add it later in other places
when necessary.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-3-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16 10:15:09 +01:00
Linus Torvalds
af79ad2b1f Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The main changes are:

   - irqtime accounting cleanups and enhancements. (Frederic Weisbecker)

   - schedstat debugging enhancements, make it more broadly runtime
     available. (Josh Poimboeuf)

   - More work on asymmetric topology/capacity scheduling. (Morten
     Rasmussen)

   - sched/wait fixes and cleanups. (Oleg Nesterov)

   - PELT (per entity load tracking) improvements. (Peter Zijlstra)

   - Rewrite and enhance select_idle_siblings(). (Peter Zijlstra)

   - sched/numa enhancements/fixes (Rik van Riel)

   - sched/cputime scalability improvements (Stanislaw Gruszka)

   - Load calculation arithmetics fixes. (Dietmar Eggemann)

   - sched/deadline enhancements (Tommaso Cucinotta)

   - Fix utilization accounting when switching to the SCHED_NORMAL
     policy. (Vincent Guittot)

   - ... plus misc cleanups and enhancements"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  sched/irqtime: Consolidate irqtime flushing code
  sched/irqtime: Consolidate accounting synchronization with u64_stats API
  u64_stats: Introduce IRQs disabled helpers
  sched/irqtime: Remove needless IRQs disablement on kcpustat update
  sched/irqtime: No need for preempt-safe accessors
  sched/fair: Fix min_vruntime tracking
  sched/debug: Add SCHED_WARN_ON()
  sched/core: Fix set_user_nice()
  sched/fair: Introduce set_curr_task() helper
  sched/core, ia64: Rename set_curr_task()
  sched/core: Fix incorrect utilization accounting when switching to fair class
  sched/core: Optimize SCHED_SMT
  sched/core: Rewrite and improve select_idle_siblings()
  sched/core: Replace sd_busy/nr_busy_cpus with sched_domain_shared
  sched/core: Introduce 'struct sched_domain_shared'
  sched/core: Restructure destroy_sched_domain()
  sched/core: Remove unused @cpu argument from destroy_sched_domain*()
  sched/wait: Introduce init_wait_entry()
  sched/wait: Avoid abort_exclusive_wait() in __wait_on_bit_lock()
  sched/wait: Avoid abort_exclusive_wait() in ___wait_event()
  ...
2016-10-03 13:39:00 -07:00
Oleg Nesterov
e625397041 stop_machine: Remove stop_cpus_lock and lg_double_lock/unlock()
stop_two_cpus() and stop_cpus() use stop_cpus_lock to avoid the deadlock;
we need to ensure that the stopper functions can't be queued "backwards"
from one another. This doesn't look nice; if we use lglock then we do not
really need stopper->lock, cpu_stop_queue_work() could use lg_local_lock()
under local_irq_save().

OTOH it would be even better to avoid lglock in stop_machine.c and remove
lg_double_lock(). This patch adds "bool stop_cpus_in_progress" set/cleared
by queue_stop_cpus_work(), and changes cpu_stop_queue_two_works() to busy
wait until it is cleared.

queue_stop_cpus_work() sets stop_cpus_in_progress = true locklessly, but after
it queues a work on CPU1 it must be visible to stop_two_cpus(CPU1, CPU2)
which checks it under the same lock. And since stop_two_cpus() holds the
2nd lock too, queue_stop_cpus_work() can not clear stop_cpus_in_progress
if it is also going to queue a work on CPU2, it needs to take that 2nd
lock to do this.
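
One attempt of the check described above can be modelled in userspace. This is a single-threaded sketch under stated assumptions, not the patch itself: stop_cpus_in_progress_model, struct stopper_model, try_queue_two_works_model() and the MODEL_* codes are hypothetical names, the locks are plain flags, and the busy-wait is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for one queueing attempt. */
enum { MODEL_OK = 0, MODEL_EDEADLK = -1 };

/* Models the flag that is set while stop_cpus() is queueing its works. */
static bool stop_cpus_in_progress_model;

struct stopper_model {
    bool locked;       /* models the per-CPU stopper lock */
    int  queued_works; /* number of work items queued */
};

/*
 * One attempt at queueing on two stoppers. The flag is checked with both
 * locks held; if it is set, nothing is queued and the caller is expected
 * to wait until the flag clears and then retry.
 */
static int try_queue_two_works_model(struct stopper_model *s1,
                                     struct stopper_model *s2)
{
    int err = MODEL_EDEADLK;

    s1->locked = true;
    s2->locked = true;

    if (!stop_cpus_in_progress_model) {
        s1->queued_works++;
        s2->queued_works++;
        err = MODEL_OK;
    }

    s2->locked = false;
    s1->locked = false;
    return err;
}
```

A caller in this model would loop: while the result is MODEL_EDEADLK, spin until stop_cpus_in_progress_model is false, then call try_queue_two_works_model() again.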

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151121181148.GA433@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 15:25:55 +02:00
Cheng Chao
bf89a30472 stop_machine: Avoid a sleep and wakeup in stop_one_cpu()
In case @cpu == smp_processor_id(), we can avoid a sleep+wakeup
cycle by doing a preemption.

Callers such as sched_exec() can benefit from this change.

Signed-off-by: Cheng Chao <cs.os.kernel@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: chris@chris-wilson.co.uk
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/1473818510-6779-1-git-send-email-cs.os.kernel@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-22 14:53:45 +02:00
Oleg Nesterov
ce4f06dcbb stop_machine: Touch_nmi_watchdog() after MULTI_STOP_PREPARE
Suppose that stop_machine(fn) hangs because fn() hangs. In this case NMI
hard-lockup can be triggered on another CPU which does nothing wrong and
the trace from nmi_panic() won't help to investigate the problem.

And this change "fixes" the problem we (seem to) hit in practice.

 - stop_two_cpus(0, 1) races with show_state_filter() running on CPU_0.

 - CPU_1 already spins in MULTI_STOP_PREPARE state, it detects the soft
   lockup and tries to report the problem.

 - show_state_filter() enables preemption, CPU_0 calls multi_cpu_stop()
   which goes to MULTI_STOP_DISABLE_IRQ state and disables interrupts.

 - CPU_1 spends more than 10 seconds trying to flush the log buffer to
   the slow serial console.

 - NMI interrupt on CPU_0 (which now waits for CPU_1) calls nmi_panic().

Reported-by: Wang Shu <shuwang@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20160726185736.GB4088@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-27 11:12:11 +02:00
Andrew Morton
b493c34309 kernel/stop_machine.c: remove CONFIG_SMP dependencies
stop_machine.o is only built if CONFIG_SMP=y, so this ifdef always
evaluates to true.

[akpm@linux-foundation.org: remove now-unneeded ifdef]
Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-16 11:17:24 -08:00
Ingo Molnar
567bee2803 Merge branch 'sched/urgent' into sched/core, to pick up fixes before merging new patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:02:29 +01:00
Chris Wilson
86fffe4a61 kernel: remove stop_machine() Kconfig dependency
Currently the full stop_machine() routine is only enabled on SMP if
module unloading is enabled, or if the CPUs are hotpluggable.  This
leads to configurations where stop_machine() is broken as it will then
only run the callback on the local CPU with irqs disabled, and not stop
the other CPUs or run the callback on them.

For example, this breaks MTRR setup on x86 in certain configs since
ea8596bb2d ("kprobes/x86: Remove unused text_poke_smp() and
text_poke_smp_batch() functions") as the MTRR is only established on the
boot CPU.

This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
and HOTPLUG_CPU config options to compile the correct stop_machine() for
the architecture, removing the false dependency on MODULE_UNLOAD in the
process.

Link: https://lkml.org/lkml/2014/10/8/124
References: https://bugs.freedesktop.org/show_bug.cgi?id=84794
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-12 10:15:34 -08:00
Oleg Nesterov
accaf6ea3d stop_machine: Clean up the usage of the preemption counter in cpu_stopper_thread()
1. Change this code to use preempt_count_inc/preempt_count_dec; this way
   it works even if CONFIG_PREEMPT_COUNT=n, and we avoid the unnecessary
   __preempt_schedule() check (stop_sched_class is not preemptible).

   And this makes clear that we only want to make preempt_count() != 0
   for __might_sleep() / schedule_debug().

2. Change WARN_ONCE() to use %pf to print the function name and remove
   kallsyms_lookup/ksym_buf.

3. Move "int ret" into the "if (work)" block, this looks more consistent.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Milos Vyletel <milos@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151115193332.GA8281@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:48:20 +01:00
Oleg Nesterov
dd2e3121e3 stop_machine: Shift the 'done != NULL' check from cpu_stop_signal_done() to callers
Change cpu_stop_queue_work() and cpu_stopper_thread() to check done != NULL
before cpu_stop_signal_done(done). This makes the code cleaner IMO; note
that cpu_stopper_thread() has to do this check anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Milos Vyletel <milos@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151115193329.GA8274@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:48:19 +01:00
Oleg Nesterov
6fa3b826bc stop_machine: Kill cpu_stop_done->executed
Now that cpu_stop_done->executed becomes write-only (ignoring WARN_ON()
checks) we can remove it.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Milos Vyletel <milos@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151115193326.GA8269@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:48:19 +01:00
Oleg Nesterov
4aff1ca697 stop_machine: Change __stop_cpus() to rely on cpu_stop_queue_work()
Change queue_stop_cpus_work() to return true if it queues at least one
work; a true return means that the caller should wait.

__stop_cpus() can check the value returned by queue_stop_cpus_work() and
avoid done.executed, just like stop_one_cpu() does.
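
A minimal user-space sketch of the return-value convention described
above. Every name here (toy_queue_one(), toy_queue_many(), toy_stop_many(),
the toy_enabled array) is a hypothetical stand-in rather than the kernel's
code, and returning -ENOENT when nothing was queued is an assumption
modeled on what stop_one_cpu() is said to do:

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Hypothetical per-CPU state: is the stopper accepting work? */
bool toy_enabled[TOY_NR_CPUS];
int toy_queued[TOY_NR_CPUS];

/* Queue on one CPU; true means the work will eventually run. */
bool toy_queue_one(int cpu)
{
	if (!toy_enabled[cpu])
		return false;
	toy_queued[cpu]++;
	return true;
}

/* Queue on every CPU in mask; true if at least one work was queued. */
bool toy_queue_many(unsigned int mask)
{
	bool queued = false;
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (!(mask & (1u << cpu)))
			continue;
		if (toy_queue_one(cpu))
			queued = true;
	}
	return queued;
}

/* The caller waits only when something was actually queued. */
int toy_stop_many(unsigned int mask)
{
	if (!toy_queue_many(mask))
		return -ENOENT;
	/* A real implementation would wait for completion here. */
	return 0;
}
```

With the boolean result there is no need for a separate "executed" flag:
the caller learns at queueing time whether waiting makes sense.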

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Milos Vyletel <milos@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151115193323.GA8262@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:48:19 +01:00
Oleg Nesterov
958c5f848e stop_machine: Change stop_one_cpu() to rely on cpu_stop_queue_work()
Change stop_one_cpu() to return -ENOENT if cpu_stop_queue_work() fails.
Otherwise we know that ->executed must be true after wait_for_completion()
so we can just return done.ret.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Milos Vyletel <milos@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151115193320.GA8259@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:48:18 +01:00
Oleg Nesterov
1b034bd989 stop_machine: Make cpu_stop_queue_work() and stop_one_cpu_nowait() return bool
Change cpu_stop_queue_work() to return true if the work was queued and
change stop_one_cpu_nowait() to return the result of cpu_stop_queue_work().
This makes it more useful: for example, you can now allocate a cpu_stop_work
for stop_one_cpu_nowait() and free it in the callback, or free it right away
if stop_one_cpu_nowait() fails. Currently this is impossible because you
can't know whether @fn will be called.

Also, this allows us to kill cpu_stop_done->executed, see the next changes.
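
The allocate-and-free pattern mentioned above could look roughly like the
following user-space sketch. The types and helpers (struct toy_work,
struct toy_stopper, toy_queue_nowait(), toy_submit(), toy_run_pending())
are hypothetical and only model the idea; they are not the kernel API:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the work item and the per-CPU stopper. */
struct toy_work {
	int (*fn)(void *arg);
	void *arg;
};

struct toy_stopper {
	bool enabled;
	struct toy_work *pending;	/* single-slot "queue" for illustration */
};

int toy_freed_by_callback;

/* Returns true only if the work was queued, i.e. fn will run later. */
bool toy_queue_nowait(struct toy_stopper *s, struct toy_work *w)
{
	if (!s->enabled)
		return false;
	s->pending = w;
	return true;
}

/* Callback that owns and releases its own work item. */
int toy_free_self(void *arg)
{
	free(arg);
	toy_freed_by_callback++;
	return 0;
}

/* Allocate a work item; free it here if it could not be queued. */
bool toy_submit(struct toy_stopper *s)
{
	struct toy_work *w = malloc(sizeof(*w));

	if (!w)
		return false;
	w->fn = toy_free_self;
	w->arg = w;
	if (!toy_queue_nowait(s, w)) {
		free(w);	/* fn will never run, so nobody else frees it */
		return false;
	}
	return true;
}

/* Run the pending work, as the stopper thread would. */
void toy_run_pending(struct toy_stopper *s)
{
	struct toy_work *w = s->pending;

	if (w) {
		s->pending = NULL;
		w->fn(w->arg);
	}
}
```

Exactly one of the two paths frees the item, which is only possible
because the queueing function reports whether the callback will run.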

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Milos Vyletel <milos@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151117170523.GA13955@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:48:18 +01:00
Oleg Nesterov
6a19005157 stop_machine: Don't disable preemption in stop_two_cpus()
Now that stop_two_cpus() path does not check cpu_active() we can remove
preempt_disable(), it was only needed to ensure that stop_machine() can
not be called after we observe cpu_active() == T and before we queue the
new work.

Also, turn the pointless and confusing ->executed check into WARN_ON().
We know that both works must be executed, otherwise we have a bug. And
in fact I think that done->executed should die, see the next changes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Milos Vyletel <milos@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151115193314.GA8249@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:48:18 +01:00
Oleg Nesterov
64038f292a stop_machine: Fix possible cpu_stopper_thread() crash
stop_one_cpu_nowait(fn) will crash the kernel if the callback returns
nonzero, work->done == NULL in this case.

This needs more cleanups, cpu_stop_signal_done() is called right after
we check done != NULL and it does the same check.
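
A simplified user-space sketch of the guarded pattern, assuming
hypothetical types (struct toy_done, struct toy_work) that only mimic the
shape of the real ones. A work item queued without a waiter has
done == NULL, so the return value must not be stored through it:

```c
#include <stddef.h>

/* Hypothetical completion record shared with a waiting caller. */
struct toy_done {
	int ret;
	int signaled;
};

/* Hypothetical work item; done is NULL for fire-and-forget requests. */
struct toy_work {
	int (*fn)(void *arg);
	void *arg;
	struct toy_done *done;
};

/* Run one work item the way a stopper thread might. */
void toy_run_work(struct toy_work *work)
{
	struct toy_done *done = work->done;
	int ret = work->fn(work->arg);

	if (done) {
		/* Only touch the completion record when a waiter exists. */
		if (ret)
			done->ret = ret;
		done->signaled = 1;
	}
}

/* Example callback that reports a nonzero result. */
int toy_fail(void *arg)
{
	(void)arg;
	return 5;
}
```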

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Milos Vyletel <milos@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151115193311.GA8242@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:48:17 +01:00
Peter Zijlstra
62694cd513 sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop()
The cpu_active() tests are not fundamentally part of stop_two_cpus(),
move them into the scheduler where they belong.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:25:56 +02:00
Oleg Nesterov
f0cf16cbd0 stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark()
Now that we always use stop_machine_unpark() to wake the stopper
threads up, we can kill ->setup() and fold cpu_stop_unpark() into
stop_machine_unpark().

And we do not need stopper->lock to set stopper->enabled = true.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20151009160051.GA10169@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:23:56 +02:00
Oleg Nesterov
c00166d87e stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark()
1. Change smpboot_unpark_thread() to check ->selfparking, just
   like smpboot_park_thread() does.

2. Introduce stop_machine_unpark() which sets ->enabled and calls
   kthread_unpark().

3. Change smpboot_thread_call() and cpu_stop_init() to call
   stop_machine_unpark() by hand.

This way:

    - IMO the ->selfparking logic becomes more consistent.

    - We can kill the smp_hotplug_thread->pre_unpark() method.

    - We can easily unpark the stopper thread earlier. Say, we
      can move stop_machine_unpark() from smpboot_thread_call()
      to sched_cpu_active() as Peter suggests.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20151009160049.GA10166@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:23:55 +02:00
Oleg Nesterov
d8bc853582 stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled
Change cpu_stop_queue_two_works() to ensure that both CPU's have
stopper->enabled == T or fail otherwise.

This way stop_two_cpus() no longer needs to check cpu_active() to
avoid the deadlock. This patch doesn't remove these checks, we will
do this later.

Note: we need to take both stopper->lock's at the same time, but this
will also help to remove lglock from stop_machine.c, so I hope this
is fine.
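
The all-or-nothing queueing described above can be sketched in user space
as follows. The lock is a toy flag and all names (struct toy_lock,
struct toy_stopper, toy_queue_two()) are hypothetical; the -ENOENT error
code is likewise an assumption made for illustration:

```c
#include <errno.h>
#include <stdbool.h>

/* Toy lock: just records that it is held; a stand-in for a spinlock. */
struct toy_lock {
	bool held;
};

void toy_lock_acquire(struct toy_lock *l) { l->held = true; }
void toy_lock_release(struct toy_lock *l) { l->held = false; }

/* Hypothetical per-CPU stopper state. */
struct toy_stopper {
	struct toy_lock lock;
	bool enabled;
	int nr_queued;
};

/* Queue on both stoppers, or on neither. */
int toy_queue_two(struct toy_stopper *s1, struct toy_stopper *s2)
{
	int err = 0;

	/* Hold both locks so neither ->enabled can change in between. */
	toy_lock_acquire(&s1->lock);
	toy_lock_acquire(&s2->lock);

	if (!s1->enabled || !s2->enabled) {
		err = -ENOENT;
		goto unlock;
	}

	s1->nr_queued++;
	s2->nr_queued++;
unlock:
	toy_lock_release(&s2->lock);
	toy_lock_release(&s1->lock);
	return err;
}
```

Because the check and both queue operations happen under both locks, a
caller never ends up with work queued on only one of the two CPUs.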

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20151008170141.GA25537@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:23:55 +02:00
Oleg Nesterov
5caa1c089a stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()
Preparation to simplify the review of the next change. Add two simple
helpers, __cpu_stop_queue_work() and cpu_stop_queue_two_works() which
simply take a bit of code from their callers.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20151008145134.GA18146@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:23:54 +02:00
Oleg Nesterov
233e7f267e stop_machine: Ensure that a queued callback will be called before cpu_stop_park()
cpu_stop_queue_work() checks stopper->enabled before it queues the
work, but ->enabled == T can only guarantee cpu_stop_signal_done()
if we race with cpu_down().

This is not enough for stop_two_cpus() or stop_machine(), they will
deadlock if multi_cpu_stop() won't be called by one of the target
CPU's. stop_machine/stop_cpus are fine, they rely on stop_cpus_mutex.
But stop_two_cpus() has to check cpu_active() to avoid the same race
with hotplug, and this check is very unobvious and probably not even
correct if we race with cpu_up().

Change the cpu_down() path to clear ->enabled before cpu_stopper_thread()
flushes the pending ->works and returns with KTHREAD_SHOULD_PARK set.

Note also that smpboot_thread_call() calls cpu_stop_unpark() which
sets enabled == T at CPU_ONLINE stage, so this CPU can't go away until
cpu_stopper_thread() is called at least once. This all means that if
cpu_stop_queue_work() succeeds, we know that work->fn() will be called.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20151008145131.GA18139@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:23:53 +02:00
Oleg Nesterov
d308b9f1e4 stop_machine: Remove cpu_stop_work's from list in cpu_stop_park()
cpu_stop_park() does cpu_stop_signal_done() but leaves the work on
stopper->works. The owner of this work can free/reuse this memory
right after that and corrupt the list, so if this CPU becomes online
again cpu_stopper_thread() will crash.
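
A minimal sketch of the fix, using a hand-rolled singly linked list in
user space. struct toy_work, struct toy_stopper and toy_park_flush() are
hypothetical names; the point is only the ordering of "unlink" and
"signal":

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item owned by whoever queued it. */
struct toy_work {
	struct toy_work *next;
	bool done;
};

/* Hypothetical per-CPU stopper with a list of pending works. */
struct toy_stopper {
	struct toy_work *head;
};

/* Flush all pending works when the stopper is parked. */
void toy_park_flush(struct toy_stopper *s)
{
	while (s->head) {
		struct toy_work *w = s->head;

		/* Unlink first, so the list never points at w again ... */
		s->head = w->next;
		w->next = NULL;
		/* ... then signal; the owner may now free or reuse w. */
		w->done = true;
	}
}
```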

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: der.herr@hofr.at
Cc: paulmck@linux.vnet.ibm.com
Cc: riel@redhat.com
Cc: viro@ZenIV.linux.org.uk
Link: http://lkml.kernel.org/r/20150630012958.GA23944@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-03 12:21:28 +02:00
Oleg Nesterov
9a301f22fa stop_machine: Use 'cpu_stop_fn_t' where possible
Cosmetic, but 'cpu_stop_fn_t' actually makes the code more readable and
it doesn't break cscope. And most of the declarations already use it.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: der.herr@hofr.at
Cc: paulmck@linux.vnet.ibm.com
Cc: riel@redhat.com
Cc: viro@ZenIV.linux.org.uk
Link: http://lkml.kernel.org/r/20150630012955.GA23937@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-03 12:21:27 +02:00
Oleg Nesterov
7eeb088e72 stop_machine: Unexport __stop_machine()
The only caller outside of stop_machine.c is _cpu_down(), it can use
stop_machine(). get_online_cpus() is fine under cpu_hotplug_begin().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: der.herr@hofr.at
Cc: paulmck@linux.vnet.ibm.com
Cc: riel@redhat.com
Cc: viro@ZenIV.linux.org.uk
Link: http://lkml.kernel.org/r/20150630012951.GA23934@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-03 12:21:26 +02:00
Oleg Nesterov
b377c2a089 stop_machine: Don't do for_each_cpu() twice in queue_stop_cpus_work()
queue_stop_cpus_work() can do everything in one for_each_cpu() loop.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: der.herr@hofr.at
Cc: paulmck@linux.vnet.ibm.com
Cc: riel@redhat.com
Cc: viro@ZenIV.linux.org.uk
Link: http://lkml.kernel.org/r/20150630012948.GA23927@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-03 12:21:26 +02:00
Oleg Nesterov
02cb7aa923 stop_machine: Move 'cpu_stopper_task' and 'stop_cpus_work' into 'struct cpu_stopper'
Multiple DEFINE_PER_CPU's do not make sense, move all the per-cpu
variables into 'struct cpu_stopper'.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: der.herr@hofr.at
Cc: paulmck@linux.vnet.ibm.com
Cc: riel@redhat.com
Cc: viro@ZenIV.linux.org.uk
Link: http://lkml.kernel.org/r/20150630012944.GA23924@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-03 12:21:25 +02:00
Peter Zijlstra
b17718d02f sched/stop_machine: Fix deadlock between multiple stop_two_cpus()
Jiri reported a machine stuck in multi_cpu_stop() with
migrate_swap_stop() as function and with the following src,dst cpu
pairs: {11,  4} {13, 11} { 4, 13}

                        4       11      13

cpuM: queue(4 ,13)
                        *Ma
cpuN: queue(13,11)
                                *N      Na
                        *M              Mb
cpuO: queue(11, 4)
                        *O      Oa
                                *Nb
                        *Ob

Where *X denotes the cpu running the queueing of cpu-X and X[ab] denotes
the first/second queued work.

You'll observe the top of the workqueue for each cpu: 4,11,13 to be work
from cpus: M, O, N resp. IOW, deadlock.

Do away with the queueing trickery and introduce lg_double_lock() to
lock both CPUs and fully serialize the stop_two_cpus() callers instead
of the partial (and buggy) serialization we have now.
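
The cycle in the diagram can be checked mechanically. The sketch below is
a user-space model with hypothetical names (struct toy_pair_work,
toy_any_runnable()): a two-CPU work can only make progress when it sits at
the head of both of its queues, so if no work satisfies that, every
stopper is waiting on another one:

```c
#include <stdbool.h>

/* Hypothetical two-CPU work: the indices of the two queues it sits on. */
struct toy_pair_work {
	int q1;
	int q2;
};

/*
 * heads[q] is the id of the work at the head of queue q.
 * Returns true if some work is at the head of both of its queues.
 */
bool toy_any_runnable(const struct toy_pair_work *works, int nr_works,
		      const int *heads)
{
	int w;

	for (w = 0; w < nr_works; w++) {
		if (heads[works[w].q1] == w && heads[works[w].q2] == w)
			return true;
	}
	return false;
}
```

Mapping queues 0, 1, 2 to cpus 4, 11, 13 and works 0, 1, 2 to M, N, O, the
heads from the report are {M, O, N}, for which the function returns false.
If the callers are fully serialized, M reaches the head of both of its
queues before any other work is queued and the function returns true.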

Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150605153023.GH19282@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-19 10:03:12 +02:00
Fabian Frederick
cf25004069 kernel/stop_machine.c: kernel-doc warning fix
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:15 -07:00
Peter Zijlstra
177c53d943 stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()
We must use smp_call_function_single(.wait=1) for the
irq_cpu_stop_queue_work() to ensure the queueing is actually done under
stop_cpus_lock. Without this we could have dropped the lock by the time
we do the queueing and get the race we tried to fix.

Fixes: 7053ea1a34 ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()")

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140228123905.GK3104@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:33:47 +01:00
Rik van Riel
7053ea1a34 stop_machine: Fix race between stop_two_cpus() and stop_cpus()
There is a race between stop_two_cpus, and the global stop_cpus.

It is possible for two CPUs to get their stopper functions queued
"backwards" from one another, resulting in the stopper threads
getting stuck, and the system hanging. This can happen because
queuing up stoppers is not synchronized.

This patch adds synchronization between stop_cpus (a rare operation),
and stop_two_cpus.

Reported-and-Tested-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/20131101104146.03d1e043@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-11 12:43:38 +01:00
Peter Zijlstra
6acce3ef84 sched: Remove get_online_cpus() usage
Remove get_online_cpus() usage from the scheduler; there are 4 sites that
use it:

 - sched_init_smp(); where it's completely superfluous since we're in
   'early' boot and there simply cannot be any hotplugging.

 - sched_getaffinity(); we already take a raw spinlock to protect the
   task cpus_allowed mask, this disables preemption and therefore
   also stabilizes cpu_online_mask as that's modified using
   stop_machine. However switch to active mask for symmetry with
   sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
   mask stability by inserting sync_rcu/sched() into _cpu_down.

 - sched_setaffinity(); we don't appear to need get_online_cpus()
   either, there's two sites where hotplug appears relevant:
    * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
      for the cpuset case we hold task_lock, which is a spinlock and
      thus for mainline disables preemption (might cause pain on RT).
    * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
      preemption properly disabled; also it already deals with hotplug
      races explicitly where it releases them.

 - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
   us with a little trickery. By adding a sync_sched/rcu() after the
   CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
   cpu_active_mask. Use these to validate that both our cpus are active
   when queueing the stop work before we queue the stop_machine works
   for take_cpu_down().

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-16 14:22:16 +02:00
Peter Zijlstra
1be0bd77c5 stop_machine: Introduce stop_two_cpus()
Introduce stop_two_cpus() in order to allow controlled swapping of two
tasks. It repurposes the stop_machine() state machine but only stops
the two cpus which we can do with on-stack structures and avoid
machine wide synchronization issues.

The ordering of CPUs is important to avoid deadlocks. If unordered then
two cpus calling stop_two_cpus on each other simultaneously would attempt
to queue in the opposite order on each CPU causing an AB-BA style deadlock.
By always having the lowest number CPU doing the queueing of works, we can
guarantee that works are always queued in the same order, and deadlocks
are avoided.
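
The ordering rule can be illustrated with a tiny helper. toy_queueing_cpu()
is a hypothetical name used only for this sketch; it picks which of the two
CPUs performs the queueing, independent of the argument order:

```c
/*
 * Hypothetical helper: choose the CPU that queues both works.
 * Always picking the lowest-numbered CPU gives every caller the
 * same answer for the same pair, whatever order it passes them in.
 */
int toy_queueing_cpu(int cpu1, int cpu2)
{
	return cpu1 < cpu2 ? cpu1 : cpu2;
}
```

Two callers racing on the pairs (3, 7) and (7, 3) both select CPU 3, so
the works for that pair are always queued by the same CPU in the same
order.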

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
[ Implemented deadlock avoidance. ]
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/1381141781-10992-38-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 12:40:45 +02:00
Thomas Gleixner
46c498c2cd stop_machine: Mark per cpu stopper enabled early
commit 14e568e78 (stop_machine: Use smpboot threads) introduced the
following regression:

Before this commit the stopper enabled bit was set in the online
notifier.

CPU0				CPU1
cpu_up
				cpu online
hotplug_notifier(ONLINE)
  stopper(CPU1)->enabled = true;
...
stop_machine()

The conversion to smpboot threads moved the enablement to the wakeup
path of the parked thread. The majority of users seem to have the
following working order:

CPU0				CPU1
cpu_up
				cpu online
unpark_threads()
  wakeup(stopper[CPU1])
....
				stopper thread runs
				  stopper(CPU1)->enabled = true;
stop_machine()

But Konrad and Sander have observed:

CPU0				CPU1
cpu_up
				cpu online
unpark_threads()
  wakeup(stopper[CPU1])
....
stop_machine()
				stopper thread runs
				  stopper(CPU1)->enabled = true;

Now the stop machinery kicks CPU0 into the stop loop, where it gets
stuck forever because the queue code saw stopper(CPU1)->enabled ==
false, so CPU0 waits for CPU1 to enter stomp_machine, but the CPU1
stopper work got discarded due to enabled == false.

Add a pre_unpark function to the smpboot thread descriptor and call it
before waking the thread.

This fixes the problem at hand, but the stop_machine code should be
more robust. The stopper->enabled flag smells fishy at best.

Thanks to Konrad for going through a loop of debug patches and
providing the information to decode this issue.
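
A user-space sketch of the fixed ordering, with hypothetical names
(struct toy_stopper, toy_pre_unpark(), toy_unpark(), toy_queue()). The
enabled flag is set by the unparking CPU before the wakeup, so queueing
no longer depends on when the stopper thread happens to run:

```c
#include <stdbool.h>

/* Hypothetical per-CPU stopper state. */
struct toy_stopper {
	bool enabled;
	bool thread_has_run;
};

/* Hook invoked by the unparking CPU before the thread is woken. */
void toy_pre_unpark(struct toy_stopper *s)
{
	s->enabled = true;
}

/* Unpark: enable first, then wake. */
void toy_unpark(struct toy_stopper *s)
{
	toy_pre_unpark(s);
	/* The wakeup would go here; the thread may run much later. */
}

/* Called whenever the stopper thread finally gets to run. */
void toy_thread_runs(struct toy_stopper *s)
{
	s->thread_has_run = true;
}

/* Queueing succeeds only while the stopper is enabled. */
bool toy_queue(struct toy_stopper *s)
{
	return s->enabled;
}
```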

Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302261843240.22263@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-02-26 22:25:17 +01:00
Thomas Gleixner
14e568e78f stop_machine: Use smpboot threads
Use the smpboot thread infrastructure. Mark the stopper thread
selfparking and park it after it has finished the take_cpu_down()
work.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Weinberger <rw@linutronix.de>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130131120741.686315164@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-02-14 15:29:38 +01:00
Thomas Gleixner
860a0ffaa3 stop_machine: Store task reference in a separate per cpu variable
To allow the stopper thread being managed by the smpboot thread
infrastructure separate out the task storage from the stopper data
structure.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Weinberger <rw@linutronix.de>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130131120741.626690384@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-02-14 15:29:37 +01:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Jeremy Fitzhardinge
f445027e4e stop_machine: make stop_machine safe and efficient to call early
Make stop_machine() safe to call early in boot, before SMP has been set
up, by simply calling the callback function directly if there's only one
CPU online.

[ Fixes from AKPM:
   - add comment
   - local_irq_flags, not save_flags
   - also call hard_irq_disable() for systems which need it

  Tejun suggested using an explicit flag rather than just looking at
  the online cpu count. ]
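
The explicit-flag approach might be sketched like this in user space.
toy_initialized, toy_rendezvous_calls and toy_stop_machine() are
hypothetical names, and the multi-CPU path is reduced to a counter:

```c
#include <stdbool.h>

/* Hypothetical flag set once the stopper infrastructure is ready. */
bool toy_initialized;
/* Counts how often the (simulated) multi-CPU path was taken. */
int toy_rendezvous_calls;

int toy_stop_machine(int (*fn)(void *), void *data)
{
	if (!toy_initialized) {
		/*
		 * Early boot: nothing else can be running, so call fn
		 * directly. Real code would also disable local
		 * interrupts around this call.
		 */
		return fn(data);
	}

	/* Stand-in for stopping all CPUs and running fn. */
	toy_rendezvous_calls++;
	return fn(data);
}

/* Example callback: store a value through data. */
int toy_set(void *data)
{
	*(int *)data = 7;
	return 0;
}
```

Testing a flag instead of the online CPU count keeps the fast path valid
only until initialization has finished, whatever the CPU count is then.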

Cc: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Tejun Heo <htejun@gmail.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31 17:30:53 -07:00
Paul Gortmaker
9984de1a5a kernel: Map most files to use export.h instead of module.h
The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else.  Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

  -#include <linux/module.h>
  +#include <linux/export.h>

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 09:20:12 -04:00
Arun Sharma
60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Suresh Siddha
192d885742 x86, mtrr: use stop_machine APIs for doing MTRR rendezvous
The MTRR rendezvous sequence was not implemented using stop_machine() before,
as it gets called both from process context and from the cpu online paths
(where the cpu has not come online and interrupts are disabled etc).

Now that we have a new stop_machine_from_inactive_cpu() API, use it for
rendezvous during mtrr init of a logical processor that is coming online.

For the rest (runtime MTRR modification, system boot, resume paths), use
stop_machine() to implement the rendezvous sequence. This will consolidate and
clean up the code.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/20110623182057.076997177@sbsiddha-MOBL3.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-06-27 15:17:13 -07:00