Commit Graph

86 Commits

Author SHA1 Message Date
Prasad Sodagudi
c96a62baf3 Revert "cpuidle: Fix cpu frequent exits from low power mode"
This reverts 'commit ae04811d41 ("cpuidle: Fix cpu frequent
exits from low power mode")'. Fix is identified for LPM modes
regressions. Clock event device is not getting programmed in
hrtimer_cancel as hrtimer_next_event_without() sets
cpu_base.next_timer to NULL unconditionally.

Change-Id: Ic4604da1c7829a49d3a2a54b1426f3024b26f731
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2018-09-18 11:51:58 +05:30
Prasad Sodagudi
ae04811d41 cpuidle: Fix cpu frequent exits from low power mode
Following upstream changes are reverted due cpu low
power mode use cases regressions.

'commit 55e591cc18 ("BACKPORT: time: tick-sched: Reorganize idle tick management code")'
'commit f6d3093dfc ("BACKPORT: sched: idle: Do not stop the tick upfront in the idle loop")'
'commit 27e8616e42 ("UPSTREAM: sched: idle: Do not stop the tick before cpuidle_idle_call()")'
'commit 8b468535df ("UPSTREAM: jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC")'
'commit 3a25735bd7 ("UPSTREAM: cpuidle: Return nohz hint from cpuidle_select()")'
'commit f69cfc8ef9 ("BACKPORT: time: tick-sched: Split tick_nohz_stop_sched_tick()")'
'commit 6277dd586f ("BACKPORT: time: hrtimer: Introduce hrtimer_next_event_without()")'
'commit 8c71f69fb4 ("UPSTREAM: sched: idle: Select idle state before stopping the tick")'
'commit 30693a4f09 ("UPSTREAM: cpuidle: menu: Refine idle state selection for running tick")'
'commit e32966ced8 ("UPSTREAM: cpuidle: menu: Avoid selecting shallow states with stopped tick")'

Also fix the compilation errrors as tick_nohz_get_sleep_length()
function signature is changed.

Change-Id: I21488a5a91f1eac11d4e139fd44968d52563717c
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2018-09-14 23:21:12 -07:00
Isaac J. Manjarres
b2c8463039 Merge android-4.14-p.61 (b7e55e8) into msm-4.14
* remotes/origin/tmp-b7e55e8:
  Linux 4.14.61
  scsi: sg: fix minor memory leak in error path
  drm/vc4: Reset ->{x, y}_scaling[1] when dealing with uniplanar formats
  crypto: padlock-aes - Fix Nano workaround data corruption
  RDMA/uverbs: Expand primary and alt AV port checks
  iwlwifi: add more card IDs for 9000 series
  userfaultfd: remove uffd flags from vma->vm_flags if UFFD_EVENT_FORK fails
  audit: fix potential null dereference 'context->module.name'
  kvm: x86: vmx: fix vpid leak
  x86/entry/64: Remove %ebx handling from error_entry/exit
  x86/apic: Future-proof the TSC_DEADLINE quirk for SKX
  virtio_balloon: fix another race between migration and ballooning
  net: socket: fix potential spectre v1 gadget in socketcall
  can: ems_usb: Fix memory leak on ems_usb_disconnect()
  squashfs: more metadata hardenings
  squashfs: more metadata hardening
  net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager
  rxrpc: Fix user call ID check in rxrpc_service_prealloc_one
  net: stmmac: Fix WoL for PCI-based setups
  netlink: Fix spectre v1 gadget in netlink_create()
  net: dsa: Do not suspend/resume closed slave_dev
  ipv4: frags: handle possible skb truesize change
  inet: frag: enforce memory limits earlier
  bonding: avoid lockdep confusion in bond_get_stats()
  Linux 4.14.60
  tcp: add one more quick ack after after ECN events
  tcp: refactor tcp_ecn_check_ce to remove sk type cast
  tcp: do not aggressively quick ack after ECN events
  tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode
  tcp: do not force quickack when receiving out-of-order packets
  netlink: Don't shift with UB on nlk->ngroups
  netlink: Do not subscribe to non-existent groups
  xen-netfront: wait xenbus state change when load module manually
  tcp_bbr: fix bw probing to raise in-flight data for very small BDPs
  NET: stmmac: align DMA stuff to largest cache line length
  net: mdio-mux: bcm-iproc: fix wrong getter and setter pair
  net: lan78xx: fix rx handling before first packet is send
  net: fix amd-xgbe flow-control issue
  net: ena: Fix use of uninitialized DMA address bits field
  ipv4: remove BUG_ON() from fib_compute_spec_dst
  net: dsa: qca8k: Allow overwriting CPU port setting
  net: dsa: qca8k: Add QCA8334 binding documentation
  net: dsa: qca8k: Enable RXMAC when bringing up a port
  net: dsa: qca8k: Force CPU port to its highest bandwidth
  RDMA/uverbs: Protect from attempts to create flows on unsupported QP
  usb: gadget: udc: renesas_usb3: should remove debugfs
  ovl: Sync upper dirty data when syncing overlayfs
  PCI: xgene: Remove leftover pci_scan_child_bus() call
  PCI: pciehp: Assume NoCompl+ for Thunderbolt ports
  ext4: fix check to prevent initializing reserved inodes
  ext4: check for allocation block validity with block group locked
  ext4: fix inline data updates with checksums enabled
  squashfs: be more careful about metadata corruption
  random: mix rdrand with entropy sent in from userspace
  block: reset bi_iter.bi_done after splitting bio
  blkdev: __blkdev_direct_IO_simple: fix leak in error case
  block: bio_iov_iter_get_pages: fix size of last iovec
  drm/dp/mst: Fix off-by-one typo when dump payload table
  drm/atomic-helper: Drop plane->fb references only for drm_atomic_helper_shutdown()
  drm: Add DP PSR2 sink enable bit
  ASoC: topology: Add missing clock gating parameter when parsing hw_configs
  ASoC: topology: Fix bclk and fsync inversion in set_link_hw_format()
  media: si470x: fix __be16 annotations
  media: atomisp: compat32: fix __user annotations
  scsi: cxlflash: Avoid clobbering context control register value
  scsi: cxlflash: Synchronize reset and remove ops
  scsi: megaraid_sas: Increase timeout by 1 sec for non-RAID fastpath IOs
  scsi: scsi_dh: replace too broad "TP9" string with the exact models
  regulator: Don't return or expect -errno from of_map_mode()
  media: omap3isp: fix unbalanced dma_iommu_mapping
  crypto: authenc - don't leak pointers to authenc keys
  crypto: authencesn - don't leak pointers to authenc keys
  usb: hub: Don't wait for connect state at resume for powered-off ports
  microblaze: Fix simpleImage format generation
  soc: imx: gpcv2: Do not pass static memory as platform data
  serial: core: Make sure compiler barfs for 16-byte earlycon names
  staging: lustre: ldlm: free resource when ldlm_lock_create() fails.
  staging: lustre: llite: correct removexattr detection
  staging: vchiq_core: Fix missing semaphore release in error case
  audit: allow not equal op for audit by executable
  rsi: fix nommu_map_sg overflow kernel panic
  rsi: Fix 'invalid vdd' warning in mmc
  ipconfig: Correctly initialise ic_nameservers
  drm/gma500: fix psb_intel_lvds_mode_valid()'s return type
  igb: Fix queue selection on MAC filters on i210
  arm64: defconfig: Enable Rockchip io-domain driver
  nvme: lightnvm: add granby support
  memory: tegra: Apply interrupts mask per SoC
  memory: tegra: Do not handle spurious interrupts
  delayacct: Use raw_spinlocks
  stop_machine: Use raw spinlocks
  backlight: pwm_bl: Don't use GPIOF_* with gpiod_get_direction
  dt-bindings: net: meson-dwmac: new compatible name for AXG SoC
  net: hns3: Fixes the out of bounds access in hclge_map_tqp
  spi: meson-spicc: Fix error handling in meson_spicc_probe()
  dt-bindings: pinctrl: meson: add support for the Meson8m2 SoC
  mmc: pwrseq: Use kmalloc_array instead of stack VLA
  mmc: dw_mmc: update actual clock for mmc debugfs
  ALSA: hda/ca0132: fix build failure when a local macro is defined
  drm/atomic: Handling the case when setting old crtc for plane
  media: siano: get rid of __le32/__le16 cast warnings
  f2fs: avoid fsync() failure caused by EAGAIN in writepage()
  bpf: fix references to free_bpf_prog_info() in comments
  thermal: exynos: fix setting rising_threshold for Exynos5433
  staging: lustre: o2iblnd: Fix FastReg map/unmap for MLX5
  staging: lustre: o2iblnd: fix race at kiblnd_connect_peer
  scsi: qedf: Set the UNLOADING flag when removing a vport
  scsi: hisi_sas: config ATA de-reset as an constrained command for v3 hw
  scsi: megaraid: silence a static checker bug
  scsi: 3w-xxxx: fix a missing-check bug
  scsi: 3w-9xxx: fix a missing-check bug
  bnxt_en: Check unsupported speeds in bnxt_update_link() on PF only.
  perf: fix invalid bit in diagnostic entry
  s390/cpum_sf: Add data entry sizes to sampling trailer entry
  brcmfmac: Add support for bcm43364 wireless chipset
  mtd: rawnand: fsl_ifc: fix FSL NAND driver to read all ONFI parameter pages
  media: saa7164: Fix driver name in debug output
  media: media-device: fix ioctl function types
  ACPI / LPSS: Only call pwm_add_table() for Bay Trail PWM if PMIC HRV is 2
  libata: Fix command retry decision
  media: rcar_jpu: Add missing clk_disable_unprepare() on error in jpu_open()
  net: phy: phylink: Release link GPIO
  dma-iommu: Fix compilation when !CONFIG_IOMMU_DMA
  tty: Fix data race in tty_insert_flip_string_fixed_flag
  i40e: free the skb after clearing the bitlock
  nvmem: properly handle returned value nvmem_reg_read
  ARM: dts: sh73a0: Add missing interrupt-affinity to PMU node
  ARM: dts: emev2: Add missing interrupt-affinity to PMU node
  ARM: dts: stih407-pinctrl: Fix complain about IRQ_TYPE_NONE usage
  EDAC, altera: Fix ARM64 build warning
  HID: i2c-hid: check if device is there before really probing
  powerpc/embedded6xx/hlwd-pic: Prevent interrupts from being handled by Starlet
  drm/amdgpu: Remove VRAM from shared bo domains.
  drm/radeon: fix mode_valid's return type
  arm64: dts: renesas: salvator-common: use audio-graph-card for Sound
  HID: hid-plantronics: Re-resend Update to map button for PTT products
  arm64: cmpwait: Clear event register before arming exclusive monitor
  media: atomisp: ov2680: don't declare unused vars
  ALSA: usb-audio: Apply rate limit to warning messages in URB complete callback
  net: ethernet: ti: cpsw-phy-sel: check bus_find_device() ret value
  media: smiapp: fix timeout checking in smiapp_read_nvm
  ixgbevf: fix MAC address changes through ixgbevf_set_mac()
  md: fix NULL dereference of mddev->pers in remove_and_add_spares()
  md/raid1: add error handling of read error from FailFast device
  regulator: pfuze100: add .is_enable() for pfuze100_swb_regulator_ops
  ALSA: emu10k1: Rate-limit error messages about page errors
  rtc: tps65910: fix possible race condition
  rtc: vr41xx: fix possible race condition
  rtc: tps6586x: fix possible race condition
  Bluetooth: btusb: add ID for LiteOn 04ca:301a
  drm/nouveau/fifo/gk104-: poll for runlist update completion
  scsi: zfcp: assert that the ERP lock is held when tracing a recovery trigger
  scsi: ufs: fix exception event handling
  scsi: ufs: ufshcd: fix possible unclocked register access
  fscrypt: use unbound workqueue for decryption
  net: hns3: Fix the missing client list node initialization
  spi: Add missing pm_runtime_put_noidle() after failed get
  drivers/perf: arm-ccn: don't log to dmesg in event_init
  ima: based on policy verify firmware signatures (pre-allocated buffer)
  mwifiex: correct histogram data with appropriate index
  net: dsa: qca8k: Add support for QCA8334 switch
  PCI: pciehp: Request control of native hotplug only if supported
  bpf: powerpc64: pad function address loads with NOPs
  pinctrl: at91-pio4: add missing of_node_put
  powerpc/8xx: fix invalid register expression in head_8xx.S
  spi: sh-msiof: Fix setting SIRMDR1.SYNCAC to match SITMDR1.SYNCAC
  powerpc: Add __printf verification to prom_printf
  powerpc/powermac: Mark variable x as unused
  powerpc/powermac: Add missing prototype for note_bootable_part()
  powerpc/chrp/time: Make some functions static, add missing header include
  powerpc/32: Add a missing include header
  ath: Add regulatory mapping for Bahamas
  ath: Add regulatory mapping for Bermuda
  ath: Add regulatory mapping for Serbia
  ath: Add regulatory mapping for Tanzania
  ath: Add regulatory mapping for Uganda
  ath: Add regulatory mapping for APL2_FCCA
  ath: Add regulatory mapping for APL13_WORLD
  ath: Add regulatory mapping for ETSI8_WORLD
  ath: Add regulatory mapping for FCC3_ETSIC
  nvme-pci: Fix AER reset handling
  nvme-rdma: stop admin queue before freeing it
  PCI: Prevent sysfs disable of device while driver is attached
  PM / wakeup: Make s2idle_lock a RAW_SPINLOCK
  x86/microcode: Make the late update update_lock a raw lock for RT
  btrfs: qgroup: Finish rescan when hit the last leaf of extent tree
  btrfs: add barriers to btrfs_sync_log before log_commit_wait wakeups
  Btrfs: don't BUG_ON() in btrfs_truncate_inode_items()
  Btrfs: don't return ino to ino cache if inode item removal fails
  media: videobuf2-core: don't call memop 'finish' when queueing
  media: tw686x: Fix incorrect vb2_mem_ops GFP flags
  net: hns3: Fixes the init of the VALID BD info in the descriptor
  wlcore: sdio: check for valid platform device data before suspend
  mwifiex: handle race during mwifiex_usb_disconnect
  mfd: cros_ec: Fail early if we cannot identify the EC
  ASoC: dpcm: fix BE dai not hw_free and shutdown
  Bluetooth: btusb: Add a new Realtek 8723DE ID 2ff8:b011
  Bluetooth: hci_qca: Fix "Sleep inside atomic section" warning
  iwlwifi: pcie: fix race in Rx buffer allocator
  btrfs: balance dirty metadata pages in btrfs_finish_ordered_io
  PCI: Fix devm_pci_alloc_host_bridge() memory leak
  selftests: intel_pstate: return Kselftest Skip code for skipped tests
  selftests: memfd: return Kselftest Skip code for skipped tests
  selftests/intel_pstate: Improve test, minor fixes
  perf/x86/intel/uncore: Correct fixed counter index check for NHM
  perf/x86/intel/uncore: Correct fixed counter index check in generic code
  usbip: dynamically allocate idev by nports found in sysfs
  usbip: usbip_detach: Fix memory, udev context and udev leak
  block, bfq: remove wrong lock in bfq_requests_merged
  f2fs: fix race in between GC and atomic open
  f2fs: fix to detect failure of dquot_initialize
  f2fs: Fix deadlock in shutdown ioctl
  f2fs: fix to wait page writeback during revoking atomic write
  f2fs: fix to don't trigger writeback during recovery
  f2fs: fix error path of move_data_page
  disable loading f2fs module on PAGE_SIZE > 4KB
  pnfs: Don't release the sequence slot until we've processed layoutget on open
  netfilter: nf_tables: check msg_type before nft_trans_set(trans)
  lightnvm: pblk: warn in case of corrupted write buffer
  RDMA/mad: Convert BUG_ONs to error flows
  powerpc/64s: Fix compiler store ordering to SLB shadow area
  hvc_opal: don't set tb_ticks_per_usec in udbg_init_opal_common()
  powerpc/eeh: Fix use-after-release of EEH driver
  powerpc/64s: Add barrier_nospec
  powerpc/lib: Adjust .balign inside string functions for PPC32
  infiniband: fix a possible use-after-free bug
  e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
  ceph: fix alignment of rasize
  bpf, arm32: fix inconsistent naming about emit_a32_lsr_{r64,i64}
  printk: drop in_nmi check from printk_safe_flush_on_panic()
  watchdog: da9063: Fix updating timeout value
  irqchip/ls-scfg-msi: Map MSIs in the iommu
  netfilter: ipset: List timing out entries with "timeout 1" instead of zero
  netfilter: ipset: forbid family for hash:mac sets
  perf tools: Fix pmu events parsing rule
  rtc: ensure rtc_set_alarm fails when alarms are not supported
  mm/slub.c: add __printf verification to slab_err()
  mm: vmalloc: avoid racy handling of debugobjects in vunmap
  mm: /proc/pid/pagemap: hide swap entries from unprivileged users
  kernel/hung_task.c: show all hung tasks before panic
  vfio/type1: Fix task tracking for QEMU vCPU hotplug
  vfio/mdev: Check globally for duplicate devices
  vfio: platform: Fix reset module leak in error path
  nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo
  NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY
  ALSA: fm801: add error handling for snd_ctl_add
  ALSA: emu10k1: add error handling for snd_ctl_add
  skip LAYOUTRETURN if layout is invalid
  hv_netvsc: fix network namespace issues with VF support
  xen/netfront: raise max number of slots in xennet_get_responses()
  kcov: ensure irq code sees a valid area
  mlxsw: spectrum_switchdev: Fix port_vlan refcounting
  arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups
  tracing: Quiet gcc warning about maybe unused link variable
  tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
  kthread, tracing: Don't expose half-written comm when creating kthreads
  tracing: Fix possible double free in event_enable_trigger_func()
  tracing: Fix double free of event_trigger_data
  delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
  kvm, mm: account shadow page tables to kmemcg
  Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST
  Input: i8042 - add Lenovo LaVie Z to the i8042 reset list
  Input: elan_i2c - add ACPI ID for lenovo ideapad 330
  spi: spi-s3c64xx: Fix system resume support
  drivers/infiniband/ulp/srpt/ib_srpt.c: fix build with gcc-4.4.4
  IB/srpt: Fix an out-of-bounds stack access in srpt_zerolength_write()
  drivers/infiniband/core/verbs.c: fix build with gcc-4.4.4
  RDMA/core: Avoid that ib_drain_qp() triggers an out-of-bounds stack access
  i2c: core: decrease reference count of device node in i2c_unregister_device
  fork: unconditionally clear stack on fork
  Linux 4.14.59
  turn off -Wattribute-alias
  can: m_can.c: fix setup of CCCR register: clear CCCR NISO bit before checking can.ctrlmode
  can: peak_canfd: fix firmware < v3.3.0: limit allocation to 32-bit DMA addr only
  can: xilinx_can: fix RX overflow interrupt not being enabled
  can: xilinx_can: fix incorrect clear of non-processed interrupts
  can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
  can: xilinx_can: fix device dropping off bus on RX overrun
  can: xilinx_can: fix recovery from error states not being propagated
  can: xilinx_can: fix power management handling
  can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK
  driver core: Partially revert "driver core: correct device's shutdown order"
  usb: gadget: f_fs: Only return delayed status when len is 0
  usb: dwc2: Fix DMA alignment to start at allocated boundary
  usb: core: handle hub C_PORT_OVER_CURRENT condition
  usb: cdc_acm: Add quirk for Castles VEGA3000
  staging: speakup: fix wraparound in uaccess length check
  tcp: add tcp_ooo_try_coalesce() helper
  tcp: call tcp_drop() from tcp_data_queue_ofo()
  tcp: detect malicious patterns in tcp_collapse_ofo_queue()
  tcp: avoid collapses in tcp_prune_queue() if possible
  tcp: free batches of packets in tcp_prune_ofo_queue()
  tcp: do not delay ACK in DCTCP upon CE status change
  tcp: do not cancel delay-AcK on DCTCP special ACK
  tcp: helpers to send special DCTCP ack
  tcp: fix dctcp delayed ACK schedule
  vxlan: fix default fdb entry netlink notify ordering during netdev create
  vxlan: make netlink notify in vxlan_fdb_destroy optional
  vxlan: add new fdb alloc and create helpers
  rtnetlink: add rtnl_link_state check in rtnl_configure_link
  sock: fix sg page frag coalescing in sk_alloc_sg
  net: phy: consider PHY_IGNORE_INTERRUPT in phy_start_aneg_priv
  multicast: do not restore deleted record source filter mode to new one
  net/ipv6: Fix linklocal to global address with VRF
  net/mlx5e: Fix quota counting in aRFS expire flow
  net/mlx5e: Don't allow aRFS for encapsulated packets
  net/mlx5: Adjust clock overflow work period
  net: skb_segment() should not return NULL
  net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
  ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
  ip: hash fragments consistently
  bonding: set default miimon value for non-arp modes if not set
  drm/nouveau: Set DRIVER_ATOMIC cap earlier to fix debugfs
  drm/nouveau/drm/nouveau: Fix runtime PM leak in nv50_disp_atomic_commit()
  KVM: PPC: Check if IOMMU page is contained in the pinned physical page
  xen/PVH: Set up GS segment for stack canary
  MIPS: Fix off-by-one in pci_resource_to_user()
  MIPS: ath79: fix register address in ath79_ddr_wb_flush()
  Revert "cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE setting"
  ANDROID: verity: really fix android-verity Kconfig
  tcp: add tcp_ooo_try_coalesce() helper
  tcp: call tcp_drop() from tcp_data_queue_ofo()
  tcp: detect malicious patterns in tcp_collapse_ofo_queue()
  tcp: avoid collapses in tcp_prune_queue() if possible
  tcp: free batches of packets in tcp_prune_ofo_queue()
  x86_64_cuttlefish_defconfig: Enable android-verity
  x86_64_cuttlefish_defconfig: enable verity cert
  ANDROID: android-verity: Fix broken parameter handling.
  ANDROID: android-verity: Make it work with newer kernels
  ANDROID: android-verity: Add API to verify signature with builtin keys.
  ANDROID: verity: fix android-verity Kconfig dependencies
  Linux 4.14.58
  xhci: Fix perceived dead host due to runtime suspend race with event handler
  powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle)
  cxl_getfile(): fix double-iput() on alloc_file() failures
  alpha: fix osf_wait4() breakage
  net: usb: asix: replace mii_nway_restart in resume path
  ipv6: make DAD fail with enhanced DAD when nonce length differs
  net: systemport: Fix CRC forwarding check for SYSTEMPORT Lite
  net/mlx4_en: Don't reuse RX page when XDP is set
  hv_netvsc: Fix napi reschedule while receive completion is busy
  tg3: Add higher cpu clock for 5762.
  qmi_wwan: add support for Quectel EG91
  ptp: fix missing break in switch
  net: phy: fix flag masking in __set_phy_supported
  net/ipv4: Set oif in fib_compute_spec_dst
  skbuff: Unconditionally copy pfmemalloc in __skb_clone()
  net: Don't copy pfmemalloc flag in __copy_skb_header()
  net: diag: Don't double-free TCP_NEW_SYN_RECV sockets in tcp_abort
  lib/rhashtable: consider param->min_size when setting initial table size
  ipv6: ila: select CONFIG_DST_CACHE
  ipv6: fix useless rol32 call on hash
  ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns
  gen_stats: Fix netlink stats dumping in the presence of padding
  drm/nouveau: Avoid looping through fake MST connectors
  drm/nouveau: Use drm_connector_list_iter_* for iterating connectors
  drm/i915: Fix hotplug irq ack on i965/g4x
  stop_machine: Disable preemption when waking two stopper threads
  vfio/spapr: Use IOMMU pageshift rather than pagesize
  vfio/pci: Fix potential Spectre v1
  cpufreq: intel_pstate: Register when ACPI PCCH is present
  mm/huge_memory.c: fix data loss when splitting a file pmd
  mm: memcg: fix use after free in mem_cgroup_iter()
  ARC: mm: allow mprotect to make stack mappings executable
  ARC: configs: Remove CONFIG_INITRAMFS_SOURCE from defconfigs
  ARC: Fix CONFIG_SWAP
  ARCv2: [plat-hsdk]: Save accl reg pair by default
  ALSA: hda: add mute led support for HP ProBook 455 G5
  ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk
  ALSA: rawmidi: Change resized buffers atomically
  fat: fix memory allocation failure handling of match_strdup()
  x86/MCE: Remove min interval polling limitation
  x86/events/intel/ds: Fix bts_interrupt_threshold alignment
  x86/apm: Don't access __preempt_count with zeroed fs
  KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
  scsi: sd_zbc: Fix variable type and bogus comment
  ANDROID: uid_sys_stats: Replace tasklist lock with RCU in uid_cputime_show
  Linux 4.14.57
  string: drop __must_check from strscpy() and restore strscpy() usages in cgroup
  arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
  arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests
  arm64: KVM: Add ARCH_WORKAROUND_2 support for guests
  arm64: KVM: Add HYP per-cpu accessors
  arm64: ssbd: Add prctl interface for per-thread mitigation
  arm64: ssbd: Introduce thread flag to control userspace mitigation
  arm64: ssbd: Restore mitigation status on CPU resume
  arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation
  arm64: ssbd: Add global mitigation state accessor
  arm64: Add 'ssbd' command-line option
  arm64: Add ARCH_WORKAROUND_2 probing
  arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2
  arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
  arm/arm64: smccc: Add SMCCC-specific return codes
  KVM: arm64: Avoid storing the vcpu pointer on the stack
  KVM: arm/arm64: Do not use kern_hyp_va() with kvm_vgic_global_state
  arm64: alternatives: Add dynamic patching feature
  KVM: arm64: Stop save/restoring host tpidr_el1 on VHE
  arm64: alternatives: use tpidr_el2 on VHE hosts
  KVM: arm64: Change hyp_panic()s dependency on tpidr_el2
  KVM: arm/arm64: Convert kvm_host_cpu_state to a static per-cpu allocation
  KVM: arm64: Store vcpu on the stack during __guest_enter()
  net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
  rds: avoid unenecessary cong_update in loop transport
  bdi: Fix another oops in wb_workfn()
  netfilter: ipv6: nf_defrag: drop skb dst before queueing
  nsh: set mac len based on inner packet
  autofs: fix slab out of bounds read in getname_kernel()
  tls: Stricter error checking in zerocopy sendmsg path
  KEYS: DNS: fix parsing multiple options
  reiserfs: fix buffer overflow with long warning messages
  netfilter: ebtables: reject non-bridge targets
  PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
  block: do not use interruptible wait anywhere
  mtd: rawnand: denali_dt: set clk_x_rate to 200 MHz unconditionally
  crypto: af_alg - Initialize sg_num_bytes in error code path
  clocksource: Initialize cs->wd_list
  media: rc: oops in ir_timer_keyup after device unplug
  xhci: Fix USB3 NULL pointer dereference at logical disconnect.
  net: lan78xx: Fix race in tx pending skb size calculation
  rtlwifi: rtl8821ae: fix firmware is not ready to run
  rtlwifi: Fix kernel Oops "Fw download fail!!"
  net: cxgb3_main: fix potential Spectre v1
  VSOCK: fix loopback on big-endian systems
  vhost_net: validate sock before trying to put its fd
  tcp: prevent bogus FRTO undos with non-SACK flows
  tcp: fix Fast Open key endianness
  strparser: Remove early eaten to fix full tcp receive buffer stall
  stmmac: fix DMA channel hang in half-duplex mode
  r8152: napi hangup fix after disconnect
  qmi_wwan: add support for the Dell Wireless 5821e module
  qed: Limit msix vectors in kdump kernel to the minimum required count.
  qed: Fix use of incorrect size in memcpy call.
  qed: Fix setting of incorrect eswitch mode.
  qede: Adverstise software timestamp caps when PHC is not available.
  net/tcp: Fix socket lookups with SO_BINDTODEVICE
  net: sungem: fix rx checksum support
  net_sched: blackhole: tell upper qdisc about dropped packets
  net/packet: fix use-after-free
  net: mvneta: fix the Rx desc DMA address in the Rx path
  net/mlx5: Fix wrong size allocation for QoS ETC TC regitster
  net/mlx5: Fix required capability for manipulating MPFS
  net/mlx5: Fix incorrect raw command length parsing
  net/mlx5: Fix command interface race in polling mode
  net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager
  net/mlx5e: Don't attempt to dereference the ppriv struct if not being eswitch manager
  net/mlx5e: Avoid dealing with vport representors if not being e-switch manager
  net: macb: Fix ptp time adjustment for large negative delta
  net: fix use-after-free in GRO with ESP
  net: dccp: switch rx_tstamp_last_feedback to monotonic clock
  net: dccp: avoid crash in ccid3_hc_rx_send_feedback()
  ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing
  ipvlan: fix IFLA_MTU ignored on NEWLINK
  ipv6: sr: fix passing wrong flags to crypto_alloc_shash()
  hv_netvsc: split sub-channel setup into async and sync
  atm: zatm: Fix potential Spectre v1
  atm: Preserve value of skb->truesize when accounting to vcc
  alx: take rtnl before calling __alx_open from resume
  crypto: crypto4xx - fix crypto4xx_build_pdr, crypto4xx_build_sdr leak
  crypto: crypto4xx - remove bad list_del
  PCI: exynos: Fix a potential init_clk_resources NULL pointer dereference
  bcm63xx_enet: do not write to random DMA channel on BCM6345
  bcm63xx_enet: correct clock usage
  ocfs2: ip_alloc_sem should be taken in ocfs2_get_block()
  ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent
  xprtrdma: Fix corner cases when handling device removal
  cpufreq / CPPC: Set platform specific transition_delay_us
  Btrfs: fix duplicate extents after fsync of file with prealloc extents
  x86/paravirt: Make native_save_fl() extern inline
  x86/asm: Add _ASM_ARG* constants for argument registers to <asm/asm.h>
  compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations
  ANDROID: Add hold functionality to schedtune CPU boost
  ANDROID: sched/rt: Add schedtune accounting to rt task enqueue/dequeue
  UPSTREAM: cpuidle: menu: Avoid selecting shallow states with stopped tick
  UPSTREAM: cpuidle: menu: Refine idle state selection for running tick
  UPSTREAM: sched: idle: Select idle state before stopping the tick
  BACKPORT: time: hrtimer: Introduce hrtimer_next_event_without()
  BACKPORT: time: tick-sched: Split tick_nohz_stop_sched_tick()
  UPSTREAM: cpuidle: Return nohz hint from cpuidle_select()
  UPSTREAM: jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
  UPSTREAM: sched: idle: Do not stop the tick before cpuidle_idle_call()
  BACKPORT: sched: idle: Do not stop the tick upfront in the idle loop
  BACKPORT: time: tick-sched: Reorganize idle tick management code
  ANDROID: sched/fair: fix a warning
  ANDROID: sched/walt: Fix compilation issue for x86_64
  ANDROID: mnt: Fix next_descendent
  ANDROID: sched/events: Introduce util_est trace events
  ANDROID: sched/fair: schedtune: update before schedutil
  FROMLIST: sched/fair: add support to tune PELT ramp/decay timings
  BACKPORT: sched/fair: Update util_est before updating schedutil
  BACKPORT: sched/fair: Update util_est only on util_avg updates
  BACKPORT: sched/fair: Use util_est in LB and WU paths
  BACKPORT: sched/fair: Add util_est on top of PELT
  ANDROID: sched/fair: Cleanup cpu_util{_wake}()
  ANDROID: sched: Update max cpu capacity in case of max frequency constraints
  ANDROID: arm: enable max frequency capping
  ANDROID: arm64: enable max frequency capping
  ANDROID: implement max frequency capping
  ANDROID: sched/fair: add arch scaling function for max frequency capping
  ANDROID: trace: Add WALT util signal to trace event sched_load_cfs_rq
  ANDROID: sched, trace: Remove trace event sched_load_avg_cpu
  ANDROID: Rename and move include/linux/sched_energy.h
  ANDROID: Adjust juno energy model
  ANDROID: Check equality of max cap state cap and cpu scale
  ANDROID: Move energy model init call into arch_topology driver
  ANDROID: Streamline sched_domain_energy_f functions
  ANDROID: Separate cpu_scale and energy model setup
  ANDROID: update_group_capacity for single cpu in cluster
  ANDROID: sched/fair: return idle CPU immediately for prefer_idle
  ANDROID: sched/fair: add idle state filter to prefer_idle case
  ANDROID: sched/fair: remove order from CPU selection
  ANDROID: sched/fair: unify spare capacity calculation
  ANDROID:sched/fair: prefer energy efficient CPUs for !prefer_idle tasks
  ANDROID: sched/fair: fix CPU selection for non latency sensitive tasks
  ANDROID: sched/fair: Also do misfit in overloaded groups
  ANDROID: sched/fair: Don't balance misfits if it would overload local group
  ANDROID: sched/fair: Attempt to improve throughput for asym cap systems
  FROMLIST: sched/fair: Don't move tasks to lower capacity cpus unless necessary
  FROMLIST: sched/core: Disable SD_PREFER_SIBLING on asymmetric cpu capacity domains
  FROMLIST: sched/core: Disable SD_ASYM_CPUCAPACITY for root_domains without asymmetry
  FROMLIST: sched/fair: Set rq->rd->overload when misfit
  FROMLIST: sched: Wrap rq->rd->overload accesses with READ/WRITE_ONCE
  FROMLIST: sched: Change root_domain->overload type to int
  FROMLIST: sched/fair: Change prefer_sibling type to bool
  FROMLIST: sched/fair: Consider misfit tasks when load-balancing
  FROMLIST: sched: Add sched_group per-cpu max capacity
  FROMLIST: sched/fair: Add group_misfit_task load-balance type
  FROMLIST: sched: Add static_key for asymmetric cpu capacity optimizations
  UPSTREAM: ANDROID: binder: change down_write to down_read
  UPSTREAM: ANDROID: binder: correct the cmd print for BINDER_WORK_RETURN_ERROR
  UPSTREAM: ANDROID: binder: remove 32-bit binder interface.
  UPSTREAM: android: binder: Use true and false for boolean values
  UPSTREAM: android: binder: Use octal permissions
  UPSTREAM: android: binder: Prefer __func__ to using hardcoded function name
  UPSTREAM: ANDROID: binder: make binder_alloc_new_buf_locked static and indent its arguments
  UPSTREAM: android: binder: Check for errors in binder_alloc_shrinker_init().

Conflicts:
	arch/arm64/Kconfig
	arch/arm64/include/asm/cpucaps.h
	arch/arm64/include/asm/cpufeature.h
	arch/arm64/include/asm/thread_info.h
	arch/arm64/kernel/cpu_errata.c
	arch/arm64/kernel/cpufeature.c
	arch/arm64/kernel/entry.S
	arch/arm64/kernel/ssbd.c
	drivers/base/arch_topology.c
	drivers/md/Kconfig
	drivers/scsi/ufs/ufshcd.c
	drivers/usb/gadget/function/f_fs.c
	include/trace/events/sched.h
	kernel/sched/cpufreq_schedutil.c
	kernel/sched/energy.c
	kernel/sched/fair.c
	kernel/sched/features.h
	kernel/sched/sched.h
	kernel/sched/topology.c
	kernel/sched/tune.c
	kernel/sched/walt.c
	kernel/sched/walt.h
	kernel/stop_machine.c
	kernel/time/tick-sched.c
	net/socket.c
	sound/core/rawmidi.c

Change-Id: Ia246711317930ecd55bb42565a04e6b4fdfc26d2
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2018-08-09 11:57:44 -07:00
Rafael J. Wysocki
30693a4f09 UPSTREAM: cpuidle: menu: Refine idle state selection for running tick
If the tick isn't stopped, the target residency of the state selected
by the menu governor may be greater than the actual time to the next
tick and that means lost energy.

To avoid that, make tick_nohz_get_sleep_length() return the current
time to the next event (before stopping the tick) in addition to the
estimated one via an extra pointer argument and make menu_select()
use that value to refine the state selection when necessary.

Cherry-picked from 296bb1e51a4838a6488ec5ce676607093482ecbc

Change-Id: I6c3b4227817231f437da0879d1ca401522ef3a70
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Paven Kondati <pkondeti@qti.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-07-19 19:09:31 +00:00
Rafael J. Wysocki
8c71f69fb4 UPSTREAM: sched: idle: Select idle state before stopping the tick
In order to address the issue with short idle duration predictions
by the idle governor after the scheduler tick has been stopped,
reorder the code in cpuidle_idle_call() so that the governor idle
state selection runs before tick_nohz_idle_go_idle() and use the
"nohz" hint returned by cpuidle_select() to decide whether or not
to stop the tick.

This isn't straightforward, because menu_select() invokes
tick_nohz_get_sleep_length() to get the time to the next timer
event and the number returned by the latter comes from
__tick_nohz_idle_stop_tick().  Fortunately, however, it is possible
to compute that number without actually stopping the tick and with
the help of the existing code.

Namely, tick_nohz_get_sleep_length() can be made call
tick_nohz_next_event(), introduced earlier, to get the time to the
next non-highres timer event.  If that happens, tick_nohz_next_event()
need not be called by __tick_nohz_idle_stop_tick() again.

If it turns out that the scheduler tick cannot be stopped going
forward or the next timer event is too close for the tick to be
stopped, tick_nohz_get_sleep_length() can simply return the time to
the next event currently programmed into the corresponding clock
event device.

In addition to knowing the return value of tick_nohz_next_event(),
however, tick_nohz_get_sleep_length() needs to know the time to the
next highres timer event, but with the scheduler tick timer excluded,
which can be computed with the help of hrtimer_get_next_event().

That minimum of that number and the tick_nohz_next_event() return
value is the total time to the next timer event with the assumption
that the tick will be stopped.  It can be returned to the idle
governor which can use it for predicting idle duration (under the
assumption that the tick will be stopped) and deciding whether or
not it makes sense to stop the tick before putting the CPU into the
selected idle state.

With the above, the sleep_length field in struct tick_sched is not
necessary any more, so drop it.

Cherry-picked from 554c8aa8ecade210d58a252173bb8f2106552a44

Change-Id: I11bbd473ccb00cf89105f97a3861d56459cc8377
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199227
Reported-by: Doug Smythies <dsmythies@telus.net>
Reported-by: Thomas Ilsche <thomas.ilsche@tu-dresden.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Paven Kondati <pkondeti@qti.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-07-19 19:09:22 +00:00
Rafael J. Wysocki
3a25735bd7 UPSTREAM: cpuidle: Return nohz hint from cpuidle_select()
Add a new pointer argument to cpuidle_select() and to the ->select
cpuidle governor callback to allow a boolean value indicating
whether or not the tick should be stopped before entering the
selected state to be returned from there.

Make the ladder governor ignore that pointer (to preserve its
current behavior) and make the menu governor return 'false" through
it if:
 (1) the idle exit latency is constrained at 0, or
 (2) the selected state is a polling one, or
 (3) the expected idle period duration is within the tick period
     range.

In addition to that, the correction factor computations in the menu
governor need to take the possibility that the tick may not be
stopped into account to avoid artificially small correction factor
values.  To that end, add a mechanism to record tick wakeups, as
suggested by Peter Zijlstra, and use it to modify the menu_update()
behavior when tick wakeup occurs.  Namely, if the CPU is woken up by
the tick and the return value of tick_nohz_get_sleep_length() is not
within the tick boundary, the predicted idle duration is likely too
short, so make menu_update() try to compensate for that by updating
the governor statistics as though the CPU was idle for a long time.

Since the value returned through the new argument pointer of
cpuidle_select() is not used by its caller yet, this change by
itself is not expected to alter the functionality of the code.

Cherry-picked from 45f1ff59e27ca59d33cc1a317e669d90022ccf7d

Change-Id: I3737f2fd00e308d0becba33cc3b1db727241e2a4
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Paven Kondati <pkondeti@qti.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-07-19 19:08:58 +00:00
Rafael J. Wysocki
f6d3093dfc BACKPORT: sched: idle: Do not stop the tick upfront in the idle loop
Push the decision whether or not to stop the tick somewhat deeper
into the idle loop.

Stopping the tick upfront leads to unpleasant outcomes in case the
idle governor doesn't agree with the nohz code on the duration of the
upcoming idle period.  Specifically, if the tick has been stopped and
the idle governor predicts short idle, the situation is bad regardless
of whether or not the prediction is accurate.  If it is accurate, the
tick has been stopped unnecessarily which means excessive overhead.
If it is not accurate, the CPU is likely to spend too much time in
the (shallow, because short idle has been predicted) idle state
selected by the governor [1].

As the first step towards addressing this problem, change the code
to make the tick stopping decision inside of the loop in do_idle().
In particular, do not stop the tick in the cpu_idle_poll() code path.
Also don't do that in tick_nohz_irq_exit() which doesn't really have
enough information on whether or not to stop the tick.

Cherry-picked from 2aaf709a518d26563b80fd7a42379d7aa7ffed4a

 - Fixed conflict with tick_nohz_stopped_cpu not appearing in the
   chunk context. Trivial fix.

Change-Id: Ie316e210971f28aae80dd0f35c5564cb17a81901
Link: https://marc.info/?l=linux-pm&m=150116085925208&w=2 # [1]
Link: https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Paven Kondati <pkondeti@qti.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-07-19 19:08:25 +00:00
Rafael J. Wysocki
55e591cc18 BACKPORT: time: tick-sched: Reorganize idle tick management code
Prepare the scheduler tick code for reworking the idle loop to
avoid stopping the tick in some cases.

The idea is to split the nohz idle entry call to decouple the idle
time stats accounting and preparatory work from the actual tick stop
code, in order to later be able to delay the tick stop once we reach
more power-knowledgeable callers.

Move away the tick_nohz_start_idle() invocation from
__tick_nohz_idle_enter(), rename the latter to
__tick_nohz_idle_stop_tick() and define tick_nohz_idle_stop_tick()
as a wrapper around it for calling it from the outside.

Make tick_nohz_idle_enter() only call tick_nohz_start_idle() instead
of calling the entire __tick_nohz_idle_enter(), add another wrapper
disabling and enabling interrupts around tick_nohz_idle_stop_tick()
and make the current callers of tick_nohz_idle_enter() call it too
to retain their current functionality.

Cherry-picked from 0e7767687fdabfc58d5046e7488632bf2ecd4d0c

 - Fixed conflict with hrtimer_next_event_base which does not have
   an extra parameter in the backported code

Change-Id: Iae3b22cdc0db13212af07249b085b4b97abf557e
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Paven Kondati <pkondeti@qti.qualcomm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-07-19 19:08:02 +00:00
Olav Haugan
17e12d32da sched/tick: Ensure timers does not get queued on isolated cpus
Timers should not be queued on isolated cpus. Instead try to find
other cpus to queue timer on.

Change-Id: I7a1a8aade8630226d7349cc37afd1a3abe48fc67
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
2018-01-23 13:05:53 -08:00
Runmin Wang
5682ea9f33 Merge remote-tracking branch 'remotes/origin/tmp-9189141' into msm-4.14
* remotes/origin/tmp-9189141:
  Linux 4.14.13
  KVM: s390: prevent buffer overrun on memory hotplug during migration
  KVM: s390: fix cmma migration for multiple memory slots
  mtd: nand: pxa3xx: Fix READOOB implementation
  parisc: qemu idle sleep support
  parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
  apparmor: fix regression in mount mediation when feature set is pinned
  x86/microcode/AMD: Add support for fam17h microcode loading
  Input: elantech - add new icbody type 15
  powerpc/mm: Fix SEGV on mapped region to return SEGV_ACCERR
  ARC: uaccess: dont use "l" gcc inline asm constraint modifier
  iommu/arm-smmu-v3: Cope with duplicated Stream IDs
  iommu/arm-smmu-v3: Don't free page table ops twice
  kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
  kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
  kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
  x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
  x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu()
  fscache: Fix the default for fscache_maybe_release_page()
  sunxi-rsb: Include OF based modalias in device uevent
  drm/i915: Apply Display WA #1183 on skl, kbl, and cfl
  drm/i915: Disable DC states around GMBUS on GLK
  crypto: chelsio - select CRYPTO_GF128MUL
  crypto: pcrypt - fix freeing pcrypt instances
  crypto: chacha20poly1305 - validate the digest size
  crypto: n2 - cure use after free
  efi/capsule-loader: Reinstate virtual capsule mapping
  btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
  userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails
  mm/sparse.c: wrong allocation for mem_section
  mm/mprotect: add a cond_resched() inside change_pmd_range()
  kernel/acct.c: fix the acct->needcheck check in check_free_space()
  x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
  x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
  x86/tlb: Drop the _GPL from the cpu_tlbstate export
  x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
  x86/kaslr: Fix the vaddr_end mess
  x86/mm: Map cpu_entry_area at the same place on 4/5 level
  x86/mm: Set MODULES_END to 0xffffffffff000000
  ANDROID: netfilter: xt_qtaguid: Fix 4.14 compilation
  ANDROID: Squashfs: optimize reading uncompressed data
  ANDROID: Squashfs: implement .readpages()
  ANDROID: Squashfs: replace buffer_head with BIO
  ANDROID: Squashfs: refactor page_actor
  ANDROID: usb: f_fs: Prevent gadget unbind if it is already unbound
  Linux 4.14.12
  rtc: m41t80: remove unneeded checks from m41t80_sqw_set_rate
  rtc: m41t80: avoid i2c read in m41t80_sqw_is_prepared
  rtc: m41t80: avoid i2c read in m41t80_sqw_recalc_rate
  rtc: m41t80: fix m41t80_sqw_round_rate return value
  rtc: m41t80: m41t80_sqw_set_rate should return 0 on success
  Revert "xfrm: Fix stack-out-of-bounds read in xfrm_state_find."
  x86/process: Define cpu_tss_rw in same section as declaration
  x86/pti: Switch to kernel CR3 at early in entry_SYSCALL_compat()
  x86/dumpstack: Print registers for first stack frame
  x86/dumpstack: Fix partial register dumps
  x86/pti: Make sure the user/kernel PTEs match
  x86/cpu, x86/pti: Do not enable PTI on AMD processors
  capabilities: fix buffer overread on very short xattr
  exec: Weaken dumpability for secureexec
  Linux 4.14.11
  tty: fix tty_ldisc_receive_buf() documentation
  n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)
  x86/ldt: Make LDT pgtable free conditional
  x86/ldt: Plug memory leak in error path
  x86/espfix/64: Fix espfix double-fault handling on 5-level systems
  x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR)
  x86/mm: Remove preempt_disable/enable() from __native_flush_tlb()
  x86/smpboot: Remove stale TLB flush invocations
  nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
  staging: android: ion: Fix dma direction for dma_sync_sg_for_cpu/device
  drivers: base: cacheinfo: fix cache type for non-architected system cache
  phy: tegra: fix device-tree node lookups
  binder: fix proc->files use-after-free
  timers: Reinitialize per cpu bases on hotplug
  timers: Invoke timer_start_debug() where it makes sense
  timers: Use deferrable base independent of base::nohz_active
  usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201
  USB: Fix off by one in type-specific length check of BOS SSP capability
  usb: add RESET_RESUME for ELSA MicroLink 56K
  usb: Add device quirk for Logitech HD Pro Webcam C925e
  USB: serial: option: adding support for YUGA CLM920-NC5
  USB: serial: option: add support for Telit ME910 PID 0x1101
  USB: serial: qcserial: add Sierra Wireless EM7565
  USB: serial: ftdi_sio: add id for Airbus DS P8GR
  USB: chipidea: msm: fix ulpi-node lookup
  usbip: vhci: stop printing kernel pointer addresses in messages
  usbip: stub: stop printing kernel pointer addresses in messages
  usbip: prevent leaking socket pointer address in messages
  usbip: fix usbip bind writing random string after command in match_busid
  sparc64: repair calling incorrect hweight function from stubs
  skbuff: in skb_copy_ubufs unclone before releasing zerocopy
  skbuff: skb_copy_ubufs must release uarg even without user frags
  skbuff: orphan frags before zerocopy clone
  Revert "mlx5: move affinity hints assignments to generic code"
  ipv6: set all.accept_dad to 0 by default
  ipv4: fib: Fix metrics match when deleting a route
  phylink: ensure AN is enabled
  phylink: ensure the PHY interface mode is appropriately set
  bnxt_en: Fix sources of spurious netpoll warnings
  net: sched: fix static key imbalance in case of ingress/clsact_init error
  vxlan: restore dev->mtu setting based on lower device
  net/mlx5: FPGA, return -EINVAL if size is zero
  tcp: refresh tcp_mstamp from timers callbacks
  ipv6: Honor specified parameters in fibmatch lookup
  net: phy: marvell: Limit 88m1101 autoneg errata to 88E1145 as well.
  tcp: fix potential underestimation on rcv_rtt
  mlxsw: spectrum: Disable MAC learning for ovs port
  tipc: fix hanging poll() for stream sockets
  sctp: make sure stream nums can match optlen in sctp_setsockopt_reset_streams
  s390/qeth: fix error handling in checksum cmd callback
  net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY
  sfc: pass valid pointers from efx_enqueue_unwind
  openvswitch: Fix pop_vlan action for double tagged frames
  net/mlx5: Fix error flow in CREATE_QP command
  net/mlx5e: Prevent possible races in VXLAN control flow
  net/mlx5e: Add refcount to VXLAN structure
  net/mlx5e: Fix features check of IPv6 traffic
  net/mlx5e: Fix possible deadlock of VXLAN lock
  net/mlx5: Fix rate limit packet pacing naming and struct
  tcp: invalidate rate samples during SACK reneging
  sock: free skb in skb_complete_tx_timestamp on error
  net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround
  net: Fix double free and memory corruption in get_net_ns_by_id()
  net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks
  ipv4: Fix use-after-free when flushing FIB tables
  ip6_gre: fix device features for ioctl setup
  adding missing rcu_read_unlock in ipxip6_rcv
  sctp: Replace use of sockets_allocated with specified macro.
  net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case
  net: ipv4: fix for a race condition in raw_sendmsg
  s390/qeth: update takeover IPs after configuration change
  s390/qeth: lock IP table while applying takeover changes
  s390/qeth: don't apply takeover changes to RXIP
  s390/qeth: apply takeover changes when mode is toggled
  tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
  tcp_bbr: reset full pipe detection on loss recovery undo
  tg3: Fix rx hang on MTU change with 5717/5719
  tcp md5sig: Use skb's saddr when replying to an incoming segment
  tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
  RDS: Check cmsg_len before dereferencing CMSG_DATA
  ptr_ring: add barriers
  net: reevalulate autoflowlabel setting after sysctl setting
  net: qmi_wwan: add Sierra EM7565 1199:9091
  netlink: Add netns check on taps
  net: igmp: Use correct source address on IGMPv3 reports
  net: fec: unmap the xmit buffer that are not transferred by DMA
  ipv6: mcast: better catch silly mtu values
  ipv4: igmp: guard against silly MTU values
  kbuild: add '-fno-stack-check' to kernel build options
  block: don't let passthrough IO go into .make_request_fn()
  block: fix blk_rq_append_bio
  cpufreq: schedutil: Use idle_calls counter of the remote CPU
  ALSA: hda - Fix missing COEF init for ALC225/295/299
  ALSA: hda - fix headset mic detection issue on a Dell machine
  ALSA: hda - change the location for one mic on a Lenovo machine
  ALSA: hda - Add MIC_NO_PRESENCE fixup for 2 HP machines
  ALSA: hda: Drop useless WARN_ON()
  IB/core: Verify that QP is security enabled in create and destroy
  IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()
  IB/mlx5: Serialize access to the VMA list
  IB/hfi: Only read capability registers if the capability exists
  gpio: fix "gpio-line-names" property retrieval
  ASoC: tlv320aic31xx: Fix GPIO1 register definition
  ASoC: twl4030: fix child-node lookup
  ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
  ASoC: da7218: fix fix child-node lookup
  ASoC: wm_adsp: Fix validation of firmware and coeff lengths
  ASoC: codecs: msm8916-wcd: Fix supported formats
  iw_cxgb4: Only validate the MSN for successful completions
  ring-buffer: Do no reuse reader page if still in use
  ring-buffer: Mask out the info bits when returning buffer page length
  x86/ldt: Make the LDT mapping RO
  x86/mm/dump_pagetables: Allow dumping current pagetables
  x86/mm/dump_pagetables: Check user space page table for WX pages
  x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy
  x86/mm/pti: Add Kconfig
  x86/dumpstack: Indicate in Oops whether PTI is configured and enabled
  x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming
  x86/mm: Use INVPCID for __native_flush_tlb_single()
  x86/mm: Optimize RESTORE_CR3
  x86/mm: Use/Fix PCID to optimize user/kernel switches
  x86/mm: Abstract switching CR3
  x86/mm: Allow flushing for future ASID switches
  x86/pti: Map the vsyscall page if needed
  x86/pti: Put the LDT in its own PGD if PTI is on
  x86/mm/64: Make a full PGD-entry size hole in the memory map
  x86/events/intel/ds: Map debug buffers in cpu_entry_area
  x86/cpu_entry_area: Add debugstore entries to cpu_entry_area
  x86/mm/pti: Map ESPFIX into user space
  x86/mm/pti: Share entry text PMD
  x86/entry: Align entry text section to PMD boundary
  x86/mm/pti: Share cpu_entry_area with user space page tables
  x86/mm/pti: Force entry through trampoline when PTI active
  x86/mm/pti: Add functions to clone kernel PMDs
  x86/mm/pti: Populate user PGD
  x86/mm/pti: Allocate a separate user PGD
  x86/mm/pti: Allow NX poison to be set in p4d/pgd
  x86/mm/pti: Add mapping helper functions
  x86/pti: Add the pti= cmdline option and documentation
  x86/mm/pti: Add infrastructure for page table isolation
  x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching
  x86/mm/pti: Disable global pages if PAGE_TABLE_ISOLATION=y
  x86/cpufeatures: Add X86_BUG_CPU_INSECURE
  tracing: Fix crash when it fails to alloc ring buffer
  tracing: Fix possible double free on failure of allocating trace buffer
  tracing: Remove extra zeroing out of the ring buffer page

  Conflicts:
	drivers/staging/android/ion/ion.c
	kernel/time/timer.c

Change-Id: Ia5b16c96ab44e640e2f10ab535c4c672b670cbdc
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
2018-01-11 17:52:14 -08:00
Joel Fernandes
0c688c288f cpufreq: schedutil: Use idle_calls counter of the remote CPU
commit 466a2b42d67644447a1765276259a3ea5531ddff upstream.

Since the recent remote cpufreq callback work, its possible that a cpufreq
update is triggered from a remote CPU. For single policies however, the current
code uses the local CPU when trying to determine if the remote sg_cpu entered
idle or is busy. This is incorrect. To remedy this, compare with the nohz tick
idle_calls counter of the remote CPU.

Fixes: 674e75411f (sched: cpufreq: Allow remote cpufreq callbacks)
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:31:05 +01:00
Mahesh Sivasubramanian
f024061b16 kernel: tick-sched: Add API to get the next wakeup for a CPU
Add get_next_event_cpu to get the next wakeup time for the CPU. This
is used by the sleep driver if it has to query the next wakeup for a
CPU other than the thread that its running on.

Change-Id: I889de90928b9b1e51a51b2f9205d7865facfcc20
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
2017-11-29 15:43:29 -07:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Rafael J. Wysocki
b7eaf1aab9 cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
The way the schedutil governor uses the PELT metric causes it to
underestimate the CPU utilization in some cases.

That can be easily demonstrated by running kernel compilation on
a Sandy Bridge Intel processor, running turbostat in parallel with
it and looking at the values written to the MSR_IA32_PERF_CTL
register.  Namely, the expected result would be that when all CPUs
were 100% busy, all of them would be requested to run in the maximum
P-state, but observation shows that this clearly isn't the case.
The CPUs run in the maximum P-state for a while and then are
requested to run slower and go back to the maximum P-state after
a while again.  That causes the actual frequency of the processor to
visibly oscillate below the sustainable maximum in a jittery fashion
which clearly is not desirable.

That has been attributed to CPU utilization metric updates on task
migration that cause the total utilization value for the CPU to be
reduced by the utilization of the migrated task.  If that happens,
the schedutil governor may see a CPU utilization reduction and will
attempt to reduce the CPU frequency accordingly right away.  That
may be premature, though, for example if the system is generally
busy and there are other runnable tasks waiting to be run on that
CPU already.

This is unlikely to be an issue on systems where cpufreq policies are
shared between multiple CPUs, because in those cases the policy
utilization is computed as the maximum of the CPU utilization values
over the whole policy and if that turns out to be low, reducing the
frequency for the policy most likely is a good idea anyway.  On
systems with one CPU per policy, however, it may affect performance
adversely and even lead to increased energy consumption in some cases.

On those systems it may be addressed by taking another utilization
metric into consideration, like whether or not the CPU whose
frequency is about to be reduced has been idle recently, because if
that's not the case, the CPU is likely to be busy in the near future
and its frequency should not be reduced.

To that end, use the counter of idle calls in the timekeeping code.
Namely, make the schedutil governor look at that counter for the
current CPU every time before its frequency is about to be reduced.
If the counter has not changed since the previous iteration of the
governor computations for that CPU, the CPU has been busy for all
that time and its frequency should not be decreased, so if the new
frequency would be lower than the one set previously, the governor
will skip the frequency update.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Joel Fernandes <joelaf@google.com>
2017-03-23 02:12:14 +01:00
Thomas Gleixner
2456e85535 ktime: Get rid of the union
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Kees Cook
4cc7ecb7f2 param: convert some "on"/"off" users to strtobool
This changes several users of manual "on"/"off" parsing to use
strtobool.

Some side-effects:
- these uses will now parse y/n/1/0 meaningfully too
- the early_param uses will now bubble up parse errors

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Perches <joe@perches.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steve French <sfrench@samba.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Frederic Weisbecker
b78783000d posix-cpu-timers: Migrate to use new tick dependency mask model
Instead of providing asynchronous checks for the nohz subsystem to verify
posix cpu timers tick dependency, migrate the latter to the new mask.

In order to keep track of the running timers and expose the tick
dependency accordingly, we must probe the timers queuing and dequeuing
on threads and process lists.

Unfortunately it implies both task and signal level dependencies. We
should be able to further optimize this and merge all that on the task
level dependency, at the cost of a bit of complexity and may be overhead.

Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2016-03-02 16:44:27 +01:00
Frederic Weisbecker
555e0c1ef7 perf: Migrate perf to use new tick dependency mask model
Instead of providing asynchronous checks for the nohz subsystem to verify
perf event tick dependency, migrate perf to the new mask.

Perf needs the tick for two situations:

1) Freq events. We could set the tick dependency when those are
installed on a CPU context. But setting a global dependency on top of
the global freq events accounting is much easier. If people want that
to be optimized, we can still refine that on the per-CPU tick dependency
level. This patch dooesn't change the current behaviour anyway.

2) Throttled events: this is a per-cpu dependency.

Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2016-03-02 16:43:00 +01:00
Frederic Weisbecker
e6e6cc22e0 nohz: Use enum code for tick stop failure tracing message
It makes nohz tracing more lightweight, standard and easier to parse.

Examples:

       user_loop-2904  [007] d..1   517.701126: tick_stop: success=1 dependency=NONE
       user_loop-2904  [007] dn.1   518.021181: tick_stop: success=0 dependency=SCHED
    posix_timers-6142  [007] d..1  1739.027400: tick_stop: success=0 dependency=POSIX_TIMER
       user_loop-5463  [007] dN.1  1185.931939: tick_stop: success=0 dependency=PERF_EVENTS

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2016-03-02 16:42:15 +01:00
Frederic Weisbecker
d027d45d8a nohz: New tick dependency mask
The tick dependency is evaluated on every IRQ and context switch. This
consists is a batch of checks which determine whether it is safe to
stop the tick or not. These checks are often split in many details:
posix cpu timers, scheduler, sched clock, perf events.... each of which
are made of smaller details: posix cpu timer involves checking process
wide timers then thread wide timers. Perf involves checking freq events
then more per cpu details.

Checking these informations asynchronously every time we update the full
dynticks state bring avoidable overhead and a messy layout.

Let's introduce instead tick dependency masks: one for system wide
dependency (unstable sched clock, freq based perf events), one for CPU
wide dependency (sched, throttling perf events), and task/signal level
dependencies (posix cpu timers). The subsystems are responsible
for setting and clearing their dependency through a set of APIs that will
take care of concurrent dependency mask modifications and kick targets
to restart the relevant CPU tick whenever needed.

This new dependency engine stays beside the old one until all subsystems
having a tick dependency are converted to it.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2016-03-02 16:41:39 +01:00
Jean Delvare
46373a15f6 time: nohz: Expose tick_nohz_enabled
The cpuidle subsystem needs it.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-15 22:34:39 +01:00
Vatika Harlalka
9642d18eee nohz: Affine unpinned timers to housekeepers
The problem addressed in this patch is about affining unpinned
timers. Adaptive or Full Dynticks CPUs are currently disturbed
by unnecessary jitter due to firing of such timers on them.

This patch will affine timers to online CPUs which are not full
dynticks in NOHZ_FULL configured systems. It should not
introduce overhead in nohz full off case due to static keys.

Signed-off-by: Vatika Harlalka <vatikaharlalka@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1441119060-2230-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-02 10:33:22 +02:00
Frederic Weisbecker
de734f89b6 nohz: Remove useless argument on tick_nohz_task_switch()
Leftover from early code.

Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-29 15:45:01 +02:00
Frederic Weisbecker
73738a95d0 nohz: Restart nohz full tick from irq exit
Restart the tick when necessary from the irq exit path. It makes nohz
full more flexible, simplify the related IPIs and doesn't bring
significant overhead on irq exit.

In a longer term view, it will allow us to piggyback the nohz kick
on the scheduler IPI in the future instead of sending a dedicated IPI
that often doubles the scheduler IPI on task wakeup. This will require
more changes though including careful review of resched_curr() callers
to include nohz full needs.

Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-29 15:45:00 +02:00
Chris Metcalf
03f6199a35 nohz: Prevent tilegx network driver interrupts
Normally the tilegx networking shim sends irqs to all the cores
to distribute the load of processing incoming-packet interrupts,
so that you can get to multiple Gb's of traffic inbound.

However, in nohz_full mode we don't want to interrupt the
nohz_full cores by default, so we limit the set of cores we use
to only the online housekeeping cores.

To make client code easier to read, we introduce a new nohz_full
accessor, housekeeping_cpumask(), which returns a pointer to the
housekeeping_mask if nohz_full is enabled, and otherwise returns
the cpu_possible_mask.

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-29 15:44:59 +02:00
Thomas Gleixner
37b64a4206 tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
Making tick_broadcast_oneshot_control() independent from
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST broke the build for
CONFIG_GENERIC_CLOCKEVENTS=n because the function is not defined
there.

Provide a proper stub inline.

Fixes: f32dd11705 'tick/broadcast: Make idle check independent from mode and config'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-07 21:56:34 +02:00
Thomas Gleixner
f32dd11705 tick/broadcast: Make idle check independent from mode and config
Currently the broadcast busy check, which prevents the idle code from
going into deep idle, works only in one shot mode.

If NOHZ and HIGHRES are off (config or command line) there is no
sanity check at all, so under certain conditions cpus are allowed to
go into deep idle, where the local timer stops, and are not woken up
again because there is no broadcast timer installed or a hrtimer based
broadcast device is not evaluated.

Move tick_broadcast_oneshot_control() into the common code and provide
proper subfunctions for the various config combinations.

The common check in tick_broadcast_oneshot_control() is for the C3STOP
misfeature flag of the local clock event device. If its not set, idle
can proceed. If set, further checks are necessary.

Provide checks for the trivial cases:

 - If broadcast is disabled in the config, then return busy

 - If oneshot mode (NOHZ/HIGHES) is disabled in the config, return
   busy if the broadcast device is hrtimer based.

 - If oneshot mode is enabled in the config call the original
   tick_broadcast_oneshot_control() function. That function needs
   extra checks which will be implemented in seperate patches.

[ Split out from a larger combo patch ]

Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
2015-07-07 18:46:47 +02:00
Linus Torvalds
43c9fad942 Merge tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
 "The rework of backlight interface selection API from Hans de Goede
  stands out from the number of commits and the number of affected
  places perspective.  The cpufreq core fixes from Viresh Kumar are
  quite significant too as far as the number of commits goes and because
  they should reduce CPU online/offline overhead quite a bit in the
  majority of cases.

  From the new featues point of view, the ACPICA update (to upstream
  revision 20150515) adding support for new ACPI 6 material to ACPICA is
  the one that matters the most as some new significant features will be
  based on it going forward.  Also included is an update of the ACPI
  device power management core to follow ACPI 6 (which in turn reflects
  the Windows' device PM implementation), a PM core extension to support
  wakeup interrupts in a more generic way and support for the ACPI _CCA
  device configuration object.

  The rest is mostly fixes and cleanups all over and some documentation
  updates, including new DT bindings for Operating Performance Points.

  There is one fix for a regression introduced in the 4.1 cycle, but it
  adds quite a number of lines of code, it wasn't really ready before
  Thursday and you were on vacation, so I refrained from pushing it on
  the last minute for 4.1.

  Specifics:

   - ACPICA update to upstream revision 20150515 including basic support
     for ACPI 6 features: new ACPI tables introduced by ACPI 6 (STAO,
     XENV, WPBT, NFIT, IORT), changes related to the other tables (DTRM,
     FADT, LPIT, MADT), new predefined names (_BTH, _CR3, _DSD, _LPI,
     _MTL, _PRR, _RDI, _RST, _TFP, _TSN), fixes and cleanups (Bob Moore,
     Lv Zheng).

   - ACPI device power management core code update to follow ACPI 6
     which reflects the ACPI device power management implementation in
     Windows (Rafael J Wysocki).

   - rework of the backlight interface selection logic to reduce the
     number of kernel command line options and improve the handling of
     DMI quirks that may be involved in that and to make the code
     generally more straightforward (Hans de Goede).

   - fixes for the ACPI Embedded Controller (EC) driver related to the
     handling of EC transactions (Lv Zheng).

   - fix for a regression related to the ACPI resources management and
     resulting from a recent change of ACPI initialization code ordering
     (Rafael J Wysocki).

   - fix for a system initialization regression related to ACPI
     introduced during the 3.14 cycle and caused by running the code
     that switches the platform over to the ACPI mode too early in the
     initialization sequence (Rafael J Wysocki).

   - support for the ACPI _CCA device configuration object related to
     DMA cache coherence (Suravee Suthikulpanit).

   - ACPI/APEI fixes and cleanups (Jiri Kosina, Borislav Petkov).

   - ACPI battery driver cleanups (Luis Henriques, Mathias Krause).

   - ACPI processor driver cleanups (Hanjun Guo).

   - cleanups and documentation update related to the ACPI device
     properties interface based on _DSD (Rafael J Wysocki).

   - ACPI device power management fixes (Rafael J Wysocki).

   - assorted cleanups related to ACPI (Dominik Brodowski, Fabian
     Frederick, Lorenzo Pieralisi, Mathias Krause, Rafael J Wysocki).

   - fix for a long-standing issue causing General Protection Faults to
     be generated occasionally on return to user space after resume from
     ACPI-based suspend-to-RAM on 32-bit x86 (Ingo Molnar).

   - fix to make the suspend core code return -EBUSY consistently in all
     cases when system suspend is aborted due to wakeup detection (Ruchi
     Kandoi).

   - support for automated device wakeup IRQ handling allowing drivers
     to make their PM support more starightforward (Tony Lindgren).

   - new tracepoints for suspend-to-idle tracing and rework of the
     prepare/complete callbacks tracing in the PM core (Todd E Brandt,
     Rafael J Wysocki).

   - wakeup sources framework enhancements (Jin Qian).

   - new macro for noirq system PM callbacks (Grygorii Strashko).

   - assorted cleanups related to system suspend (Rafael J Wysocki).

   - cpuidle core cleanups to make the code more efficient (Rafael J
     Wysocki).

   - powernv/pseries cpuidle driver update (Shilpasri G Bhat).

   - cpufreq core fixes related to CPU online/offline that should reduce
     the overhead of these operations quite a bit, unless the CPU in
     question is physically going away (Viresh Kumar, Saravana Kannan).

   - serialization of cpufreq governor callbacks to avoid race
     conditions in some cases (Viresh Kumar).

   - intel_pstate driver fixes and cleanups (Doug Smythies, Prarit
     Bhargava, Joe Konno).

   - cpufreq driver (arm_big_little, cpufreq-dt, qoriq) updates (Sudeep
     Holla, Felipe Balbi, Tang Yuantian).

   - assorted cleanups in cpufreq drivers and core (Shailendra Verma,
     Fabian Frederick, Wang Long).

   - new Device Tree bindings for representing Operating Performance
     Points (Viresh Kumar).

   - updates for the common clock operations support code in the PM core
     (Rajendra Nayak, Geert Uytterhoeven).

   - PM domains core code update (Geert Uytterhoeven).

   - Intel Knights Landing support for the RAPL (Running Average Power
     Limit) power capping driver (Dasaratharaman Chandramouli).

   - fixes related to the floor frequency setting on Atom SoCs in the
     RAPL power capping driver (Ajay Thomas).

   - runtime PM framework documentation update (Ben Dooks).

   - cpupower tool fix (Herton R Krzesinski)"

* tag 'pm+acpi-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (194 commits)
  cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state
  x86: Load __USER_DS into DS/ES after resume
  PM / OPP: Add binding for 'opp-suspend'
  PM / OPP: Allow multiple OPP tables to be passed via DT
  PM / OPP: Add new bindings to address shortcomings of existing bindings
  ACPI: Constify ACPI device IDs in documentation
  ACPI / enumeration: Document the rules regarding the PRP0001 device ID
  ACPI / video: Make acpi_video_unregister_backlight() private
  acpi-video-detect: Remove old API
  toshiba-acpi: Port to new backlight interface selection API
  thinkpad-acpi: Port to new backlight interface selection API
  sony-laptop: Port to new backlight interface selection API
  samsung-laptop: Port to new backlight interface selection API
  msi-wmi: Port to new backlight interface selection API
  msi-laptop: Port to new backlight interface selection API
  intel-oaktrail: Port to new backlight interface selection API
  ideapad-laptop: Port to new backlight interface selection API
  fujitsu-laptop: Port to new backlight interface selection API
  eeepc-laptop: Port to new backlight interface selection API
  dell-wmi: Port to new backlight interface selection API
  ...
2015-06-23 14:18:07 -07:00
Rafael J. Wysocki
87e9b9f1d8 PM / sleep: Make suspend-to-idle-specific code depend on CONFIG_SUSPEND
Since idle_should_freeze() is defined to always return 'false'
for CONFIG_SUSPEND unset, all of the code depending on it in
cpuidle_idle_call() is not necessary in that case.

Make that code depend on CONFIG_SUSPEND too to avoid building it
when it is not going to be used.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2015-05-19 02:44:24 +02:00
Chris Metcalf
83dedea8a0 nohz: Add tick_nohz_full_add_cpus_to() API
This API is useful to modify a cpumask indicating some special
nohz-type functionality so that the nohz cores are automatically
added to that set.

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Jones <davej@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1429024675-18938-1-git-send-email-cmetcalf@ezchip.com
Link: http://lkml.kernel.org/r/1430928266-24888-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-07 12:02:51 +02:00
Thomas Gleixner
a49b116dcb clockevents: Cleanup dead cpu explicitely
clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.

Split out the cleanup function for a dead cpu and invoke it
directly from the cpu down code. Make it conditional on
CPU_HOTPLUG as well.

Temporary change, will be refined in the future.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebased, added clockevents_notify() removal ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 08:44:37 +02:00
Thomas Gleixner
52c063d1ad clockevents: Make tick handover explicit
clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.

Split out the tick_handover call and invoke it explicitely from
the hotplug code. Temporary solution will be cleaned up in later
patches.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1658173.RkEEILFiQZ@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 08:44:36 +02:00
Thomas Gleixner
1fe5d5c3c9 clockevents: Provide explicit broadcast oneshot control functions
clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.

Split out the broadcast oneshot control into a separate function
and provide inline helpers. Switch clockevents_notify() over.
This will go away once all callers are converted.

This also gets rid of the nested locking of clockevents_lock and
broadcast_lock. The broadcast oneshot control functions do not
require clockevents_lock. Only the managing functions
(setup/shutdown/suspend/resume of the broadcast device require
clockevents_lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Tony Lindgren <tony@atomide.com>
Link: http://lkml.kernel.org/r/13000649.8qZuEDV0OA@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 08:44:33 +02:00
Thomas Gleixner
592a438ff3 clockevents: Provide explicit broadcast control functions
clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.

Split out the broadcast control into a separate function and
provide inline helpers. Switch clockevents_notify() over. This
will go away once all callers are converted.

This also gets rid of the nested locking of clockevents_lock and
broadcast_lock. The broadcast control functions do not require
clockevents_lock. Only the managing functions
(setup/shutdown/suspend/resume of the broadcast device require
clockevents_lock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Lindgren <tony@atomide.com>
Link: http://lkml.kernel.org/r/8086559.ttsuS0n1Xr@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-03 08:44:31 +02:00
Preeti U Murthy
345527b1ed clockevents: Fix cpu_down() race for hrtimer based broadcasting
It was found when doing a hotplug stress test on POWER, that the
machine either hit softlockups or rcu_sched stall warnings.  The
issue was traced to commit:

  7cba160ad7 ("powernv/cpuidle: Redesign idle states management")

which exposed the cpu_down() race with hrtimer based broadcast mode:

  5d1638acb9 ("tick: Introduce hrtimer based broadcast")

The race is the following:

Assume CPU1 is the CPU which holds the hrtimer broadcasting duty
before it is taken down.

	CPU0					CPU1

	cpu_down()				take_cpu_down()
						disable_interrupts()

	cpu_die()

	while (CPU1 != CPU_DEAD) {
		msleep(100);
		switch_to_idle();
		stop_cpu_timer();
		schedule_broadcast();
	}

	tick_cleanup_cpu_dead()
		take_over_broadcast()

So after CPU1 disabled interrupts it cannot handle the broadcast
hrtimer anymore, so CPU0 will be stuck forever.

Fix this by explicitly taking over broadcast duty before cpu_die().

This is a temporary workaround. What we really want is a callback
in the clockevent device which allows us to do that from the dying
CPU by pushing the hrtimer onto a different cpu. That might involve
an IPI and is definitely more complex than this immediate fix.

Changelog was picked up from:

    https://lkml.org/lkml/2015/2/16/213

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: mpe@ellerman.id.au
Cc: nicolas.pitre@linaro.org
Cc: peterz@infradead.org
Cc: rjw@rjwysocki.net
Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html
Link: http://lkml.kernel.org/r/20150330092410.24979.59887.stgit@preeti.in.ibm.com
[ Merged it to the latest timer tree, renamed the callback, tidied up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 14:25:39 +02:00
Thomas Gleixner
7270d11c56 arm/bL_switcher: Kill tick suspend hackery
Use the new tick_suspend/resume_local() and get rid of the
homebrewn implementation of these in the ARM bL switcher.  The
check for the cpumask is completely pointless.  There is no harm
to suspend a per cpu tick device unconditionally.  If that's a
real issue then we fix it proper at the core level and not with
some completely undocumented hacks in some random core code.

Move the tick internals to the core code, now that this nuisance
is gone.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ rjw: Rebase, changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Link: http://lkml.kernel.org/r/1655112.Ws17YsMfN7@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 14:23:00 +02:00
Thomas Gleixner
f46481d0a7 tick/xen: Provide and use tick_suspend_local() and tick_resume_local()
Xen calls on every cpu into tick_resume() which is just wrong.
tick_resume() is for the syscore global suspend/resume
invocation. What XEN really wants is a per cpu local resume
function.

Provide a tick_resume_local() function and use it in XEN.

Also provide a complementary tick_suspend_local() and modify
tick_unfreeze() and tick_freeze(), respectively, to use the
new local tick resume/suspend functions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Combined two patches, rebased, modified subject/changelog. ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1698741.eezk9tnXtG@vostro.rjw.lan
[ Merged to latest timers/core. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 14:23:00 +02:00
Thomas Gleixner
4ffee521f3 clockevents: Make suspend/resume calls explicit
clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call.

We are way better off to have explicit calls instead of this
monstrosity. Split out the suspend/resume() calls and invoke
them directly from the call sites.

No locking required at this point because these calls happen
with interrupts disabled and a single cpu online.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebased on top of 4.0-rc5. ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/713674030.jVm1qaHuPf@vostro.rjw.lan
[ Rebased on top of latest timers/core. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 14:22:59 +02:00
Thomas Gleixner
c1797baf68 tick: Move core only declarations and functions to core
No point to expose everything to the world. People just believe
such functions can be abused for whatever purposes. Sigh.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebased on top of 4.0-rc5 ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/28017337.VbCUc39Gme@vostro.rjw.lan
[ Merged to latest timers/core ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-01 14:22:58 +02:00
Rafael J. Wysocki
124cf9117c PM / sleep: Make it possible to quiesce timers during suspend-to-idle
The efficiency of suspend-to-idle depends on being able to keep CPUs
in the deepest available idle states for as much time as possible.
Ideally, they should only be brought out of idle by system wakeup
interrupts.

However, timer interrupts occurring periodically prevent that from
happening and it is not practical to chase all of the "misbehaving"
timers in a whack-a-mole fashion.  A much more effective approach is
to suspend the local ticks for all CPUs and the entire timekeeping
along the lines of what is done during full suspend, which also
helps to keep suspend-to-idle and full suspend reasonably similar.

The idea is to suspend the local tick on each CPU executing
cpuidle_enter_freeze() and to make the last of them suspend the
entire timekeeping.  That should prevent timer interrupts from
triggering until an IO interrupt wakes up one of the CPUs.  It
needs to be done with interrupts disabled on all of the CPUs,
though, because otherwise the suspended clocksource might be
accessed by an interrupt handler which might lead to fatal
consequences.

Unfortunately, the existing ->enter callbacks provided by cpuidle
drivers generally cannot be used for implementing that, because some
of them re-enable interrupts temporarily and some idle entry methods
cause interrupts to be re-enabled automatically on exit.  Also some
of these callbacks manipulate local clock event devices of the CPUs
which really shouldn't be done after suspending their ticks.

To overcome that difficulty, introduce a new cpuidle state callback,
->enter_freeze, that will be guaranteed (1) to keep interrupts
disabled all the time (and return with interrupts disabled) and (2)
not to touch the CPU timer devices.  Modify cpuidle_enter_freeze() to
look for the deepest available idle state with ->enter_freeze present
and to make the CPU execute that callback with suspended tick (and the
last of the online CPUs to execute it with suspended timekeeping).

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2015-02-15 19:40:09 +01:00
Linus Torvalds
1ee07ef6b5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "This patch set contains the main portion of the changes for 3.18 in
  regard to the s390 architecture.  It is a bit bigger than usual,
  mainly because of a new driver and the vector extension patches.

  The interesting bits are:
   - Quite a bit of work on the tracing front.  Uprobes is enabled and
     the ftrace code is reworked to get some of the lost performance
     back if CONFIG_FTRACE is enabled.
   - To improve boot time with CONFIG_DEBIG_PAGEALLOC, support for the
     IPTE range facility is added.
   - The rwlock code is re-factored to improve writer fairness and to be
     able to use the interlocked-access instructions.
   - The kernel part for the support of the vector extension is added.
   - The device driver to access the CD/DVD on the HMC is added, this
     will hopefully come in handy to improve the installation process.
   - Add support for control-unit initiated reconfiguration.
   - The crypto device driver is enhanced to enable the additional AP
     domains and to allow the new crypto hardware to be used.
   - Bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
  s390/ftrace: simplify enabling/disabling of ftrace_graph_caller
  s390/ftrace: remove 31 bit ftrace support
  s390/kdump: add support for vector extension
  s390/disassembler: add vector instructions
  s390: add support for vector extension
  s390/zcrypt: Toleration of new crypto hardware
  s390/idle: consolidate idle functions and definitions
  s390/nohz: use a per-cpu flag for arch_needs_cpu
  s390/vtime: do not reset idle data on CPU hotplug
  s390/dasd: add support for control unit initiated reconfiguration
  s390/dasd: fix infinite loop during format
  s390/mm: make use of ipte range facility
  s390/setup: correct 4-level kernel page table detection
  s390/topology: call set_sched_topology early
  s390/uprobes: architecture backend for uprobes
  s390/uprobes: common library for kprobes and uprobes
  s390/rwlock: use the interlocked-access facility 1 instructions
  s390/rwlock: improve writer fairness
  s390/rwlock: remove interrupt-enabling rwlock variant.
  s390/mm: remove change bit override support
  ...
2014-10-14 03:47:00 +02:00
Martin Schwidefsky
fe0f49768d s390/nohz: use a per-cpu flag for arch_needs_cpu
Move the nohz_delay bit from the s390_idle data structure to the
per-cpu flags. Clear the nohz delay flag in __cpu_disable and
remove the cpu hotplug notifier that used to do this.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-09 09:14:02 +02:00
Frederic Weisbecker
a80e49e2cc nohz: Move nohz full init call to tick init
This way we unbloat a bit main.c and more importantly we initialize
nohz full after init_IRQ(). This dependency will be needed in further
patches because nohz full needs irq work to raise its own IRQ.
Information about the support for this ability on ARM64 is obtained on
init_IRQ() which initialize the pointer to __smp_call_function.

Since tick_init() is called right after init_IRQ(), this is a good place
to call tick_nohz_init() and prepare for that dependency.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-13 18:34:44 +02:00
Frederic Weisbecker
40bea03959 nohz: Restore NMI safe local irq work for local nohz kick
The local nohz kick is currently used by perf which needs it to be
NMI-safe. Recent commit though (7d1311b93e)
changed its implementation to fire the local kick using the remote kick
API. It was convenient to make the code more generic but the remote kick
isn't NMI-safe.

As a result:

	WARNING: CPU: 3 PID: 18062 at kernel/irq_work.c:72 irq_work_queue_on+0x11e/0x140()
	CPU: 3 PID: 18062 Comm: trinity-subchil Not tainted 3.16.0+ #34
	0000000000000009 00000000903774d1 ffff880244e06c00 ffffffff9a7f1e37
	0000000000000000 ffff880244e06c38 ffffffff9a0791dd ffff880244fce180
	0000000000000003 ffff880244e06d58 ffff880244e06ef8 0000000000000000
	Call Trace:
	<NMI>  [<ffffffff9a7f1e37>] dump_stack+0x4e/0x7a
	[<ffffffff9a0791dd>] warn_slowpath_common+0x7d/0xa0
	[<ffffffff9a07930a>] warn_slowpath_null+0x1a/0x20
	[<ffffffff9a17ca1e>] irq_work_queue_on+0x11e/0x140
	[<ffffffff9a10a2c7>] tick_nohz_full_kick_cpu+0x57/0x90
	[<ffffffff9a186cd5>] __perf_event_overflow+0x275/0x350
	[<ffffffff9a184f80>] ? perf_event_task_disable+0xa0/0xa0
	[<ffffffff9a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
	[<ffffffff9a187934>] perf_event_overflow+0x14/0x20
	[<ffffffff9a020386>] intel_pmu_handle_irq+0x206/0x410
	[<ffffffff9a0b54d3>] ? arch_vtime_task_switch+0x63/0x130
	[<ffffffff9a01937b>] perf_event_nmi_handler+0x2b/0x50
	[<ffffffff9a007b72>] nmi_handle+0xd2/0x390
	[<ffffffff9a007aa5>] ? nmi_handle+0x5/0x390
	[<ffffffff9a0d131b>] ? lock_release+0xab/0x330
	[<ffffffff9a008062>] default_do_nmi+0x72/0x1c0
	[<ffffffff9a0c925f>] ? cpuacct_account_field+0xcf/0x200
	[<ffffffff9a008268>] do_nmi+0xb8/0x100

Lets fix this by restoring the use of local irq work for the nohz local
kick.

Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-04 22:35:59 +02:00
Linus Torvalds
98959948a7 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - Move the nohz kick code out of the scheduler tick to a dedicated IPI,
   from Frederic Weisbecker.

  This necessiated quite some background infrastructure rework,
  including:

   * Clean up some irq-work internals
   * Implement remote irq-work
   * Implement nohz kick on top of remote irq-work
   * Move full dynticks timer enqueue notification to new kick
   * Move multi-task notification to new kick
   * Remove unecessary barriers on multi-task notification

 - Remove proliferation of wait_on_bit() action functions and allow
   wait_on_bit_action() functions to support a timeout.  (Neil Brown)

 - Another round of sched/numa improvements, cleanups and fixes.  (Rik
   van Riel)

 - Implement fast idling of CPUs when the system is partially loaded,
   for better scalability.  (Tim Chen)

 - Restructure and fix the CPU hotplug handling code that may leave
   cfs_rq and rt_rq's throttled when tasks are migrated away from a dead
   cpu.  (Kirill Tkhai)

 - Robustify the sched topology setup code.  (Peterz Zijlstra)

 - Improve sched_feat() handling wrt.  static_keys (Jason Baron)

 - Misc fixes.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  sched/fair: Fix 'make xmldocs' warning caused by missing description
  sched: Use macro for magic number of -1 for setparam
  sched: Robustify topology setup
  sched: Fix sched_setparam() policy == -1 logic
  sched: Allow wait_on_bit_action() functions to support a timeout
  sched: Remove proliferation of wait_on_bit() action functions
  sched/numa: Revert "Use effective_load() to balance NUMA loads"
  sched: Fix static_key race with sched_feat()
  sched: Remove extra static_key*() function indirection
  sched/rt: Fix replenish_dl_entity() comments to match the current upstream code
  sched: Transform resched_task() into resched_curr()
  sched/deadline: Kill task_struct->pi_top_task
  sched: Rework check_for_tasks()
  sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime()
  sched/fair: Disable runtime_enabled on dying rq
  sched/numa: Change scan period code to match intent
  sched/numa: Rework best node setting in task_numa_migrate()
  sched/numa: Examine a task move when examining a task swap
  sched/numa: Simplify task_numa_compare()
  sched/numa: Use effective_load() to balance NUMA loads
  ...
2014-08-04 16:23:30 -07:00
Paul E. McKenney
c0f489d2c6 rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs
Binding the grace-period kthreads to the timekeeping CPU resulted in
significant performance decreases for some workloads.  For more detail,
see:

https://lkml.org/lkml/2014/6/3/395 for benchmark numbers

https://lkml.org/lkml/2014/6/4/218 for CPU statistics

It turns out that it is necessary to bind the grace-period kthreads
to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU
on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other.
In other cases, it suffices to bind the grace-period kthreads to the
set of non-nohz_full CPUs.

This commit therefore creates a tick_nohz_not_full_mask that is the
complement of tick_nohz_full_mask, and then binds the grace-period
kthread to the set of CPUs indicated by this new mask, which covers
the CONFIG_NO_HZ_FULL_SYSIDLE=n case.  The CONFIG_NO_HZ_FULL_SYSIDLE=y
case still binds the grace-period kthreads to the timekeeping CPU.
This commit also includes the tick_nohz_full_enabled() check suggested
by Frederic Weisbecker.

Reported-by: Jet Chen <jet.chen@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Created housekeeping_affine() and housekeeping_mask per
  fweisbec feedback. ]
2014-07-09 09:15:02 -07:00
Frederic Weisbecker
3d36aebc2e nohz: Support nohz full remote kick
Remotely kicking a full nohz CPU in order to make it re-evaluate its
next tick is currently implemented using the scheduler IPI.

However this bloats a scheduler fast path with an off-topic feature.
The scheduler tick was abused here for its cool "callable
anywhere/anytime" properties.

But now that the irq work subsystem can queue remote callbacks, it's
a perfect fit to safely queue IPIs when interrupts are disabled
without worrying about concurrent callers.

So lets implement remote kick on top of irq work. This is going to
be used when a new event requires the next tick to be recalculated:
more than 1 task competing on the CPU, timer armed, ...

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-06-16 16:26:54 +02:00
Frederic Weisbecker
5acac1be49 tick: Rename tick_check_idle() to tick_irq_enter()
This makes the code more symetric against the existing tick functions
called on irq exit: tick_irq_exit() and tick_nohz_irq_exit().

These function are also symetric as they mirror each other's action:
we start to account idle time on irq exit and we stop this accounting
on irq entry. Also the tick is stopped on irq exit and timekeeping
catches up with the tickless time elapsed until we reach irq entry.

This rename was suggested by Peter Zijlstra a long while ago but it
got forgotten in the mass.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1387320692-28460-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-01-15 23:05:31 +01:00
Frederic Weisbecker
58135f574f context_tracking: Wrap static key check into more intuitive function name
Use a function with a meaningful name to check the global context
tracking state. static_key_false() is a bit confusing for reviewers.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-12-02 20:43:14 +01:00
Frederic Weisbecker
e8fcaa5c54 nohz: Convert a few places to use local per cpu accesses
A few functions use remote per CPU access APIs when they
deal with local values.

Just do the right conversion to improve performance, code
readability and debug checks.

While at it, lets extend some of these function names with *_this_cpu()
suffix in order to display their purpose more clearly.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-12-02 20:39:30 +01:00