This change is for general scheduler improvement.
Change-Id: I50d41aa3338803cbd45ff6314b2bb3978c59282b
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
commit 85572c2c4a45a541e880e087b5b17a48198b2416 upstream.
The scheduler code calling cpufreq_update_util() may run during CPU
offline on the target CPU after the IRQ work lists have been flushed
for it, so the target CPU should be prevented from running code that
may queue up an IRQ work item on it at that point.
Unfortunately, that may not be the case if dvfs_possible_from_any_cpu
is set for at least one cpufreq policy in the system, because that
allows the CPU going offline to run the utilization update callback
of the cpufreq governor on behalf of another (online) CPU in some
cases.
If that happens, the cpufreq governor callback may queue up an IRQ
work on the CPU running it, which is going offline, and the IRQ work
may not be flushed after that point. Moreover, that IRQ work cannot
be flushed until the "offlining" CPU goes back online, so if any
other CPU calls irq_work_sync() to wait for the completion of that
IRQ work, it will have to wait until the "offlining" CPU is back
online, and that may never happen. In particular, a system-wide
deadlock may occur during CPU online as a result of that.
The failing scenario is as follows. CPU0 is the boot CPU, so it
creates a cpufreq policy and becomes the "leader" of it
(policy->cpu). It cannot go offline, because it is the boot CPU.
Next, other CPUs join the cpufreq policy as they go online and they
leave it when they go offline. The last CPU to go offline, say CPU3,
may queue up an IRQ work while running the governor callback on
behalf of CPU0 after leaving the cpufreq policy because of the
dvfs_possible_from_any_cpu effect described above. Then, CPU0 is
the only online CPU in the system and the stale IRQ work is still
queued on CPU3. When, say, CPU1 goes back online, it will run
irq_work_sync() to wait for that IRQ work to complete and so it
will wait for CPU3 to go back online (which may never happen even
in principle), but (worse yet) CPU0 is waiting for CPU1 at that
point too and a system-wide deadlock occurs.
To address this problem, notice that CPUs which cannot run cpufreq
utilization update code for themselves (for example, because they
have left the cpufreq policies that they belonged to) should also
be prevented from running that code on behalf of other CPUs that
belong to a cpufreq policy with dvfs_possible_from_any_cpu set. In
that case, the cpufreq_update_util_data pointer must be non-NULL for
the CPU running the code as well as for the CPU which is the target
of the cpufreq utilization update in progress.
Accordingly, change cpufreq_this_cpu_can_update() into a regular
function in kernel/sched/cpufreq.c (instead of a static inline in a
header file) and make it check the cpufreq_update_util_data pointer
of the local CPU if dvfs_possible_from_any_cpu is set for the target
cpufreq policy.
Also update the schedutil governor to do the
cpufreq_this_cpu_can_update() check in the non-fast-switch
case too to avoid the stale IRQ work issues.
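The combined check can be modelled in user space roughly as follows. This
is only a sketch under stated assumptions, not the actual patch: struct
policy_model, update_util_data_model[] and this_cpu_can_update_model() are
hypothetical stand-ins for struct cpufreq_policy, the per-CPU
cpufreq_update_util_data pointers and cpufreq_this_cpu_can_update().

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4

/* Hypothetical stand-in for the relevant fields of struct cpufreq_policy. */
struct policy_model {
	unsigned long cpus;              /* bitmask of CPUs in the policy */
	bool dvfs_possible_from_any_cpu; /* platform permits remote DVFS */
};

/*
 * Stand-in for the per-CPU cpufreq_update_util_data pointers.
 * NULL means the CPU has no utilization update hook installed,
 * e.g. because it has left its cpufreq policy while going offline.
 */
static void *update_util_data_model[NR_CPUS_MODEL];

/*
 * A CPU may run the update if it belongs to the policy, or if remote
 * DVFS is permitted AND the local CPU still has its own hook installed.
 */
static bool this_cpu_can_update_model(int this_cpu,
				      const struct policy_model *policy)
{
	if (policy->cpus & (1UL << this_cpu))
		return true;

	return policy->dvfs_possible_from_any_cpu &&
	       update_util_data_model[this_cpu] != NULL;
}
```

In this model, a CPU that has already dropped its hook is rejected even
when dvfs_possible_from_any_cpu is set, which is the case that previously
left a stale IRQ work item behind.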
Change-Id: Idb7f18129f59a82485a5eb93dc26c6f1a463a76a
Fixes: 99d14d0e16 ("cpufreq: Process remote callbacks from any CPU if the platform permits")
Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/
Reported-by: Anson Huang <anson.huang@nxp.com>
Tested-by: Anson Huang <anson.huang@nxp.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX8QXP-MEK)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Git-commit: 85572c2c4a45a541e880e087b5b17a48198b2416
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Santosh Mardi <gsantosh@codeaurora.org>
[render: Account for function name differences]
Signed-off-by: Zachariah Kennedy <zkennedy87@gmail.com>
- Used by mi_thermald to switch between various thermal profiles.
- Extracted from MiCode/Xiaomi_Kernel_OpenSource
from branch 'cepheus-q-oss'
- Cleaned up logging
- Fixed coding style
- Used the qcom default notifier to remove the dependency on the Xiaomi DRM notifier
Change-Id: Ifd3f9b33959e38aa55b96342b34f68691dc7f68a
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
Introduce the devfreq-cpufreq governor, used to scale the
DDR device frequency based on the current cpu frequency.
Change-Id: I0cf0bb2128cd104dccc44e3e3781d1d958d532ef
Signed-off-by: Santosh Mardi <gsantosh@codeaurora.org>
When the policy limits are applied with fast switch enabled, the
policy->cur is not updated. This can result in incorrect calculation
of the average capacity and any subsequent limit updates.
Update the cpufreq_policy_apply_limits_fast() API to return a non-zero
value when the frequency is updated. Use this return value to update
policy->cur. While at it, emit the cpu_frequency tracepoint when the
frequency changes because of a change in limits.
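The return-value convention described above can be sketched as follows.
This is a user-space model under stated assumptions: struct policy_model,
apply_limits_fast_model() and limits_changed_model() are hypothetical
names, and the real cpufreq_policy_apply_limits_fast() goes through the
driver's fast switch path rather than returning the limit directly.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the min/max/cur fields of a cpufreq policy. */
struct policy_model {
	unsigned int min;
	unsigned int max;
	unsigned int cur;
};

/* Returns the new frequency if the limits force a change, otherwise 0. */
static unsigned int apply_limits_fast_model(const struct policy_model *p)
{
	if (p->cur > p->max)
		return p->max;
	if (p->cur < p->min)
		return p->min;
	return 0;
}

/* Caller side: a non-zero return means policy->cur must be refreshed. */
static bool limits_changed_model(struct policy_model *p)
{
	unsigned int freq = apply_limits_fast_model(p);

	if (!freq)
		return false;

	p->cur = freq; /* keep the cached current frequency in sync */
	return true;   /* a real caller would emit the tracepoint here */
}
```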
Change-Id: I51732fa061aac11231d1f18ca70f31f252f0a0dd
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
* refs/heads/tmp-8ed9bc6:
Revert "staging: android: ion: fix sys heap pool's gfp_flags"
Linux 4.14.106
perf/x86/intel: Implement support for TSX Force Abort
x86: Add TSX Force Abort CPUID/MSR
perf/x86/intel: Generalize dynamic constraint creation
perf/x86/intel: Make cpuc allocations consistent
driver core: Postpone DMA tear-down until after devres release
ath9k: Avoid OF no-EEPROM quirks without qca,no-eeprom
gfs2: Fix missed wakeups in find_insert_glock
ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+
drm: disable uncached DMA optimization for ARM and arm64
ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU
ARM: dts: exynos: Fix pinctrl definition for eMMC RTSN line on Odroid X2/U3
arm64: dts: hikey: Give wifi some time after power-on
scsi: aacraid: Fix missing break in switch statement
iscsi_ibft: Fix missing break in switch statement
Input: elan_i2c - add id for touchpad found in Lenovo s21e-20
Input: wacom_serial4 - add support for Wacom ArtPad II tablet
qed: Consider TX tcs while deriving the max num_queues for PF.
qed: Fix EQ full firmware assert.
fs: ratelimit __find_get_block_slow() failure message.
i2c: omap: Use noirq system sleep pm ops to idle device for suspend
MIPS: Remove function size check in get_frame_info()
perf trace: Support multiple "vfs_getname" probes
perf symbols: Filter out hidden symbols from labels
s390/qeth: fix use-after-free in error path
netfilter: nf_nat: skip nat clash resolution for same-origin entries
selftests: netfilter: add simple masq/redirect test cases
selftests: netfilter: fix config fragment CONFIG_NF_TABLES_INET
dmaengine: dmatest: Abort test in case of mapping error
vsock/virtio: reset connected sockets on device removal
vsock/virtio: fix kernel panic after device hot-unplug
dmaengine: at_xdmac: Fix wrongfull report of a channel as in use
drm/sun4i: tcon: Prepare and enable TCON channel 0 clock at init
bpf: fix lockdep false positive in percpu_freelist
bpf, selftests: fix handling of sparse CPU allocations
relay: check return of create_buf_file() properly
irqchip/gic-v3-its: Fix ITT_entry_size accessor
net: stmmac: Disable EEE mode earlier in XMIT callback
net: stmmac: Send TSO packets always from Queue 0
net: stmmac: Fallback to Platform Data clock in Watchdog conversion
irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
usb: phy: fix link errors
DTS: CI20: Fix bugs in ci20's device tree.
arm64: dts: add msm8996 compatible to gicv3
ARM: pxa: ssp: unneeded to free devm_ allocated data
bpf: sock recvbuff must be limited by rmem_max in bpf_setsockopt()
soc: fsl: qbman: avoid race in clearing QMan interrupt
arm64: dts: renesas: r8a7796: Enable DMA for SCIF2
ARM: dts: omap4-droid4: Fix typo in cpcap IRQ flags
autofs: fix error return in autofs_fill_super()
autofs: drop dentry reference only when it is never used
fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
lib/test_kmod.c: potential double free in error handling
mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
x86_64: increase stack size for KASAN_EXTRA
x86/kexec: Don't setup EFI info if EFI runtime is not enabled
apparmor: Fix aa_label_build() error handling for failed merges
arm64: kprobe: Always blacklist the KVM world-switch code
x86/microcode/amd: Don't falsely trick the late loading mechanism
cifs: fix computation for MAX_SMB2_HDR_SIZE
platform/x86: Fix unmet dependency warning for SAMSUNG_Q10
scsi: 53c700: pass correct "dev" to dma_alloc_attrs()
scsi: libfc: free skb when receiving invalid flogi resp
qed: Fix stack out of bounds bug
qed: Fix system crash in ll2 xmit
qed: Fix VF probe failure while FLR
qed: Fix LACP pdu drops for VFs
qed: Fix bug in tx promiscuous mode settings
nfs: Fix NULL pointer dereference of dev_name
selftests: timers: use LDLIBS instead of LDFLAGS
gpio: vf610: Mask all GPIO interrupts
netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present
net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup()
net: hns: Fix wrong read accesses via Clause 45 MDIO protocol
net: hns: Restart autoneg need return failed when autoneg off
net: hns: Fix for missing of_node_put() after of_parse_phandle()
net: altera_tse: fix msgdma_tx_completion on non-zero fill_level case
xtensa: SMP: limit number of possible CPUs by NR_CPUS
xtensa: SMP: mark each possible CPU as present
xtensa: smp_lx200_defconfig: fix vectors clash
xtensa: SMP: fix secondary CPU initialization
selftests: cpu-hotplug: fix case where CPUs offline > CPUs present
xtensa: SMP: fix ccount_timer_shutdown
iommu/amd: Fix IOMMU page flush when detach device from a domain
ipvs: Fix signed integer overflow when setsockopt timeout
iommu/amd: Unmap all mapped pages in error path of map_sg
iommu/amd: Call free_iova_fast with pfn in map_sg
IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
perf tools: Handle TOPOLOGY headers with no CPU
perf core: Fix perf_proc_update_handler() bug
vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel
media: uvcvideo: Fix 'type' check leading to overflow
scsi: core: reset host byte in DID_NEXUS_FAILURE case
exec: Fix mem leak in kernel_read_file
Bluetooth: Fix locking in bt_accept_enqueue() for BH context
xtensa: fix get_wchan
hugetlbfs: fix races and page leaks during migration
MIPS: irq: Allocate accurate order pages for irq stack
applicom: Fix potential Spectre v1 vulnerabilities
x86/CPU/AMD: Set the CPB bit unconditionally on F17h
net: dsa: mv88e6xxx: Fix statistics on mv88e6161
net: phy: Micrel KSZ8061: link failure after cable connect
tun: remove unnecessary memory barrier
tun: fix blocking read
mpls: Return error for RTA_GATEWAY attribute
ipv6: Return error for RTA_VIA attribute
ipv4: Return error for RTA_VIA attribute
net: avoid use IPCB in cipso_v4_error
net: Add __icmp_send helper.
xen-netback: fix occasional leak of grant ref mappings under memory pressure
xen-netback: don't populate the hash cache on XenBus disconnect
net: socket: set sock->sk to NULL after calling proto_ops::release()
net: sit: fix memory leak in sit_init_net()
net: phy: phylink: fix uninitialized variable in phylink_get_mac_state
net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails
net: netem: fix skb length BUG_ON in __skb_to_sgvec
netlabel: fix out-of-bounds memory accesses
net: dsa: mv88e6xxx: Fix u64 statistics
hv_netvsc: Fix IP header checksum for coalesced packets
geneve: correctly handle ipv6.disable module parameter
bnxt_en: Drop oversize TX packets to prevent errors.
tipc: fix RDM/DGRAM connect() regression
team: Free BPF filter when unregistering netdev
sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
net-sysfs: Fix mem leak in netdev_register_kobject
net: dsa: mv88e6xxx: handle unknown duplex modes gracefully in mv88e6xxx_port_set_duplex
ip6mr: Do not call __IP6_INC_STATS() from preemptible context
staging: android: ion: fix sys heap pool's gfp_flags
staging: wilc1000: fix to set correct value for 'vif_num'
staging: comedi: ni_660x: fix missing break in switch statement
USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485
USB: serial: cp210x: add ID for Ingenico 3070
USB: serial: option: add Telit ME910 ECM composition
cpufreq: Use struct kobj_attribute instead of struct global_attr
ANDROID: cuttlefish: enable CONFIG_INET_UDP_DIAG=y
ANDROID: cuttlefish: enable CONFIG_USB_RTL8152=y
Change-Id: Id5bc9a3c0ca235fcf07904455ea829c7f49618ad
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
When new cpufreq min/max limits are set, the expectation is that the CPU
frequency will be updated appropriately. This happens for targets which do
not utilize the cpufreq fast switch APIs, but the code to do this is not
present by default for the fast switch path. Add the necessary code to
update frequency to reflect the new cpufreq limits for targets with fast
switch enabled.
Change-Id: I211b6117005df9d340dfe0d825032cd7600cbffa
Signed-off-by: Jonathan Avila <avilaj@codeaurora.org>
Changes in 4.14.106
cpufreq: Use struct kobj_attribute instead of struct global_attr
USB: serial: option: add Telit ME910 ECM composition
USB: serial: cp210x: add ID for Ingenico 3070
USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485
staging: comedi: ni_660x: fix missing break in switch statement
staging: wilc1000: fix to set correct value for 'vif_num'
staging: android: ion: fix sys heap pool's gfp_flags
ip6mr: Do not call __IP6_INC_STATS() from preemptible context
net: dsa: mv88e6xxx: handle unknown duplex modes gracefully in mv88e6xxx_port_set_duplex
net-sysfs: Fix mem leak in netdev_register_kobject
sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
team: Free BPF filter when unregistering netdev
tipc: fix RDM/DGRAM connect() regression
bnxt_en: Drop oversize TX packets to prevent errors.
geneve: correctly handle ipv6.disable module parameter
hv_netvsc: Fix IP header checksum for coalesced packets
net: dsa: mv88e6xxx: Fix u64 statistics
netlabel: fix out-of-bounds memory accesses
net: netem: fix skb length BUG_ON in __skb_to_sgvec
net: nfc: Fix NULL dereference on nfc_llcp_build_tlv fails
net: phy: phylink: fix uninitialized variable in phylink_get_mac_state
net: sit: fix memory leak in sit_init_net()
net: socket: set sock->sk to NULL after calling proto_ops::release()
xen-netback: don't populate the hash cache on XenBus disconnect
xen-netback: fix occasional leak of grant ref mappings under memory pressure
net: Add __icmp_send helper.
net: avoid use IPCB in cipso_v4_error
ipv4: Return error for RTA_VIA attribute
ipv6: Return error for RTA_VIA attribute
mpls: Return error for RTA_GATEWAY attribute
tun: fix blocking read
tun: remove unnecessary memory barrier
net: phy: Micrel KSZ8061: link failure after cable connect
net: dsa: mv88e6xxx: Fix statistics on mv88e6161
x86/CPU/AMD: Set the CPB bit unconditionally on F17h
applicom: Fix potential Spectre v1 vulnerabilities
MIPS: irq: Allocate accurate order pages for irq stack
hugetlbfs: fix races and page leaks during migration
xtensa: fix get_wchan
Bluetooth: Fix locking in bt_accept_enqueue() for BH context
exec: Fix mem leak in kernel_read_file
scsi: core: reset host byte in DID_NEXUS_FAILURE case
media: uvcvideo: Fix 'type' check leading to overflow
vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel
perf core: Fix perf_proc_update_handler() bug
perf tools: Handle TOPOLOGY headers with no CPU
IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
iommu/amd: Call free_iova_fast with pfn in map_sg
iommu/amd: Unmap all mapped pages in error path of map_sg
ipvs: Fix signed integer overflow when setsockopt timeout
iommu/amd: Fix IOMMU page flush when detach device from a domain
xtensa: SMP: fix ccount_timer_shutdown
selftests: cpu-hotplug: fix case where CPUs offline > CPUs present
xtensa: SMP: fix secondary CPU initialization
xtensa: smp_lx200_defconfig: fix vectors clash
xtensa: SMP: mark each possible CPU as present
xtensa: SMP: limit number of possible CPUs by NR_CPUS
net: altera_tse: fix msgdma_tx_completion on non-zero fill_level case
net: hns: Fix for missing of_node_put() after of_parse_phandle()
net: hns: Restart autoneg need return failed when autoneg off
net: hns: Fix wrong read accesses via Clause 45 MDIO protocol
net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup()
netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present
gpio: vf610: Mask all GPIO interrupts
selftests: timers: use LDLIBS instead of LDFLAGS
nfs: Fix NULL pointer dereference of dev_name
qed: Fix bug in tx promiscuous mode settings
qed: Fix LACP pdu drops for VFs
qed: Fix VF probe failure while FLR
qed: Fix system crash in ll2 xmit
qed: Fix stack out of bounds bug
scsi: libfc: free skb when receiving invalid flogi resp
scsi: 53c700: pass correct "dev" to dma_alloc_attrs()
platform/x86: Fix unmet dependency warning for SAMSUNG_Q10
cifs: fix computation for MAX_SMB2_HDR_SIZE
x86/microcode/amd: Don't falsely trick the late loading mechanism
arm64: kprobe: Always blacklist the KVM world-switch code
apparmor: Fix aa_label_build() error handling for failed merges
x86/kexec: Don't setup EFI info if EFI runtime is not enabled
x86_64: increase stack size for KASAN_EXTRA
mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
lib/test_kmod.c: potential double free in error handling
fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
autofs: drop dentry reference only when it is never used
autofs: fix error return in autofs_fill_super()
ARM: dts: omap4-droid4: Fix typo in cpcap IRQ flags
arm64: dts: renesas: r8a7796: Enable DMA for SCIF2
soc: fsl: qbman: avoid race in clearing QMan interrupt
bpf: sock recvbuff must be limited by rmem_max in bpf_setsockopt()
ARM: pxa: ssp: unneeded to free devm_ allocated data
arm64: dts: add msm8996 compatible to gicv3
DTS: CI20: Fix bugs in ci20's device tree.
usb: phy: fix link errors
irqchip/mmp: Only touch the PJ4 IRQ & FIQ bits on enable/disable
net: stmmac: Fallback to Platform Data clock in Watchdog conversion
net: stmmac: Send TSO packets always from Queue 0
net: stmmac: Disable EEE mode earlier in XMIT callback
irqchip/gic-v3-its: Fix ITT_entry_size accessor
relay: check return of create_buf_file() properly
bpf, selftests: fix handling of sparse CPU allocations
bpf: fix lockdep false positive in percpu_freelist
drm/sun4i: tcon: Prepare and enable TCON channel 0 clock at init
dmaengine: at_xdmac: Fix wrongfull report of a channel as in use
vsock/virtio: fix kernel panic after device hot-unplug
vsock/virtio: reset connected sockets on device removal
dmaengine: dmatest: Abort test in case of mapping error
selftests: netfilter: fix config fragment CONFIG_NF_TABLES_INET
selftests: netfilter: add simple masq/redirect test cases
netfilter: nf_nat: skip nat clash resolution for same-origin entries
s390/qeth: fix use-after-free in error path
perf symbols: Filter out hidden symbols from labels
perf trace: Support multiple "vfs_getname" probes
MIPS: Remove function size check in get_frame_info()
i2c: omap: Use noirq system sleep pm ops to idle device for suspend
fs: ratelimit __find_get_block_slow() failure message.
qed: Fix EQ full firmware assert.
qed: Consider TX tcs while deriving the max num_queues for PF.
Input: wacom_serial4 - add support for Wacom ArtPad II tablet
Input: elan_i2c - add id for touchpad found in Lenovo s21e-20
iscsi_ibft: Fix missing break in switch statement
scsi: aacraid: Fix missing break in switch statement
arm64: dts: hikey: Give wifi some time after power-on
ARM: dts: exynos: Fix pinctrl definition for eMMC RTSN line on Odroid X2/U3
ARM: dts: exynos: Add minimal clkout parameters to Exynos3250 PMU
drm: disable uncached DMA optimization for ARM and arm64
ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+
gfs2: Fix missed wakeups in find_insert_glock
ath9k: Avoid OF no-EEPROM quirks without qca,no-eeprom
driver core: Postpone DMA tear-down until after devres release
perf/x86/intel: Make cpuc allocations consistent
perf/x86/intel: Generalize dynamic constraint creation
x86: Add TSX Force Abort CPUID/MSR
perf/x86/intel: Implement support for TSX Force Abort
Linux 4.14.106
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 625c85a62cb7d3c79f6e16de3cfa972033658250 upstream.
The cpufreq_global_kobject is created using kobject_create_and_add()
helper, which assigns the kobj_type as dynamic_kobj_ktype and show/store
routines are set to kobj_attr_show() and kobj_attr_store().
These routines pass struct kobj_attribute as an argument to the
show/store callbacks. But all the cpufreq files created using the
cpufreq_global_kobject expect the argument to be of type struct
attribute. Things work fine currently as no one accesses the "attr"
argument. We may not see issues even if the argument is used, as struct
kobj_attribute has struct attribute as its first element, so both
pointers hold the same address.
But this is logically incorrect and we should rather use struct
kobj_attribute instead of struct global_attr in the cpufreq core and
drivers and the show/store callbacks should take struct kobj_attribute
as argument instead.
This bug was caught using CFI Clang builds of the Android kernel, which
catch mismatches in function prototypes for such callbacks.
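The reason the mismatch went unnoticed can be illustrated with a small
user-space model. The struct and macro names below are hypothetical and
the show callback is simplified; the point is only that a struct whose
first member is the attribute shares its address with that member.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct attribute. */
struct attribute_model {
	const char *name;
	unsigned short mode;
};

/* Hypothetical stand-in for struct kobj_attribute (callback simplified). */
struct kobj_attribute_model {
	struct attribute_model attr; /* first member, so offset is 0 */
	int (*show)(void);
};

/* container_of-style conversion from the embedded member to its parent. */
#define to_kobj_attr_model(ptr)                                     \
	((struct kobj_attribute_model *)((char *)(ptr) -            \
		offsetof(struct kobj_attribute_model, attr)))

/* Returns non-zero when both pointers refer to the same address. */
static int same_address_model(struct kobj_attribute_model *ka)
{
	return (void *)ka == (void *)&ka->attr;
}
```

The addresses match, so passing one type where the other is expected
happens to work, but the function prototypes still differ, and that is
what a prototype-checking build flags.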
Reported-by: Donghee Han <dh.han@samsung.com>
Reported-by: Sangkyu Kim <skwith.kim@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add support for the CPUFREQ_INCOMPATIBLE event in policy notifiers.
This event is added for the thermal driver, to make sure that thermal
is the last client to apply the policy limits.
Change-Id: I392e4745957829dd923f1f40201ad5c6b8ccf006
Signed-off-by: Santosh Mardi <gsantosh@codeaurora.org>
Implements the Max Frequency Capping Engine (MFCE) getter function
topology_get_max_freq_scale() to provide the scheduler with a
maximum frequency scaling correction factor for more accurate cpu
capacity handling by being able to deal with max frequency capping.
This scaling factor describes the influence of running a cpu with a
current maximum frequency (policy) lower than the maximum possible
frequency (cpuinfo).
The factor is:
(policy_max_freq(cpu) << SCHED_CAPACITY_SHIFT) / cpuinfo_max_freq(cpu)
It also implements the MFCE setter function arch_set_max_freq_scale()
which is called from cpufreq_set_policy().
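As a worked example of the factor, here is a minimal sketch. The function
name max_freq_scale_model() is hypothetical, and the shift value of 10 is
an assumption taken from the mainline definition of SCHED_CAPACITY_SHIFT.
The shift is applied before the division so that precision is not lost.

```c
/* Assumed to match the mainline value of SCHED_CAPACITY_SHIFT. */
#define SCHED_CAPACITY_SHIFT_MODEL 10

/*
 * Scale factor in the range [0, 1024]: 1024 means the policy maximum
 * equals the hardware maximum, 512 means it is capped at half of it.
 */
static unsigned long max_freq_scale_model(unsigned long policy_max_khz,
					  unsigned long cpuinfo_max_khz)
{
	return (policy_max_khz << SCHED_CAPACITY_SHIFT_MODEL) /
	       cpuinfo_max_khz;
}
```

For instance, a policy maximum of 1000000 kHz on a CPU whose hardware
maximum is 2000000 kHz gives a factor of 512.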
Change-Id: I59e52861ee260755ab0518fe1f7183a2e4e3d0fc
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Git-commit: b6a1f3e4dd
Git-repo: https://android.googlesource.com/kernel/common/
[satyap@codeaurora.org: trivial merge conflict resolution in
drivers/base/arch_topology.c where msm-4.14 has efficiency variable
which is not part of Android Common, so, the conflict and resolution]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
Changes in 4.14.13
x86/mm: Set MODULES_END to 0xffffffffff000000
x86/mm: Map cpu_entry_area at the same place on 4/5 level
x86/kaslr: Fix the vaddr_end mess
x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers
x86/tlb: Drop the _GPL from the cpu_tlbstate export
x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
kernel/acct.c: fix the acct->needcheck check in check_free_space()
mm/mprotect: add a cond_resched() inside change_pmd_range()
mm/sparse.c: wrong allocation for mem_section
userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails
btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
efi/capsule-loader: Reinstate virtual capsule mapping
crypto: n2 - cure use after free
crypto: chacha20poly1305 - validate the digest size
crypto: pcrypt - fix freeing pcrypt instances
crypto: chelsio - select CRYPTO_GF128MUL
drm/i915: Disable DC states around GMBUS on GLK
drm/i915: Apply Display WA #1183 on skl, kbl, and cfl
sunxi-rsb: Include OF based modalias in device uevent
fscache: Fix the default for fscache_maybe_release_page()
x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu()
x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
iommu/arm-smmu-v3: Don't free page table ops twice
iommu/arm-smmu-v3: Cope with duplicated Stream IDs
ARC: uaccess: dont use "l" gcc inline asm constraint modifier
powerpc/mm: Fix SEGV on mapped region to return SEGV_ACCERR
Input: elantech - add new icbody type 15
x86/microcode/AMD: Add support for fam17h microcode loading
apparmor: fix regression in mount mediation when feature set is pinned
parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
parisc: qemu idle sleep support
mtd: nand: pxa3xx: Fix READOOB implementation
KVM: s390: fix cmma migration for multiple memory slots
KVM: s390: prevent buffer overrun on memory hotplug during migration
Linux 4.14.13
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 7d5905dc14a87805a59f3c5bf70173aac2bb18f8 upstream.
After commit 890da9cf09 (Revert "x86: do not use cpufreq_quick_get()
for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
on x86 can be either the nominal CPU frequency (which is constant)
or the frequency most recently requested by a scaling governor in
cpufreq, depending on the cpufreq configuration. That is somewhat
inconsistent and is different from what it was before 4.13, so in
order to restore the previous behavior, make it report the current
CPU frequency like the scaling_cur_freq sysfs file in cpufreq.
To that end, modify the /proc/cpuinfo implementation on x86 to use
aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
registers, if available, and use their values to compute the CPU
frequency to be reported as "cpu MHz".
However, do that carefully enough to avoid accumulating delays that
lead to unacceptable access times for /proc/cpuinfo on systems with
many CPUs. Run aperfmperf_snapshot_khz() once on all CPUs
asynchronously at the /proc/cpuinfo open time, add a single delay
upfront (if necessary) at that point and simply compute the current
frequency while running show_cpuinfo() for each individual CPU.
Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce
the default delay between consecutive APERF and MPERF reads to 10 ms,
which should be sufficient to get large enough numbers for the
frequency computation in all cases.
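The frequency computation from two register snapshots can be sketched as
below. This is a user-space model, assuming that the effective frequency
is the base frequency scaled by the ratio of the APERF and MPERF deltas;
aperfmperf_khz_model() and its parameters are hypothetical names, not the
kernel's aperfmperf_snapshot_khz().

```c
#include <stdint.h>

/*
 * base_khz:    nominal (base) frequency in kHz
 * aperf_delta: APERF increase between the two snapshots
 * mperf_delta: MPERF increase between the two snapshots
 * Returns the estimated average frequency in kHz, or 0 if no MPERF
 * ticks elapsed (the sampling window was too short to be usable).
 */
static unsigned int aperfmperf_khz_model(unsigned int base_khz,
					 uint64_t aperf_delta,
					 uint64_t mperf_delta)
{
	if (mperf_delta == 0)
		return 0;

	return (unsigned int)((uint64_t)base_khz * aperf_delta / mperf_delta);
}
```

The delay between the two reads matters because very small deltas make
the ratio coarse, which is why a minimum sampling interval is kept.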
Fixes: 890da9cf09 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Frequency-invariant accounting support based on the ratio of current
frequency and maximum supported frequency is an optional feature an arch
can implement.
Since there are cpufreq drivers (e.g. cpufreq-dt) which can be built for
different architectures, a default implementation of the
frequency-invariance setter function arch_set_freq_scale() is needed.
This default implementation is an empty weak function which will be
overwritten by a strong function in case the arch provides one.
The setter function passes the cpumask of related (to the frequency
change) cpus (online and offline cpus), the (new) current frequency and
the maximum supported frequency.
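The weak-default pattern can be sketched as follows, using the GCC/Clang
weak attribute directly. The names arch_set_freq_scale_model() and
freq_transition_model() are hypothetical, and the CPU mask is simplified
to an unsigned long bitmask for illustration.

```c
/*
 * Default implementation: an empty weak function. An architecture that
 * supports frequency invariance would provide a strong definition with
 * the same name, which the linker then prefers over this one.
 */
__attribute__((weak)) void arch_set_freq_scale_model(unsigned long cpus,
						     unsigned long cur_freq,
						     unsigned long max_freq)
{
	(void)cpus;
	(void)cur_freq;
	(void)max_freq;
}

/* Caller side: the hook is invoked unconditionally on a frequency change. */
static unsigned long freq_transition_model(unsigned long cpus,
					   unsigned long new_freq,
					   unsigned long max_freq)
{
	arch_set_freq_scale_model(cpus, new_freq, max_freq);
	return new_freq;
}
```

The benefit of this design is that generic drivers need no #ifdef around
the call: on architectures without an override it is simply a no-op.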
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Change-Id: I912d5815ee29e1171c498e638d1a089c5a598add
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
* pm-cpufreq-sched:
cpufreq: schedutil: Always process remote callback with slow switching
cpufreq: schedutil: Don't restrict kthread to related_cpus unnecessarily
cpufreq: Return 0 from ->fast_switch() on errors
cpufreq: Simplify cpufreq_can_do_remote_dvfs()
cpufreq: Process remote callbacks from any CPU if the platform permits
sched: cpufreq: Allow remote cpufreq callbacks
cpufreq: schedutil: Use unsigned int for iowait boost
cpufreq: schedutil: Make iowait boost more energy efficient
On many platforms, CPUs can do DVFS across cpufreq policies, i.e. a CPU
from policy-A can change the frequency of CPUs belonging to policy-B.
This is quite common on ARM platforms, where we don't
configure any per-cpu register.
Add a flag to identify such platforms and update
cpufreq_can_do_remote_dvfs() to allow remote callbacks if this flag is
set.
Also enable the flag for cpufreq-dt driver which is used only on ARM
platforms currently.
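The resulting rule can be modelled in user space as shown below. This is
a sketch under stated assumptions: struct policy_model and
can_do_remote_dvfs_model() are hypothetical stand-ins, and the policy's
CPU mask is simplified to an unsigned long bitmask.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of a cpufreq policy. */
struct policy_model {
	unsigned long cpus;              /* bitmask of CPUs in the policy */
	bool dvfs_possible_from_any_cpu; /* set by the driver if permitted */
};

/*
 * A remote callback is allowed if the platform permits DVFS from any
 * CPU, or if the local CPU shares the policy with the target CPU.
 */
static bool can_do_remote_dvfs_model(int this_cpu,
				     const struct policy_model *policy)
{
	return policy->dvfs_possible_from_any_cpu ||
	       (policy->cpus & (1UL << this_cpu)) != 0;
}
```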
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
With Android UI and benchmarks the latency of cpufreq response to
certain scheduling events can become very critical. Currently, callbacks
into cpufreq governors are only made from the scheduler if the target
CPU of the event is the same as the current CPU. This means there are
certain situations where a target CPU may not run the cpufreq governor
for some time.
One testcase to show this behavior is where a task starts running on
CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
system is configured such that the new tasks should receive maximum
demand initially, this should result in CPU0 increasing frequency
immediately. Because of the above-mentioned limitation, though, this
does not occur.
This patch updates the scheduler core to call the cpufreq callbacks for
remote CPUs as well.
The schedutil, ondemand and conservative governors are updated to
process cpufreq utilization update hooks called for remote CPUs where
the remote CPU is managed by the cpufreq policy of the local CPU.
The intel_pstate driver is updated to always reject remote callbacks.
This was tested with a couple of use cases (Android: hackbench, recentfling,
galleryfling, vellamo; Ubuntu: hackbench) on an ARM hikey board (64-bit
octa-core, single policy). Only galleryfling showed minor improvements,
while the others didn't show much deviation.
The reason is that this patch only targets a corner case, where all of
the following must be true to improve performance, and that
doesn't happen too often with these tests:
- Task is migrated to another CPU.
- The task has high demand, and should take the target CPU to higher
OPPs.
- And the target CPU doesn't call into the cpufreq governor until the
next tick.
Based on initial work from Steve Muckle.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The policy->transition_latency field is used for multiple purposes
today and it's not straightforward at all. This is how it is used:
A. Set the correct transition_latency value.
B. Set it to CPUFREQ_ETERNAL because:
1. We don't want automatic dynamic switching (with
ondemand/conservative) to happen at all.
2. We don't know the transition latency.
This patch handles the B.1. case in a more readable way. A new flag for
the cpufreq drivers is added to disallow use of cpufreq governors which
have dynamic_switching flag set.
All the current cpufreq drivers which are setting transition_latency
unconditionally to CPUFREQ_ETERNAL are updated to use it. They don't
need to set transition_latency anymore.
There shouldn't be any functional change after this patch.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is no limitation in the ondemand or conservative governors which
prevents the transition_latency from being greater than 10 ms.
The max_transition_latency field is rather used to disallow automatic
dynamic frequency switching for platforms which didn't want these
governors to run.
Replace max_transition_latency with a boolean (dynamic_switching) and
check for transition_latency == CPUFREQ_ETERNAL along with that. This
makes it straightforward to read and understand now.
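A minimal sketch of that check, written as standalone C for
illustration. The struct, the function name and the value chosen for
ETERNAL_LATENCY are assumptions made for this example; only the
dynamic_switching flag and the comparison against CPUFREQ_ETERNAL come
from the description above.

```c
#include <stdbool.h>

/* Stand-in for the kernel's CPUFREQ_ETERNAL marker (assumed all-ones). */
#define ETERNAL_LATENCY (~0u)

/* Hypothetical, reduced view of a governor. */
struct governor_model {
	bool dynamic_switching;	/* governor changes frequency on its own */
};

/*
 * A governor doing automatic dynamic switching is refused when the
 * driver reports an "eternal" transition latency; any other
 * combination is accepted.
 */
bool governor_usable(const struct governor_model *gov,
		     unsigned int transition_latency)
{
	if (gov->dynamic_switching && transition_latency == ETERNAL_LATENCY)
		return false;

	return true;
}
```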
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The policy->transition_delay_us field is used only by the schedutil
governor currently, and this field describes how fast the driver wants
the cpufreq governor to change the CPU's frequency. It should rather be
a common thing across all governors, as it doesn't have any schedutil
dependency.
Create a new helper cpufreq_policy_transition_delay_us() to get the
transition delay across all governors.
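One way such a helper could look is sketched below as standalone C. The
description above only says that the helper returns the transition
delay; the fallback to the transition latency scaled by a multiplier,
the multiplier value and all names here are assumptions made for
illustration.

```c
/* Assumed constants for this sketch only. */
#define NSEC_PER_USEC_MODEL	1000u
#define LATENCY_MULTIPLIER_MODEL	1000u

/* Hypothetical, reduced view of a cpufreq policy. */
struct policy_delay_model {
	unsigned int transition_delay_us;	/* 0 => not set by the driver */
	unsigned int transition_latency_ns;	/* hardware switch latency */
};

/*
 * Prefer the delay requested by the driver; otherwise derive one from
 * the transition latency (assumption), and never return zero.
 */
unsigned int policy_transition_delay_us(const struct policy_delay_model *p)
{
	unsigned int latency_us;

	if (p->transition_delay_us)
		return p->transition_delay_us;

	latency_us = p->transition_latency_ns / NSEC_PER_USEC_MODEL;
	if (latency_us)
		return latency_us * LATENCY_MULTIPLIER_MODEL;

	return LATENCY_MULTIPLIER_MODEL;
}
```

Every governor can then call the same function instead of reading the
policy field directly.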
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpufreq core and governors aren't supposed to set a limit on how
fast we want to try changing the frequency. This is currently done for
the legacy governors with help of min_sampling_rate.
At worst, we may end up setting the sampling rate to a value lower than
the rate at which frequency can be changed and then one of the CPUs in
the policy will do nothing but change the frequency forever.
But that is something for the user to decide and there is no need to
have special handling for such cases in the core. Leave it for the user
to figure out.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull thermal management updates from Zhang Rui:
- Improve thermal cpu_cooling interaction with cpufreq core.
The cpu_cooling driver is designed to use CPU frequency scaling to
avoid high thermal states for a platform. But it wasn't glued really
well with cpufreq core.
For example, clipped-cpus is copied from the policy structure and it's
much better to use the policy->cpus (or related_cpus) fields directly
as they may have got updated. Not that things were broken before this
series, but they can be optimized a bit more.
This series tries to improve interactions between cpufreq core and
cpu_cooling driver and does some fixes/cleanups to the cpu_cooling
driver. (Viresh Kumar)
- A couple of fixes and cleanups in thermal core and imx, hisilicon,
bcm_2835, int340x thermal drivers. (Arvind Yadav, Dan Carpenter,
Sumeet Pawnikar, Srinivas Pandruvada, Willy WOLFF)
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (24 commits)
thermal: bcm2835: fix an error code in probe()
thermal: hisilicon: Handle return value of clk_prepare_enable
thermal: imx: Handle return value of clk_prepare_enable
thermal: int340x: check for sensor when PTYP is missing
Thermal/int340x: Fix few typos and kernel-doc style
thermal: fix source code documentation for parameters
thermal: cpu_cooling: Replace kmalloc with kmalloc_array
thermal: cpu_cooling: Rearrange struct cpufreq_cooling_device
thermal: cpu_cooling: 'freq' can't be zero in cpufreq_state2power()
thermal: cpu_cooling: don't store cpu_dev in cpufreq_cdev
thermal: cpu_cooling: get_level() can't fail
thermal: cpu_cooling: create structure for idle time stats
thermal: cpu_cooling: merge frequency and power tables
thermal: cpu_cooling: get rid of 'allowed_cpus'
thermal: cpu_cooling: OPPs are registered for all CPUs
thermal: cpu_cooling: store cpufreq policy
cpufreq: create cpufreq_table_count_valid_entries()
thermal: cpu_cooling: use cpufreq_policy to register cooling device
thermal: cpu_cooling: get rid of a variable in cpufreq_set_cur_state()
thermal: cpu_cooling: remove cpufreq_cooling_get_level()
...
The goal of this change is to give users a uniform and meaningful
result when they read /sys/...cpufreq/scaling_cur_freq
on modern x86 hardware, as compared to what they get today.
Modern x86 processors include the hardware needed
to accurately calculate frequency over an interval --
APERF, MPERF, and the TSC.
Here we provide an x86 routine to make this calculation
on supported hardware, and use it in preference to any
driver-specific cpufreq_driver.get() routine.
MHz is computed like so:
MHz = base_MHz * delta_APERF / delta_MPERF
MHz is the average frequency of the busy processor
over a measurement interval. The interval is
defined to be the time between successive invocations
of aperfmperf_khz_on_cpu(), which are expected to
happen on-demand when users read sysfs attribute
cpufreq/scaling_cur_freq.
As with previous methods of calculating MHz,
idle time is excluded.
base_MHz above is from TSC calibration global "cpu_khz".
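The arithmetic can be illustrated with a small standalone C function.
The function name and parameters are hypothetical; it works in kHz,
takes the APERF and MPERF deltas as plain integers rather than reading
MSRs, and returning 0 for an empty interval is a choice made for this
sketch.

```c
#include <stdint.h>

/*
 * Average busy frequency over one measurement interval:
 *   khz = base_khz * delta_APERF / delta_MPERF
 * Returns 0 when no MPERF ticks elapsed, to avoid dividing by zero.
 */
uint64_t average_khz(uint64_t base_khz, uint64_t aperf_delta,
		     uint64_t mperf_delta)
{
	if (mperf_delta == 0)
		return 0;

	/* Multiply first to keep precision; fine for short intervals. */
	return base_khz * aperf_delta / mperf_delta;
}
```

For example, with a base frequency of 2000000 kHz and APERF advancing
1.5 times as fast as MPERF over the interval, the result is 3000000 kHz.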
This x86 native method to calculate MHz returns a meaningful result
regardless of whether P-states are controlled by hardware or firmware
and whether or not the Linux cpufreq sub-system is installed.
When this routine is invoked more frequently, the measurement
interval becomes shorter. However, the code limits re-computation
to 10ms intervals so that average frequency remains meaningful.
Discerning users are encouraged to take advantage of
the turbostat(8) utility, which can gracefully handle
concurrent measurement intervals of arbitrary length.
Signed-off-by: Len Brown <len.brown@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).
That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.
Make intel_pstate set transition_delay_us to 500.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Those were added by:
commit fcd7af917a ("cpufreq: stats: handle cpufreq_unregister_driver()
and suspend/resume properly")
but aren't used anymore since:
commit 1aefc75b24 ("cpufreq: stats: Make the stats code non-modular").
Remove them. Also remove the redundant parameter to the respective
routines.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cpufreq governors may need to know what a particular target frequency
maps to in the driver without necessarily wanting to set the frequency.
Support this operation via a new cpufreq API,
cpufreq_driver_resolve_freq(). This API returns the lowest driver
frequency equal to or greater than the target frequency
(CPUFREQ_RELATION_L), subject to any policy (min/max) or driver
limitations. The mapping is also cached in the policy so that a
subsequent fast_switch operation can avoid repeating the same lookup.
The API will call a new cpufreq driver callback, resolve_freq(), if it
has been registered by the driver. Otherwise the frequency is resolved
via cpufreq_frequency_table_target(). Rather than require ->target()
style drivers to provide a resolve_freq() callback it is left to the
caller to ensure that the driver implements this callback if necessary
to use cpufreq_driver_resolve_freq().
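The table-based resolution can be modelled in standalone C roughly as
follows. All names are hypothetical, the table is a plain array of
non-zero kHz values in any order, and falling back to the highest
available frequency when nothing is at or above the target is an
assumption made for this sketch.

```c
#include <stddef.h>

/* Hypothetical, reduced view of a cpufreq policy. */
struct resolve_model {
	unsigned int min;			/* policy lower limit, kHz */
	unsigned int max;			/* policy upper limit, kHz */
	unsigned int cached_target_freq;	/* last clamped target */
	unsigned int cached_resolved_freq;	/* what it resolved to */
};

/*
 * Clamp the target to the policy limits, then pick the lowest table
 * frequency that is equal to or greater than it. The mapping is cached
 * so that a later switch request can reuse it.
 */
unsigned int resolve_freq_model(struct resolve_model *policy,
				const unsigned int *table, size_t count,
				unsigned int target)
{
	unsigned int best = 0, highest = 0;
	size_t i;

	if (target < policy->min)
		target = policy->min;
	if (target > policy->max)
		target = policy->max;

	for (i = 0; i < count; i++) {
		if (table[i] > highest)
			highest = table[i];
		if (table[i] >= target && (best == 0 || table[i] < best))
			best = table[i];
	}

	policy->cached_target_freq = target;
	policy->cached_resolved_freq = best ? best : highest;
	return policy->cached_resolved_freq;
}
```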
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq drivers aren't required to provide a sorted frequency table
today, and even the ones which provide a sorted table aren't handled
efficiently by cpufreq core.
This patch adds infrastructure to verify whether the freq-table provided
by the driver is sorted, and to use efficient helpers if it is
sorted.
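A sketch of such a sortedness check in standalone C is shown below. The
enum and function names are hypothetical, and treating duplicate
entries as "unsorted" is a choice made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a frequency table. */
enum table_order {
	TABLE_UNSORTED,
	TABLE_SORTED_ASCENDING,
	TABLE_SORTED_DESCENDING,
};

/*
 * Walk the table once and report whether it is strictly ascending,
 * strictly descending, or neither. Tables with fewer than two entries
 * are reported as ascending.
 */
enum table_order classify_table(const unsigned int *freqs, size_t count)
{
	bool ascending = true, descending = true;
	size_t i;

	for (i = 1; i < count; i++) {
		if (freqs[i] <= freqs[i - 1])
			ascending = false;
		if (freqs[i] >= freqs[i - 1])
			descending = false;
	}

	if (ascending)
		return TABLE_SORTED_ASCENDING;
	if (descending)
		return TABLE_SORTED_DESCENDING;
	return TABLE_UNSORTED;
}
```

Once the order is known, lookups on a sorted table can stop at the
first matching entry instead of scanning every entry.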
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This routine can't fail unless the frequency table is invalid and
doesn't contain any valid entries.
Make it return the index and WARN() in case it is used for an invalid
table.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Most of the callers of cpufreq_frequency_get_table() already have the
pointer to a valid 'policy' structure and they don't really need to go
through the per-cpu variable first and then a check to validate the
frequency, in order to find the freq-table for the policy.
Directly use the policy->freq_table field instead for them.
After the above changes only one user of that API is left, cpu_cooling.c,
and it accesses the freq_table in a racy way as the policy can get freed
in between.
Fix it by using cpufreq_cpu_get() properly.
Since there are no more users of cpufreq_frequency_get_table() left, get
rid of it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Javi Merino <javi.merino@arm.com> (cpu_cooling.c)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The modularity of cpufreq_stats is quite problematic.
First off, the usage of policy notifiers for the initialization
and cleanup in the cpufreq_stats module is inherently racy with
respect to CPU offline/online and the initialization and cleanup
of the cpufreq driver.
Second, fast frequency switching (used by the schedutil governor)
cannot be enabled if any transition notifiers are registered, so
if the cpufreq_stats module (that registers a transition notifier
for updating transition statistics) is loaded, the schedutil governor
cannot use fast frequency switching.
On the other hand, allowing cpufreq_stats to be built as a module
doesn't really add much value. Arguably, there's not much reason
for that code to be modular at all.
For the above reasons, make the cpufreq stats code non-modular,
modify the core to invoke functions provided by that code directly
and drop the notifiers from it.
Make the stats sysfs attributes appear empty if fast frequency
switching is enabled as the statistics will not be updated in that
case anyway (and returning -EBUSY from those attributes breaks
powertop).
While at it, clean up Kconfig help for the CPU_FREQ_STAT and
CPU_FREQ_STAT_DETAILS options.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
The 'initialized' field in struct cpufreq_governor is only used by
the conservative governor (as a usage counter) and the way that
happens is far from straightforward and arguably incorrect.
Namely, the value of 'initialized' is checked by
cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and
the results of those checks are passed (as the second argument) to
the ->init() and ->exit() callbacks in struct dbs_governor. Those
callbacks are only implemented by the ondemand and conservative
governors and ondemand doesn't use their second argument at all.
In turn, the conservative governor uses it to decide whether or not
to either register or unregister a transition notifier.
That whole mechanism is not only unnecessarily convoluted, but also
racy, because the 'initialized' field of struct cpufreq_governor is
updated in cpufreq_init_governor() and cpufreq_exit_governor() under
policy->rwsem which doesn't help if one of these functions is run
twice in parallel for different policies (which isn't impossible in
principle), for example.
Instead, add a proper usage counter to the conservative
governor and update it from cs_init() and cs_exit(), which is
guaranteed to be non-racy, as those functions are only called
under gov_dbs_data_mutex, which is global.
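The counting scheme can be modelled in standalone C as below. This is a
sketch only: the function and variable names are hypothetical, the
notifier is reduced to a boolean, and the global mutex is represented
by a comment rather than real locking.

```c
#include <stdbool.h>

/* Both variables are protected by the (modelled) global governor mutex. */
static unsigned int cs_usage_count;
static bool notifier_registered;

/*
 * Caller must hold the global mutex. The first user registers the
 * transition notifier; later users only bump the counter.
 * Returns whether the notifier is registered afterwards.
 */
bool cs_init_model(void)
{
	if (cs_usage_count++ == 0)
		notifier_registered = true;

	return notifier_registered;
}

/*
 * Caller must hold the global mutex. The last user going away
 * unregisters the notifier.
 * Returns whether the notifier is still registered afterwards.
 */
bool cs_exit_model(void)
{
	if (cs_usage_count > 0 && --cs_usage_count == 0)
		notifier_registered = false;

	return notifier_registered;
}
```

Because every update happens under one global lock, two policies
initializing in parallel cannot both observe a count of zero.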
With that in place, drop the 'initialized' field from struct
cpufreq_governor as it is not used any more.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>