Changes in 4.19.86
spi: mediatek: use correct mata->xfer_len when in fifo transfer
i2c: mediatek: modify threshold passed to i2c_get_dma_safe_msg_buf()
tee: optee: add missing of_node_put after of_device_is_available
Revert "OPP: Protect dev_list with opp_table lock"
net: cdc_ncm: Signedness bug in cdc_ncm_set_dgram_size()
idr: Fix idr_get_next race with idr_remove
mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()
mm/memory_hotplug: fix updating the node span
arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault
fbdev: Ditch fb_edid_add_monspecs
bpf, x32: Fix bug for BPF_ALU64 | BPF_NEG
bpf, x32: Fix bug with ALU64 {LSH, RSH, ARSH} BPF_X shift by 0
bpf, x32: Fix bug with ALU64 {LSH, RSH, ARSH} BPF_K shift by 0
bpf, x32: Fix bug for BPF_JMP | {BPF_JSGT, BPF_JSLE, BPF_JSLT, BPF_JSGE}
net: ovs: fix return type of ndo_start_xmit function
net: xen-netback: fix return type of ndo_start_xmit function
ARM: dts: dra7: Enable workaround for errata i870 in PCIe host mode
ARM: dts: omap5: enable OTG role for DWC3 controller
net: hns3: Fix for netdev not up problem when setting mtu
net: hns3: Fix loss of coal configuration while doing reset
f2fs: return correct errno in f2fs_gc
ARM: dts: sun8i: h3-h5: ir register size should be the whole memory block
ARM: dts: sun8i: h3: bpi-m2-plus: Fix address for external RGMII Ethernet PHY
tcp: up initial rmem to 128KB and SYN rwin to around 64KB
SUNRPC: Fix priority queue fairness
ACPI / LPSS: Make acpi_lpss_find_device() also find PCI devices
ACPI / LPSS: Resume BYT/CHT I2C controllers from resume_noirq
f2fs: keep lazytime on remount
IB/hfi1: Error path MAD response size is incorrect
IB/hfi1: Ensure ucast_dlid access doesnt exceed bounds
mt76x2: fix tx power configuration for VHT mcs 9
mt76x2: disable WLAN core before probe
mt76: fix handling ps-poll frames
iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
iommu/arm-smmu-v3: Fix unexpected CMD_SYNC timeout
kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table
arm64/numa: Report correct memblock range for the dummy node
ath10k: fix vdev-start timeout on error
rtlwifi: btcoex: Use proper enumerated types for Wi-Fi only interface
ata: ahci_brcm: Allow using driver or DSL SoCs
PM / devfreq: Fix devfreq_add_device() when drivers are built as modules.
PM / devfreq: Fix handling of min/max_freq == 0
PM / devfreq: stopping the governor before device_unregister()
ath9k: fix reporting calculated new FFT upper max
selftests/tls: Fix recv(MSG_PEEK) & splice() test cases
usb: gadget: udc: fotg210-udc: Fix a sleep-in-atomic-context bug in fotg210_get_status()
usb: dwc3: gadget: Check ENBLSLPM before sending ep command
nl80211: Fix a GET_KEY reply attribute
irqchip/irq-mvebu-icu: Fix wrong private data retrieval
watchdog: core: fix null pointer dereference when releasing cdev
watchdog: renesas_wdt: stop when unregistering
watchdog: sama5d4: fix timeout-sec usage
watchdog: w83627hf_wdt: Support NCT6796D, NCT6797D, NCT6798D
KVM: PPC: Inform the userspace about TCE update failures
printk: Do not miss new messages when replaying the log
printk: CON_PRINTBUFFER console registration is a bit racy
dmaengine: ep93xx: Return proper enum in ep93xx_dma_chan_direction
dmaengine: timb_dma: Use proper enum in td_prep_slave_sg
ALSA: hda: Fix mismatch for register mask and value in ext controller.
ext4: fix build error when DX_DEBUG is defined
clk: keystone: Enable TISCI clocks if K3_ARCH
sunrpc: Fix connect metrics
x86/PCI: Apply VMD's AERSID fixup generically
mei: samples: fix a signedness bug in amt_host_if_call()
cxgb4: Use proper enum in cxgb4_dcb_handle_fw_update
cxgb4: Use proper enum in IEEE_FAUX_SYNC
powerpc/pseries: Fix DTL buffer registration
powerpc/pseries: Fix how we iterate over the DTL entries
powerpc/xive: Move a dereference below a NULL test
ARM: dts: at91: sama5d4_xplained: fix addressable nand flash size
ARM: dts: at91: at91sam9x5cm: fix addressable nand flash size
ARM: dts: at91: sama5d2_ptc_ek: fix bootloader env offsets
mtd: rawnand: sh_flctl: Use proper enum for flctl_dma_fifo0_transfer
PM / hibernate: Check the success of generating md5 digest before hibernation
tools: PCI: Fix compilation warnings
clocksource/drivers/sh_cmt: Fixup for 64-bit machines
clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines
ice: Fix forward to queue group logic
md: allow metadata updates while suspending an array - fix
ixgbe: Fix ixgbe TX hangs with XDP_TX beyond queue limit
i40e: Use proper enum in i40e_ndo_set_vf_link_state
ixgbe: Fix crash with VFs and flow director on interface flap
IB/mthca: Fix error return code in __mthca_init_one()
IB/rxe: avoid srq memory leak
RDMA/hns: Bugfix for reserved qp number
RDMA/hns: Submit bad wr when post send wr exception
RDMA/hns: Bugfix for CM test
RDMA/hns: Limit the size of extend sge of sq
IB/mlx4: Avoid implicit enumerated type conversion
rpmsg: glink: smem: Support rx peak for size less than 4 bytes
msm/gpu/a6xx: Force of_dma_configure to setup DMA for GMU
OPP: Return error on error from dev_pm_opp_get_opp_count()
ACPICA: Never run _REG on system_memory and system_IO
cpuidle: menu: Fix wakeup statistics updates for polling state
ASoC: qdsp6: q6asm-dai: checking NULL vs IS_ERR()
powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer
powerpc/64s/radix: Explicitly flush ERAT with local LPID invalidation
ata: ep93xx: Use proper enums for directions
qed: Avoid implicit enum conversion in qed_ooo_submit_tx_buffers
media: rc: ir-rc6-decoder: enable toggle bit for Kathrein RCU-676 remote
media: pxa_camera: Fix check for pdev->dev.of_node
media: rcar-vin: fix redeclaration of symbol
media: i2c: adv748x: Support probing a single output
ALSA: hda/sigmatel - Disable automute for Elo VuPoint
bnxt_en: return proper error when FW returns HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED
KVM: PPC: Book3S PR: Exiting split hack mode needs to fixup both PC and LR
USB: serial: cypress_m8: fix interrupt-out transfer length
usb: dwc2: disable power_down on rockchip devices
mtd: physmap_of: Release resources on error
cpu/SMT: State SMT is disabled even with nosmt and without "=force"
brcmfmac: reduce timeout for action frame scan
brcmfmac: fix full timeout waiting for action frame on-channel tx
qtnfmac: request userspace to do OBSS scanning if FW can not
qtnfmac: pass sgi rate info flag to wireless core
qtnfmac: inform wireless core about supported extended capabilities
qtnfmac: drop error reports for out-of-bounds key indexes
clk: samsung: Use NOIRQ stage for Exynos5433 clocks suspend/resume
clk: samsung: exynos5420: Define CLK_SECKEY gate clock only or Exynos5420
clk: samsung: Use clk_hw API for calling clk framework from clk notifiers
i2c: brcmstb: Allow enabling the driver on DSL SoCs
printk: Correct wrong casting
NFSv4.x: fix lock recovery during delegation recall
dmaengine: ioat: fix prototype of ioat_enumerate_channels
media: ov5640: fix framerate update
media: cec-gpio: select correct Signal Free Time
gfs2: slow the deluge of io error messages
i2c: omap: use core to detect 'no zero length' quirk
i2c: qup: use core to detect 'no zero length' quirk
i2c: tegra: use core to detect 'no zero length' quirk
i2c: zx2967: use core to detect 'no zero length' quirk
Input: st1232 - set INPUT_PROP_DIRECT property
Input: silead - try firmware reload after unsuccessful resume
soc: fsl: bman_portals: defer probe after bman's probe
net: hns3: Fix for rx vlan id handle to support Rev 0x21 hardware
tc-testing: fix build of eBPF programs
remoteproc: Check for NULL firmwares in sysfs interface
remoteproc: qcom: q6v5: Fix a race condition on fatal crash
kexec: Allocate decrypted control pages for kdump if SME is enabled
x86/olpc: Fix build error with CONFIG_MFD_CS5535=m
dmaengine: rcar-dmac: set scatter/gather max segment size
crypto: mxs-dcp - Fix SHA null hashes and output length
crypto: mxs-dcp - Fix AES issues
xfrm: use correct size to initialise sp->ovec
ACPI / SBS: Fix rare oops when removing modules
iwlwifi: mvm: don't send keys when entering D3
xsk: proper AF_XDP socket teardown ordering
x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately
mmc: renesas_sdhi_internal_dmac: Whitelist r8a774a1
mmc: tmio: Fix SCC error detection
mmc: renesas_sdhi_internal_dmac: set scatter/gather max segment size
atmel_lcdfb: support native-mode display-timings
fbdev: sbuslib: use checked version of put_user()
fbdev: sbuslib: integer overflow in sbusfb_ioctl_helper()
fbdev: fix broken menu dependencies
reset: Fix potential use-after-free in __of_reset_control_get()
bcache: account size of buckets used in uuid write to ca->meta_sectors_written
bcache: recal cached_dev_sectors on detach
platform/x86: mlx-platform: Properly use mlxplat_mlxcpld_msn201x_items
media: dw9714: Fix error handling in probe function
media: dw9807-vcm: Fix probe error handling
media: cx18: Don't check for address of video_dev
mtd: spi-nor: cadence-quadspi: Use proper enum for dma_[un]map_single
mtd: devices: m25p80: Make sure WRITE_EN is issued before each write
x86/intel_rdt: Introduce utility to obtain CDP peer
x86/intel_rdt: CBM overlap should also check for overlap with CDP peer
mmc: mmci: expand startbiterr to irqmask and error check
s390/kasan: avoid vdso instrumentation
s390/kasan: avoid instrumentation of early C code
s390/kasan: avoid user access code instrumentation
proc/vmcore: Fix i386 build error of missing copy_oldmem_page_encrypted()
backlight: lm3639: Unconditionally call led_classdev_unregister
mfd: ti_am335x_tscadc: Keep ADC interface on if child is wakeup capable
printk: Give error on attempt to set log buffer length to over 2G
media: isif: fix a NULL pointer dereference bug
GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads
media: cx231xx: fix potential sign-extension overflow on large shift
media: venus: vdec: fix decoded data size
ALSA: hda/ca0132 - Fix input effect controls for desktop cards
lightnvm: pblk: fix rqd.error return value in pblk_blk_erase_sync
lightnvm: pblk: fix incorrect min_write_pgs
lightnvm: pblk: guarantee emeta on line close
lightnvm: pblk: fix write amplificiation calculation
lightnvm: pblk: guarantee mw_cunits on read buffer
lightnvm: do no update csecs and sos on 1.2
lightnvm: pblk: fix error handling of pblk_lines_init()
lightnvm: pblk: consider max hw sectors supported for max_write_pgs
x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error
bpf: btf: Fix a missing check bug
net: fix generic XDP to handle if eth header was mangled
gpio: syscon: Fix possible NULL ptr usage
spi: fsl-lpspi: Prevent FIFO under/overrun by default
pinctrl: gemini: Mask and set properly
spi: spidev: Fix OF tree warning logic
ARM: 8802/1: Call syscall_trace_exit even when system call skipped
x86/mm: Do not warn about PCI BIOS W+X mappings
orangefs: rate limit the client not running info message
pinctrl: gemini: Fix up TVC clock group
scsi: arcmsr: clean up clang warning on extraneous parentheses
hwmon: (k10temp) Support all Family 15h Model 6xh and Model 7xh processors
hwmon: (nct6775) Fix names of DIMM temperature sources
hwmon: (pwm-fan) Silence error on probe deferral
hwmon: (ina3221) Fix INA3221_CONFIG_MODE macros
hwmon: (npcm-750-pwm-fan) Change initial pwm target to 255
selftests: forwarding: Have lldpad_app_wait_set() wait for unknown, too
net: sched: avoid writing on noop_qdisc
netfilter: nft_compat: do not dump private area
misc: cxl: Fix possible null pointer dereference
mac80211: minstrel: fix using short preamble CCK rates on HT clients
mac80211: minstrel: fix CCK rate group streams value
mac80211: minstrel: fix sampling/reporting of CCK rates in HT mode
spi: rockchip: initialize dma_slave_config properly
mlxsw: spectrum_switchdev: Check notification relevance based on upper device
ARM: dts: omap5: Fix dual-role mode on Super-Speed port
tcp: start receiver buffer autotuning sooner
ACPI / LPSS: Use acpi_lpss_* instead of acpi_subsys_* functions for hibernate
PM / devfreq: Fix static checker warning in try_then_request_governor
tools: PCI: Fix broken pcitest compilation
powerpc/time: Fix clockevent_decrementer initalisation for PR KVM
mmc: tmio: fix SCC error handling to avoid false positive CRC error
x86/resctrl: Fix rdt_find_domain() return value and checks
Linux 4.19.86
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I40e00f0b53336aba2edc8ed6a696fdf5206fdba7
[ Upstream commit 5f26bdceb9c0a5e6c696aa2899d077cd3ae93413 ]
If the CPU exits the "polling" state due to the time limit in the
loop in poll_idle(), this is not a real wakeup and it just means
that the "polling" state selection was not adequate. The governor
mispredicted short idle duration, but had a more suitable state been
selected, the CPU might have spent more time in it. In fact, there
is no reason to expect that there would have been a wakeup event
earlier than the next timer in that case.
Handling such cases as regular wakeups in menu_update() may cause the
menu governor to make suboptimal decisions going forward, but ignoring
them altogether would not be correct either, because every time
menu_select() is invoked, it makes a separate new attempt to predict
the idle duration taking distinct time to the closest timer event as
input and the outcomes of all those attempts should be recorded.
For this reason, make menu_update() always assume that if the
"polling" state was exited due to the time limit, the next proper
wakeup event for the CPU would be the next timer event (not
including the tick).
Fixes: a37b969a61 "cpuidle: poll_state: Add time limit to poll_idle()"
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
The idle-state of each cpu is currently pointed to by rq->idle_state but
there isn't any information in the struct cpuidle_state that can used to
look up the idle-state energy model data stored in struct
sched_group_energy. For this purpose is necessary to store the idle
state index as well. Ideally, the idle-state data should be unified.
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Change-Id: Ib3d1178512735b0e314881f73fb8ccff5a69319f
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
(cherry picked from commit a732c97420e109956c20f34c70b91e6d06f5df31)
[ Fixed trivial cherry-pick conflict in kernel/sched/sched.h ]
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
There is some code duplication related to the PM QoS handling between
the existing cpuidle governors, so move that code to a common helper
function and call that from the governors.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a new pointer argument to cpuidle_select() and to the ->select
cpuidle governor callback to allow a boolean value indicating
whether or not the tick should be stopped before entering the
selected state to be returned from there.
Make the ladder governor ignore that pointer (to preserve its
current behavior) and make the menu governor return 'false" through
it if:
(1) the idle exit latency is constrained at 0, or
(2) the selected state is a polling one, or
(3) the expected idle period duration is within the tick period
range.
In addition to that, the correction factor computations in the menu
governor need to take the possibility that the tick may not be
stopped into account to avoid artificially small correction factor
values. To that end, add a mechanism to record tick wakeups, as
suggested by Peter Zijlstra, and use it to modify the menu_update()
behavior when tick wakeup occurs. Namely, if the CPU is woken up by
the tick and the return value of tick_nohz_get_sleep_length() is not
within the tick boundary, the predicted idle duration is likely too
short, so make menu_update() try to compensate for that by updating
the governor statistics as though the CPU was idle for a long time.
Since the value returned through the new argument pointer of
cpuidle_select() is not used by its caller yet, this change by
itself is not expected to alter the functionality of the code.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Add a new attribute group called "s2idle" under the sysfs directory
of each cpuidle state that supports the ->enter_s2idle callback
and put two new attributes, "usage" and "time", into that group to
represent the number of times the given state was requested for
suspend-to-idle and the total time spent in suspend-to-idle after
requesting that state, respectively.
That will allow diagnostic information related to suspend-to-idle
to be collected without enabling advanced debug features and
analyzing dmesg output.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit f859422075 (x86: PM: Make APM idle driver initialize polling
state) made apm_init() call cpuidle_poll_state_init(), but that only
is defined for CONFIG_CPU_IDLE set, so make the empty stub of it
available for CONFIG_CPU_IDLE unset too to fix the resulting build
issue.
Fixes: f859422075 (x86: PM: Make APM idle driver initialize polling state)
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If a CPU is entering a low power idle state where it doesn't lose any
context, then there is no need to call cpu_pm_enter()/cpu_pm_exit().
Add a new macro(CPU_PM_CPU_IDLE_ENTER_RETENTION) to be used by cpuidle
drivers when they are entering retention state. By not calling
cpu_pm_enter and cpu_pm_exit we reduce the latency involved in
entering and exiting the retention idle states.
CPU_PM_CPU_IDLE_ENTER_RETENTION assumes that no state is lost and
hence CPU PM notifiers will not be called. We may need a broader
change if we need to support partial retention states effeciently.
On ARM64 based Qualcomm Server Platform we measured below overhead for
for calling cpu_pm_enter and cpu_pm_exit for retention states.
workload: stress --hdd #CPUs --hdd-bytes 32M -t 30
Average overhead of cpu_pm_enter - 1.2us
Average overhead of cpu_pm_exit - 3.1us
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* pm-sleep:
ACPI / PM: Check low power idle constraints for debug only
PM / s2idle: Rename platform operations structure
PM / s2idle: Rename ->enter_freeze to ->enter_s2idle
PM / s2idle: Rename freeze_state enum and related items
PM / s2idle: Rename PM_SUSPEND_FREEZE to PM_SUSPEND_TO_IDLE
ACPI / PM: Prefer suspend-to-idle over S3 on some systems
platform/x86: intel-hid: Wake up Dell Latitude 7275 from suspend-to-idle
PM / suspend: Define pr_fmt() in suspend.c
PM / suspend: Use mem_sleep_labels[] strings in messages
PM / sleep: Put pm_test under CONFIG_PM_SLEEP_DEBUG
PM / sleep: Check pm_wakeup_pending() in __device_suspend_noirq()
PM / core: Add error argument to dpm_show_time()
PM / core: Split dpm_suspend_noirq() and dpm_resume_noirq()
PM / s2idle: Rearrange the main suspend-to-idle loop
PM / timekeeping: Print debug messages when requested
PM / sleep: Mark suspend/hibernation start and finish
PM / sleep: Do not print debug messages by default
PM / suspend: Export pm_suspend_target_state
Make the drivers that want to include the polling state into their
states table initialize it explicitly and drop the initialization of
it (which in fact is conditional, but that is not obvious from the
code) from the core.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Move the polling state initialization code to a separate file built
conditionally on CONFIG_ARCH_HAS_CPU_RELAX to get rid of the #ifdef
in driver.c.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
On some architectures the first (index 0) idle state is a polling
one and it doesn't really save energy, so there is the
CPUIDLE_DRIVER_STATE_START symbol allowing some pieces of
cpuidle code to avoid using that state.
However, this makes the code rather hard to follow. It is better
to explicitly avoid the polling state, so add a new cpuidle state
flag CPUIDLE_FLAG_POLLING to mark it and make the relevant code
check that flag for the first state instead of using the
CPUIDLE_DRIVER_STATE_START symbol.
In the ACPI processor driver that cannot always rely on the state
flags (like before the states table has been set up) define
a new internal symbol ACPI_IDLE_STATE_START equivalent to the
CPUIDLE_DRIVER_STATE_START one and drop the latter.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Rename the ->enter_freeze cpuidle driver callback to ->enter_s2idle
to make it clear that it is used for entering suspend-to-idle and
rename the related functions, variables and so on accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the current code for powernv_add_idle_states, there is a lot of code
duplication while initializing an idle state in powernv_states table.
Add an inline helper function to populate the powernv_states[] table
for a given idle state. Invoke this for populating the "Nap",
"Fastsleep" and the stop states in powernv_add_idle_states.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When idle injection is used to cap power, we need to override the
governor's choice of idle states.
For this reason, make it possible the deepest idle state selection to
be enforced by setting a flag on a given CPU to achieve the maximum
potential power draw reduction.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The governor's code use try_module_get() and put_module() to refcount
the governor's module. But the governors are not compiled as module.
The refcount does not prevent to switch the governor or unload
a module as they aren't compiled as modules. The code is pointless,
so remove it.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The function arm_enter_idle_state is exactly the same in both generic
ARM{32,64} CPUIdle driver and will be the same even on ARM64 backend
for ACPI processor idle driver. So we can unify it and move it to a
common place by introducing CPU_PM_CPU_IDLE_ENTER macro that can be
used in all places avoiding duplication.
This is in preparation of reuse of the generic cpuidle entry function
for ACPI LPI support on ARM64.
Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpuidle_devices per-CPU variable is only defined when CPU_IDLE is
enabled. Commit c8cc7d4de7 ("sched/idle: Reorganize the idle loop")
removed the #ifdef CONFIG_CPU_IDLE around cpuidle_idle_call() with the
compiler optimising away __this_cpu_read(cpuidle_devices). However, with
CONFIG_UBSAN && !CONFIG_CPU_IDLE, this optimisation no longer happens
and the kernel fails to link since cpuidle_devices is not defined.
This patch introduces an accessor function for the current CPU cpuidle
device (returning NULL when !CONFIG_CPU_IDLE) and uses it in
cpuidle_idle_call().
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpuidle_device::safe_state_index need to be initialized before
use, it should be the same as cpuidle_driver::safe_state_index.
We tackled this issue by removing the safe_state_index from the
cpuidle_device structure and use the one in the cpuidle_driver
structure instead.
Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pm-sleep:
PM / sleep: trace_device_pm_callback coverage in dpm_prepare/complete
PM / wakeup: add a dummy wakeup_source to record statistics
PM / sleep: Make suspend-to-idle-specific code depend on CONFIG_SUSPEND
PM / sleep: Return -EBUSY from suspend_enter() on wakeup detection
PM / tick: Add tracepoints for suspend-to-idle diagnostics
PM / sleep: Fix symbol name in a comment in kernel/power/main.c
leds / PM: fix hibernation on arm when gpio-led used with CPU led trigger
ARM: omap-device: use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS
bus: omap_l3_noc: add missed callbacks for suspend-to-disk
PM / sleep: Add macro to define common noirq system PM callbacks
PM / sleep: Refine diagnostic messages in enter_state()
PM / wakeup: validate wakeup source before activating it.
* pm-runtime:
PM / Runtime: Update last_busy in rpm_resume
PM / runtime: add note about re-calling in during device probe()
Since idle_should_freeze() is defined to always return 'false'
for CONFIG_SUSPEND unset, all of the code depending on it in
cpuidle_idle_call() is not necessary in that case.
Make that code depend on CONFIG_SUSPEND too to avoid building it
when it is not going to be used.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
The check of the cpuidle_enter() return value against -EBUSY
made in call_cpuidle() will not be necessary any more if
cpuidle_enter_state() calls default_idle_call() directly when it
is about to return -EBUSY, so make that happen and eliminate the
check.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Kevin Hilman <khilman@linaro.org>
Introduce a wrapper function around idle_set_state() called
sched_idle_set_state() that will pass this_rq() to it as the
first argument and make cpuidle_enter_state() call the new
function before and after entering the target state.
At the same time, remove direct invocations of idle_set_state()
from call_cpuidle().
This will allow the invocation of default_idle_call() to be
moved from call_cpuidle() to cpuidle_enter_state() safely
and call_cpuidle() to be simplified a bit as a result.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Kevin Hilman <khilman@linaro.org>
Thomas Schlichter reports the following issue on his Samsung NC20:
"The C-states C1 and C2 to the OS when connected to AC, and additionally
provides the C3 C-state when disconnected from AC. However, the number
of C-states shown in sysfs is fixed to the number of C-states present
at boot.
If I boot with AC connected, I always only see the C-states up to C2
even if I disconnect AC.
The reason is commit 130a5f6924 (ACPI / cpuidle: remove dev->state_count
setting). It removes the update of dev->state_count, but sysfs uses
exactly this variable to show the C-states.
The fix is to use drv->state_count in sysfs. As this is currently the
last user of dev->state_count, this variable can be completely removed."
Remove dev->state_count as per the above.
Reported-by: Thomas Schlichter <thomas.schlichter@web.de>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 3810631332 (PM / sleep: Re-implement suspend-to-idle handling)
overlooked the fact that entering some sufficiently deep idle states
by CPUs may cause their local timers to stop and in those cases it
is necessary to switch over to a broadcast timer prior to entering
the idle state. If the cpuidle driver in use does not provide
the new ->enter_freeze callback for any of the idle states, that
problem affects suspend-to-idle too, but it is not taken into account
after the changes made by commit 3810631332.
Fix that by changing the definition of cpuidle_enter_freeze() and
re-arranging of the code in cpuidle_idle_call(), so the former does
not call cpuidle_enter() any more and the fallback case is handled
by cpuidle_idle_call() directly.
Fixes: 3810631332 (PM / sleep: Re-implement suspend-to-idle handling)
Reported-and-tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
The efficiency of suspend-to-idle depends on being able to keep CPUs
in the deepest available idle states for as much time as possible.
Ideally, they should only be brought out of idle by system wakeup
interrupts.
However, timer interrupts occurring periodically prevent that from
happening and it is not practical to chase all of the "misbehaving"
timers in a whack-a-mole fashion. A much more effective approach is
to suspend the local ticks for all CPUs and the entire timekeeping
along the lines of what is done during full suspend, which also
helps to keep suspend-to-idle and full suspend reasonably similar.
The idea is to suspend the local tick on each CPU executing
cpuidle_enter_freeze() and to make the last of them suspend the
entire timekeeping. That should prevent timer interrupts from
triggering until an IO interrupt wakes up one of the CPUs. It
needs to be done with interrupts disabled on all of the CPUs,
though, because otherwise the suspended clocksource might be
accessed by an interrupt handler which might lead to fatal
consequences.
Unfortunately, the existing ->enter callbacks provided by cpuidle
drivers generally cannot be used for implementing that, because some
of them re-enable interrupts temporarily and some idle entry methods
cause interrupts to be re-enabled automatically on exit. Also some
of these callbacks manipulate local clock event devices of the CPUs
which really shouldn't be done after suspending their ticks.
To overcome that difficulty, introduce a new cpuidle state callback,
->enter_freeze, that will be guaranteed (1) to keep interrupts
disabled all the time (and return with interrupts disabled) and (2)
not to touch the CPU timer devices. Modify cpuidle_enter_freeze() to
look for the deepest available idle state with ->enter_freeze present
and to make the CPU execute that callback with suspended tick (and the
last of the online CPUs to execute it with suspended timekeeping).
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
In preparation for adding support for quiescing timers in the final
stage of suspend-to-idle transitions, rework the freeze_enter()
function making the system wait on a wakeup event, the freeze_wake()
function terminating the suspend-to-idle loop and the mechanism by
which deep idle states are entered during suspend-to-idle.
First of all, introduce a simple state machine for suspend-to-idle
and make the code in question use it.
Second, prevent freeze_enter() from losing wakeup events due to race
conditions and ensure that the number of online CPUs won't change
while it is being executed. In addition to that, make it force
all of the CPUs re-enter the idle loop in case they are in idle
states already (so they can enter deeper idle states if possible).
Next, drop cpuidle_use_deepest_state() and replace use_deepest_state
checks in cpuidle_select() and cpuidle_reflect() with a single
suspend-to-idle state check in cpuidle_idle_call().
Finally, introduce cpuidle_enter_freeze() that will simply find the
deepest idle state available to the given CPU and enter it using
cpuidle_enter().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
CPUIDLE_FLAG_TIME_INVALID is no longer checked
by menu or ladder cpuidle governors, so don't
bother setting or defining it.
It was originally invented to account for the fact that
acpi_safe_halt() enables interrupts to invoke HLT.
That would allow interrupt service routines to be included
in the last_idle duration measurements made in cpuidle_enter_state(),
potentially returning a duration much larger than reality.
But menu and ladder can gracefully handle erroneously large duration
intervals without checking for CPUIDLE_FLAG_TIME_INVALID.
Further, if they don't check CPUIDLE_FLAG_TIME_INVALID, they
can also benefit from the instances when the duration interval
is not erroneously large.
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The only place where the time is invalid is when the ACPI_CSTATE_FFH entry
method is not set. Otherwise for all the drivers, the time can be correctly
measured.
Instead of duplicating the CPUIDLE_FLAG_TIME_VALID flag in all the drivers
for all the states, just invert the logic by replacing it by the flag
CPUIDLE_FLAG_TIME_INVALID, hence we can set this flag only for the acpi idle
driver, remove the former flag from all the drivers and invert the logic with
this flag in the different governor.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull MIPS updates from Ralf Baechle:
- three fixes for 3.15 that didn't make it in time
- limited Octeon 3 support.
- paravirtualization support
- improvment to platform support for Netlogix SOCs.
- add support for powering down the Malta eval board in software
- add many instructions to the in-kernel microassembler.
- add support for the BPF JIT.
- minor cleanups of the BCM47xx code.
- large cleanup of math emu code resulting in significant code size
reduction, better readability of the code and more accurate
emulation.
- improvments to the MIPS CPS code.
- support C3 power status for the R4k count/compare clock device.
- improvments to the GIO support for older SGI workstations.
- increase number of supported CPUs to 256; this can be reached on
certain embedded multithreaded ccNUMA configurations.
- various small cleanups, updates and fixes
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (173 commits)
MIPS: IP22/IP28: Improve GIO support
MIPS: Octeon: Add twsi interrupt initialization for OCTEON 3XXX, 5XXX, 63XX
DEC: Document the R4k MB ASIC mini interrupt controller
DEC: Add self as the maintainer
MIPS: Add microMIPS MSA support.
MIPS: Replace calls to obsolete strict_strto call with kstrto* equivalents.
MIPS: Replace obsolete strict_strto call with kstrto
MIPS: BFP: Simplify code slightly.
MIPS: Call find_vma with the mmap_sem held
MIPS: Fix 'write_msa_##' inline macro.
MIPS: Fix MSA toolchain support detection.
mips: Update the email address of Geert Uytterhoeven
MIPS: Add minimal defconfig for mips_paravirt
MIPS: Enable build for new system 'paravirt'
MIPS: paravirt: Add pci controller for virtio
MIPS: Add code for new system 'paravirt'
MIPS: Add functions for hypervisor call
MIPS: OCTEON: Add OCTEON3 to __get_cpu_type
MIPS: Add function get_ebase_cpunum
MIPS: Add minimal support for OCTEON3 to c-r4k.c
...
Declaring this allows drivers which need to initialise each struct
cpuidle_device at initialisation time to make use of the structures
already defined in cpuidle.c, rather than having to wastefully define
their own.
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
If freeze_enter() is called, we want to bypass the current cpuidle
governor and always use the deepest available (that is, not disabled)
C-state, because we want to save as much energy as reasonably possible
then and runtime latency constraints don't matter at that point, since
the system is in a sleep state anyway.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Aubrey Li <aubrey.li@linux.intel.com>
Since both cpuidle_enabled() and cpuidle_select() are only called by
cpuidle_idle_call(), it is not really useful to keep them separate
and combining them will help to avoid complicating cpuidle_idle_call()
even further if governors are changed to return error codes sometimes.
This code modification shouldn't lead to any functional changes.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In order to allow better integration between the cpuidle framework and the
scheduler, reducing the distance between these two sub-components will
facilitate this integration by moving part of the cpuidle code in the idle
task file and, because idle.c is in the sched directory, we have access to
the scheduler's private structures.
This patch splits the cpuidle_idle_call main entry function into 3 calls
to a newly added API:
1. select the idle state
2. enter the idle state
3. reflect the idle state
The cpuidle_idle_call calls these three functions to implement the main
idle entry function.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: rjw@rjwysocki.net
Cc: preeti@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1393832934-11625-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
cpuidle_unregister_governor() and cpuidle_replace_governor() aren't
used anymore and can be removed. They were used by cpufreq governors
earlier, but since the governors can't be compiled as modules any
more, these two functions aren't necessary.
Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add missing forward declarations of struct cpuidle_state_kobj and
struct cpuidle_driver_kobj in cpuidle.h.
[rjw: Changelog]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpuidle sysfs code is designed to have a single instance of per
CPU cpuidle directory. It is not possible to remove the sysfs entry
and create it again. This is not a problem with the current code but
future changes will add CPU hotplug support to enable/disable the
device, so it will need to remove the sysfs entry like other
subsystems do. That won't be possible without this change, because
the kobj is a static object which can't be reused for
kobj_init_and_add().
Add cpuidle_device_kobj to be allocated dynamically when
adding/removing a sysfs entry which is consistent with the other
cpuidle's sysfs entries.
An added benefit is that the sysfs code is now more self-contained
and the includes needed for sysfs can be moved from cpuidle.h
directly into sysfs.c so as to reduce the total number of headers
dragged along with cpuidle.h.
[rjw: Changelog]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit bf4d1b5 (cpuidle: support multiple drivers) introduced support
for using multiple cpuidle drivers at the same time. It added a
couple of new APIs to register the driver per CPU, but that led to
some unnecessary code complexity related to the kernel config options
deciding whether or not the multiple driver support is enabled. The
code has to work as it did before when the multiple driver support is
not enabled and the multiple driver support has to be compatible with
the previously existing API.
Remove the new API, not used by any driver in the tree yet (but
needed for the HMP cpuidle drivers that will be submitted soon), and
add a new cpumask pointer to the cpuidle driver structure that will
point to the mask of CPUs handled by the given driver. That will
allow the cpuidle_[un]register_driver() API to be used for the
multiple driver support along with the cpuidle_[un]register()
functions added recently.
[rjw: Changelog]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull idle update from Len Brown:
"Add support for new Haswell-ULT CPU idle power states"
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
intel_idle: initial C8, C9, C10 support
tools/power turbostat: display C8, C9, C10 residency
The usual scheme to initialize a cpuidle driver on a SMP is:
cpuidle_register_driver(drv);
for_each_possible_cpu(cpu) {
device = &per_cpu(cpuidle_dev, cpu);
cpuidle_register_device(device);
}
This code is duplicated in each cpuidle driver.
On UP systems, it is done this way:
cpuidle_register_driver(drv);
device = &per_cpu(cpuidle_dev, cpu);
cpuidle_register_device(device);
On UP, the macro 'for_each_cpu' does one iteration:
#define for_each_cpu(cpu, mask) \
for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)
Hence, the initialization loop is the same for UP than SMP.
Beside, we saw different bugs / mis-initialization / return code unchecked in
the different drivers, the code is duplicated including bugs. After fixing all
these ones, it appears the initialization pattern is the same for everyone.
Please note, some drivers are doing dev->state_count = drv->state_count. This is
not necessary because it is done by the cpuidle_enable_device function in the
cpuidle framework. This is true, until you have the same states for all your
devices. Otherwise, the 'low level' API should be used instead with the specific
initialization for the driver.
Let's add a wrapper function doing this initialization with a cpumask parameter
for the coupled idle states and use it for all the drivers.
That will save a lot of LOC, consolidate the code, and the modifications in the
future could be done in a single place. Another benefit is the consolidation of
the cpuidle_device variable which is now in the cpuidle framework and no longer
spread accross the different arch specific drivers.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The en_core_tk_irqen flag is set in all the cpuidle driver which
means it is not necessary to specify this flag.
Remove the flag and the code related to it.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org> # for mach-omap2/*
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Allow intel_idle and cpuidle to utilize C8, C9, C10
when they are present on...
"Fourth Generation Intel(R) Core(TM) Processors",
which are based on Intel(R) microarchitecture code name Haswell.
Signed-off-by: Len Brown <len.brown@intel.com>
The commit 89878baa73f0f1c679355006bd8632e5d78f96c2 introduced
the CPUIDLE_FLAG_TIMER_STOP flag where we specify a specific idle
state stops the local timer.
Now use this flag to check at init time if one state will need
the broadcast timer and, in this case, setup the broadcast timer
framework. That prevents multiple code duplication in the drivers.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When a cpu enters a deep idle state, the local timers are stopped and
the time framework falls back to the timer device used as a broadcast
timer.
The different cpuidle drivers are calling clockevents_notify ENTER/EXIT
when the idle state stops the local timer.
Add a new flag CPUIDLE_FLAG_TIMER_STOP which can be set by the cpuidle
drivers. If the flag is set, the cpuidle core code takes care of the
notification on behalf of the driver to avoid pointless code duplication.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We realized that the power usage field is never filled and when it
is filled for tegra, the power_specified flag is not set causing all
of these values to be reset when the driver is initialized with
set_power_state().
However, the power_specified flag can be simply removed under the
assumption that the states are always backward sorted, which is the
case with the current code.
This change allows the menu governor select function and the
cpuidle_play_dead() to be simplified. Moreover, the
set_power_states() function can removed as it does not make sense
any more.
Drop the power_specified flag from struct cpuidle_driver and make
the related changes as described above.
As a consequence, this also fixes the bug where on the dynamic
C-states system, the power fields are not initialized.
[rjw: Changelog]
References: https://bugzilla.kernel.org/show_bug.cgi?id=42870
References: https://bugzilla.kernel.org/show_bug.cgi?id=43349
References: https://lkml.org/lkml/2012/10/16/518
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>