Commit Graph

373 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
bab324e8a8 Merge 5.15.52 into android13-5.15-lts
Changes in 5.15.52
	tick/nohz: unexport __init-annotated tick_nohz_full_setup()
	x86, kvm: use proper ASM macros for kvm_vcpu_is_preempted
	bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()
	xfs: use kmem_cache_free() for kmem_cache objects
	xfs: punch out data fork delalloc blocks on COW writeback failure
	xfs: Fix the free logic of state in xfs_attr_node_hasname
	xfs: remove all COW fork extents when remounting readonly
	xfs: check sb_meta_uuid for dabuf buffer recovery
	xfs: prevent UAF in xfs_log_item_in_current_chkpt
	xfs: only bother with sync_filesystem during readonly remount
	powerpc/ftrace: Remove ftrace init tramp once kernel init is complete
	fs: add is_idmapped_mnt() helper
	fs: move mapping helpers
	fs: tweak fsuidgid_has_mapping()
	fs: account for filesystem mappings
	docs: update mapping documentation
	fs: use low-level mapping helpers
	fs: remove unused low-level mapping helpers
	fs: port higher-level mapping helpers
	fs: add i_user_ns() helper
	fs: support mapped mounts of mapped filesystems
	fs: fix acl translation
	fs: account for group membership
	rtw88: 8821c: support RFE type4 wifi NIC
	rtw88: rtw8821c: enable rfe 6 devices
	net: mscc: ocelot: allow unregistered IP multicast flooding to CPU
	io_uring: fix not locked access to fixed buf table
	Linux 5.15.52

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2e16e1b5f0367946199f0a0a3b371b54f2377267
2022-08-08 16:18:44 +02:00
Masahiro Yamada
f4a80ec8c5 tick/nohz: unexport __init-annotated tick_nohz_full_setup()
commit 2390095113e98fc52fffe35c5206d30d9efe3f78 upstream.

EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.

modpost used to detect it, but it had been broken for a decade.

Commit 28438794aba4 ("modpost: fix section mismatch check for exported
init/exit sections") fixed it so modpost started to warn it again, then
this showed up:

    MODPOST vmlinux.symvers
  WARNING: modpost: vmlinux.o(___ksymtab_gpl+tick_nohz_full_setup+0x0): Section mismatch in reference from the variable __ksymtab_tick_nohz_full_setup to the function .init.text:tick_nohz_full_setup()
  The symbol tick_nohz_full_setup is exported and annotated __init
  Fix this by removing the __init annotation of tick_nohz_full_setup or drop the export.

Drop the export because tick_nohz_full_setup() is only called from the
built-in code in kernel/sched/isolation.c.

Fixes: ae9e557b5b ("time: Export tick start/stop functions for rcutorture")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Backlund <tmb@tmb.nu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-02 16:41:12 +02:00
Greg Kroah-Hartman
ec1a28c7c0 Merge 5.15.35 into android13-5.15
Changes in 5.15.35
	drm/amd/display: Add pstate verification and recovery for DCN31
	drm/amd/display: Fix p-state allow debug index on dcn31
	hamradio: defer 6pack kfree after unregister_netdev
	hamradio: remove needs_free_netdev to avoid UAF
	cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function
	ACPI: processor idle: Check for architectural support for LPI
	ACPI: processor idle: Allow playing dead in C3 state
	ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40
	btrfs: remove unused parameter nr_pages in add_ra_bio_pages()
	btrfs: remove no longer used counter when reading data page
	btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
	soc: qcom: aoss: Expose send for generic usecase
	dt-bindings: net: qcom,ipa: add optional qcom,qmp property
	net: ipa: request IPA register values be retained
	btrfs: release correct delalloc amount in direct IO write path
	ALSA: core: Add snd_card_free_on_error() helper
	ALSA: sis7019: Fix the missing error handling
	ALSA: ali5451: Fix the missing snd_card_free() call at probe error
	ALSA: als300: Fix the missing snd_card_free() call at probe error
	ALSA: als4000: Fix the missing snd_card_free() call at probe error
	ALSA: atiixp: Fix the missing snd_card_free() call at probe error
	ALSA: au88x0: Fix the missing snd_card_free() call at probe error
	ALSA: aw2: Fix the missing snd_card_free() call at probe error
	ALSA: azt3328: Fix the missing snd_card_free() call at probe error
	ALSA: bt87x: Fix the missing snd_card_free() call at probe error
	ALSA: ca0106: Fix the missing snd_card_free() call at probe error
	ALSA: cmipci: Fix the missing snd_card_free() call at probe error
	ALSA: cs4281: Fix the missing snd_card_free() call at probe error
	ALSA: cs5535audio: Fix the missing snd_card_free() call at probe error
	ALSA: echoaudio: Fix the missing snd_card_free() call at probe error
	ALSA: emu10k1x: Fix the missing snd_card_free() call at probe error
	ALSA: ens137x: Fix the missing snd_card_free() call at probe error
	ALSA: es1938: Fix the missing snd_card_free() call at probe error
	ALSA: es1968: Fix the missing snd_card_free() call at probe error
	ALSA: fm801: Fix the missing snd_card_free() call at probe error
	ALSA: galaxy: Fix the missing snd_card_free() call at probe error
	ALSA: hdsp: Fix the missing snd_card_free() call at probe error
	ALSA: hdspm: Fix the missing snd_card_free() call at probe error
	ALSA: ice1724: Fix the missing snd_card_free() call at probe error
	ALSA: intel8x0: Fix the missing snd_card_free() call at probe error
	ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error
	ALSA: korg1212: Fix the missing snd_card_free() call at probe error
	ALSA: lola: Fix the missing snd_card_free() call at probe error
	ALSA: lx6464es: Fix the missing snd_card_free() call at probe error
	ALSA: maestro3: Fix the missing snd_card_free() call at probe error
	ALSA: oxygen: Fix the missing snd_card_free() call at probe error
	ALSA: riptide: Fix the missing snd_card_free() call at probe error
	ALSA: rme32: Fix the missing snd_card_free() call at probe error
	ALSA: rme9652: Fix the missing snd_card_free() call at probe error
	ALSA: rme96: Fix the missing snd_card_free() call at probe error
	ALSA: sc6000: Fix the missing snd_card_free() call at probe error
	ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error
	ALSA: via82xx: Fix the missing snd_card_free() call at probe error
	ALSA: usb-audio: Cap upper limits of buffer/period bytes for implicit fb
	ALSA: nm256: Don't call card private_free at probe error path
	drm/msm: Add missing put_task_struct() in debugfs path
	firmware: arm_scmi: Remove clear channel call on the TX channel
	memory: atmel-ebi: Fix missing of_node_put in atmel_ebi_probe
	Revert "ath11k: mesh: add support for 256 bitmap in blockack frames in 11ax"
	firmware: arm_scmi: Fix sorting of retrieved clock rates
	media: rockchip/rga: do proper error checking in probe
	SUNRPC: Fix the svc_deferred_event trace class
	net/sched: flower: fix parsing of ethertype following VLAN header
	veth: Ensure eth header is in skb's linear part
	gpiolib: acpi: use correct format characters
	cifs: release cached dentries only if mount is complete
	net: mdio: don't defer probe forever if PHY IRQ provider is missing
	mlxsw: i2c: Fix initialization error flow
	net/sched: fix initialization order when updating chain 0 head
	net: dsa: felix: suppress -EPROBE_DEFER errors
	net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link
	net/sched: taprio: Check if socket flags are valid
	cfg80211: hold bss_lock while updating nontrans_list
	netfilter: nft_socket: make cgroup match work in input too
	drm/msm: Fix range size vs end confusion
	drm/msm/dsi: Use connector directly in msm_dsi_manager_connector_init()
	drm/msm/dp: add fail safe mode outside of event_mutex context
	net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
	scsi: pm80xx: Mask and unmask upper interrupt vectors 32-63
	scsi: pm80xx: Enable upper inbound, outbound queues
	scsi: iscsi: Move iscsi_ep_disconnect()
	scsi: iscsi: Fix offload conn cleanup when iscsid restarts
	scsi: iscsi: Fix endpoint reuse regression
	scsi: iscsi: Fix conn cleanup and stop race during iscsid restart
	scsi: iscsi: Fix unbound endpoint error handling
	sctp: Initialize daddr on peeled off socket
	netfilter: nf_tables: nft_parse_register can return a negative value
	ALSA: ad1889: Fix the missing snd_card_free() call at probe error
	ALSA: mtpav: Don't call card private_free at probe error path
	io_uring: move io_uring_rsrc_update2 validation
	io_uring: verify that resv2 is 0 in io_uring_rsrc_update2
	io_uring: verify pad field is 0 in io_get_ext_arg
	testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set
	ALSA: usb-audio: Increase max buffer size
	ALSA: usb-audio: Limit max buffer and period sizes per time
	perf tools: Fix misleading add event PMU debug message
	macvlan: Fix leaking skb in source mode with nodst option
	net: ftgmac100: access hardware register after clock ready
	nfc: nci: add flush_workqueue to prevent uaf
	cifs: potential buffer overflow in handling symlinks
	dm mpath: only use ktime_get_ns() in historical selector
	vfio/pci: Fix vf_token mechanism when device-specific VF drivers are used
	net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
	block: fix offset/size check in bio_trim()
	drm/amd: Add USBC connector ID
	btrfs: fix fallocate to use file_modified to update permissions consistently
	btrfs: do not warn for free space inode in cow_file_range
	drm/amdgpu: conduct a proper cleanup of PDB bo
	drm/amdgpu/gmc: use PCI BARs for APUs in passthrough
	drm/amd/display: fix audio format not updated after edid updated
	drm/amd/display: FEC check in timing validation
	drm/amd/display: Update VTEM Infopacket definition
	drm/amdkfd: Fix Incorrect VMIDs passed to HWS
	drm/amdgpu/vcn: improve vcn dpg stop procedure
	drm/amdkfd: Check for potential null return of kmalloc_array()
	Drivers: hv: vmbus: Deactivate sysctl_record_panic_msg by default in isolated guests
	PCI: hv: Propagate coherence from VMbus device to PCI device
	Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer
	scsi: target: tcmu: Fix possible page UAF
	scsi: lpfc: Fix queue failures when recovering from PCI parity error
	scsi: ibmvscsis: Increase INITIAL_SRP_LIMIT to 1024
	net: micrel: fix KS8851_MLL Kconfig
	ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs
	gpu: ipu-v3: Fix dev_dbg frequency output
	regulator: wm8994: Add an off-on delay for WM8994 variant
	arm64: alternatives: mark patch_alternative() as `noinstr`
	tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entry
	net: axienet: setup mdio unconditionally
	Drivers: hv: balloon: Disable balloon and hot-add accordingly
	net: usb: aqc111: Fix out-of-bounds accesses in RX fixup
	myri10ge: fix an incorrect free for skb in myri10ge_sw_tso
	spi: cadence-quadspi: fix protocol setup for non-1-1-X operations
	drm/amd/display: Enable power gating before init_pipes
	drm/amd/display: Revert FEC check in validation
	drm/amd/display: Fix allocate_mst_payload assert on resume
	drbd: set QUEUE_FLAG_STABLE_WRITES
	scsi: mpt3sas: Fail reset operation if config request timed out
	scsi: mvsas: Add PCI ID of RocketRaid 2640
	scsi: megaraid_sas: Target with invalid LUN ID is deleted during scan
	drivers: net: slip: fix NPD bug in sl_tx_timeout()
	io_uring: zero tag on rsrc removal
	io_uring: use nospec annotation for more indexes
	perf/imx_ddr: Fix undefined behavior due to shift overflowing the constant
	mm/secretmem: fix panic when growing a memfd_secret
	mm, page_alloc: fix build_zonerefs_node()
	mm: fix unexpected zeroed page mapping with zram swap
	mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
	KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded
	SUNRPC: Fix NFSD's request deferral on RDMA transports
	memory: renesas-rpc-if: fix platform-device leak in error path
	gcc-plugins: latent_entropy: use /dev/urandom
	cifs: verify that tcon is valid before dereference in cifs_kill_sb
	ath9k: Properly clear TX status area before reporting to mac80211
	ath9k: Fix usage of driver-private space in tx_info
	btrfs: fix root ref counts in error handling in btrfs_get_root_ref
	btrfs: mark resumed async balance as writing
	ALSA: hda/realtek: Add quirk for Clevo PD50PNT
	ALSA: hda/realtek: add quirk for Lenovo Thinkpad X12 speakers
	ALSA: pcm: Test for "silence" field in struct "pcm_format_data"
	nl80211: correctly check NL80211_ATTR_REG_ALPHA2 size
	ipv6: fix panic when forwarding a pkt with no in6 dev
	drm/amd/display: don't ignore alpha property on pre-multiplied mode
	drm/amdgpu: Enable gfxoff quirk on MacBook Pro
	x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
	x86/tsx: Disable TSX development mode at boot
	genirq/affinity: Consider that CPUs on nodes can be unbalanced
	tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
	ARM: davinci: da850-evm: Avoid NULL pointer dereference
	dm integrity: fix memory corruption when tag_size is less than digest size
	i2c: dev: check return value when calling dev_set_name()
	smp: Fix offline cpu check in flush_smp_call_function_queue()
	i2c: pasemi: Wait for write xfers to finish
	dt-bindings: net: snps: remove duplicate name
	timers: Fix warning condition in __run_timers()
	dma-direct: avoid redundant memory sync for swiotlb
	drm/i915: Sunset igpu legacy mmap support based on GRAPHICS_VER_FULL
	cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state
	soc: qcom: aoss: Fix missing put_device call in qmp_get
	net: ipa: fix a build dependency
	cpufreq: intel_pstate: ITMT support for overclocked system
	ax25: add refcount in ax25_dev to avoid UAF bugs
	ax25: fix reference count leaks of ax25_dev
	ax25: fix UAF bugs of net_device caused by rebinding operation
	ax25: Fix refcount leaks caused by ax25_cb_del()
	ax25: fix UAF bug in ax25_send_control()
	ax25: fix NPD bug in ax25_disconnect
	ax25: Fix NULL pointer dereferences in ax25 timers
	ax25: Fix UAF bugs in ax25 timers
	Linux 5.15.35

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0dd9eaea7f977df42b0a5b9cb9043c879f62718b
2022-04-24 16:58:59 +02:00
Paul Gortmaker
68a38b07f1 tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
commit 40e97e42961f8c6cc7bd5fe67cc18417e02d78f1 upstream.

While running some testing on code that happened to allow the variable
tick_nohz_full_running to get set but with no "possible" NOHZ cores to
back up that setting, this warning triggered:

        if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
                WARN_ON(tick_nohz_full_running);

The console was overwhemled with an endless stream of one WARN per tick
per core and there was no way to even see what was going on w/o using a
serial console to capture it and then trace it back to this.

Change it to WARN_ON_ONCE().

Fixes: 08ae95f4fd ("nohz_full: Allow the boot CPU to be nohz_full")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211206145950.10927-3-paul.gortmaker@windriver.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-20 09:34:20 +02:00
Lee Jones
3dad5fb4f7 Merge e17c120f48 ("Merge tag 'array-bounds-fixes-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux") into android-mainline
Small step en route to v5.14-rc1

Change-Id: I9239ea78f8628a274efd1f4dac0dc7ae31310edc
Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-07-09 13:25:02 +00:00
Linus Torvalds
9269d27e51 Merge tag 'timers-nohz-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers/nohz updates from Ingo Molnar:

 - Micro-optimize tick_nohz_full_cpu()

 - Optimize idle exit tick restarts to be less eager

 - Optimize tick_nohz_dep_set_task() to only wake up a single CPU.
   This reduces IPIs and interruptions on nohz_full CPUs.

 - Optimize tick_nohz_dep_set_signal() in a similar fashion.

 - Skip IPIs in tick_nohz_kick_task() when trying to kick a
   non-running task.

 - Micro-optimize tick_nohz_task_switch() IRQ flags handling to
   reduce context switching costs.

 - Misc cleanups and fixes

* tag 'timers-nohz-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Add myself as context tracking maintainer
  tick/nohz: Call tick_nohz_task_switch() with interrupts disabled
  tick/nohz: Kick only _queued_ task whose tick dependency is updated
  tick/nohz: Change signal tick dependency to wake up CPUs of member tasks
  tick/nohz: Only wake up a single target cpu when kicking a task
  tick/nohz: Update nohz_full Kconfig help
  tick/nohz: Update idle_exittime on actual idle exit
  tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
  tick/nohz: Conditionally restart tick on idle exit
  tick/nohz: Evaluate the CPU expression after the static key
2021-06-28 12:22:06 -07:00
Greg Kroah-Hartman
1a6552d0ed Merge 8ecfa36cd4 ("Merge tag 'riscv-for-linus-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux") into android-mainline
Steps on the way to 5.13-rc6

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iaed63766ecfc7e2126e108e93808e79469a9facf
2021-06-13 15:19:53 +02:00
Frederic Weisbecker
f268c3737e tick/nohz: Only check for RCU deferred wakeup on user/guest entry when needed
Checking for and processing RCU-nocb deferred wakeup upon user/guest
entry is only relevant when nohz_full runs on the local CPU, otherwise
the periodic tick should take care of it.

Make sure we don't needlessly pollute these fast-paths as a -3%
performance regression on a will-it-scale.per_process_ops has been
reported so far.

Fixes: 47b8ff194c (entry: Explicitly flush pending rcuog wakeup before last rescheduling point)
Fixes: 4ae7dc97f7 (entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point)
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210527113441.465489-1-frederic@kernel.org
2021-05-31 10:14:49 +02:00
Peter Zijlstra
0fdcccfafc tick/nohz: Call tick_nohz_task_switch() with interrupts disabled
Call tick_nohz_task_switch() slightly earlier after the context switch
to benefit from disabled IRQs. This way the function doesn't need to
disable them once more.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210512232924.150322-10-frederic@kernel.org
2021-05-13 14:21:23 +02:00
Marcelo Tosatti
a1dfb6311c tick/nohz: Kick only _queued_ task whose tick dependency is updated
When the tick dependency of a task is updated, we want it to aknowledge
the new state and restart the tick if needed. If the task is not
running, we don't need to kick it because it will observe the new
dependency upon scheduling in. But if the task is running, we may need
to send an IPI to it so that it gets notified.

Unfortunately we don't have the means to check if a task is running
in a race free way. Checking p->on_cpu in a synchronized way against
p->tick_dep_mask would imply adding a full barrier between
prepare_task_switch() and tick_nohz_task_switch(), which we want to
avoid in this fast-path.

Therefore we blindly fire an IPI to the task's CPU.

Meanwhile we can check if the task is queued on the CPU rq because
p->on_rq is always set to TASK_ON_RQ_QUEUED _before_ schedule() and its
full barrier that precedes tick_nohz_task_switch(). And if the task is
queued on a nohz_full CPU, it also has fair chances to be running as the
isolation constraints prescribe running single tasks on full dynticks
CPUs.

So use this as a trick to check if we can spare an IPI toward a
non-running task.

NOTE: For the ordering to be correct, it is assumed that we never
deactivate a task while it is running, the only exception being the task
deactivating itself while scheduling out.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-9-frederic@kernel.org
2021-05-13 14:21:22 +02:00
Marcelo Tosatti
1e4ca26d36 tick/nohz: Change signal tick dependency to wake up CPUs of member tasks
Rather than waking up all nohz_full CPUs on the system, only wake up
the target CPUs of member threads of the signal.

Reduces interruptions to nohz_full CPUs.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-8-frederic@kernel.org
2021-05-13 14:21:22 +02:00
Frederic Weisbecker
29721b8592 tick/nohz: Only wake up a single target cpu when kicking a task
When adding a tick dependency to a task, its necessary to
wake up the CPU where the task resides to reevaluate tick
dependencies on that CPU.

However the current code wakes up all nohz_full CPUs, which
is unnecessary.

Switch to waking up a single CPU, by using ordering of writes
to task->cpu and task->tick_dep_mask.

[ mingo: Minor readability edit. ]

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-7-frederic@kernel.org
2021-05-13 14:21:22 +02:00
Yunfeng Ye
96c9b90396 tick/nohz: Update idle_exittime on actual idle exit
The idle_exittime field of tick_sched is used to record the time when
the idle state was left. but currently the idle_exittime is updated in
the function tick_nohz_restart_sched_tick(), which is not always in idle
state when nohz_full is configured:

  tick_irq_exit
    tick_nohz_irq_exit
      tick_nohz_full_update_tick
        tick_nohz_restart_sched_tick
          ts->idle_exittime = now;

It's thus overwritten by mistake on nohz_full tick restart. Move the
update to the appropriate idle exit path instead.

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-5-frederic@kernel.org
2021-05-13 14:21:21 +02:00
Frederic Weisbecker
3f624314b3 tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
The vtime_accounting_enabled_this_cpu() early check already makes what
follows as dead code in the case of CONFIG_VIRT_CPU_ACCOUNTING_NATIVE.
No need to keep the ifdeferry around.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-4-frederic@kernel.org
2021-05-13 14:21:21 +02:00
Yunfeng Ye
a5183862e7 tick/nohz: Conditionally restart tick on idle exit
In nohz_full mode, switching from idle to a task will unconditionally
issue a tick restart. If the task is alone in the runqueue or is the
highest priority, the tick will fire once then eventually stop. But that
alone is still undesired noise.

Therefore, only restart the tick on idle exit when it's strictly
necessary.

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-3-frederic@kernel.org
2021-05-13 14:21:21 +02:00
Lee Jones
7561514944 Merge commit e7c6e405e1 ("Fix misc new gcc warnings") into android-mainline
Steps on the way to 5.13-rc1

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change-Id: Iff6fb6b3991943905d20a8b40e2b2dd87c0d792b
2021-04-29 10:20:06 +01:00
Linus Torvalds
5469f160e6 Merge tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "These add some new hardware support (for example, IceLake-D idle
  states in intel_idle), fix some issues (for example, the handling of
  negative "sleep length" values in cpuidle governors), add new
  functionality to the existing drivers (for example, scale-invariance
  support in the ACPI CPPC cpufreq driver) and clean up code all over.

  Specifics:

   - Add idle states table for IceLake-D to the intel_idle driver and
     update IceLake-X C6 data in it (Artem Bityutskiy).

   - Fix the C7 idle state on Tegra114 in the tegra cpuidle driver and
     drop the unused do_idle() firmware call from it (Dmitry Osipenko).

   - Fix cpuidle-qcom-spm Kconfig entry (He Ying).

   - Fix handling of possible negative tick_nohz_get_next_hrtimer()
     return values of in cpuidle governors (Rafael Wysocki).

   - Add support for frequency-invariance to the ACPI CPPC cpufreq
     driver and update the frequency-invariance engine (FIE) to use it
     as needed (Viresh Kumar).

   - Simplify the default delay_us setting in the ACPI CPPC cpufreq
     driver (Tom Saeger).

   - Clean up frequency-related computations in the intel_pstate cpufreq
     driver (Rafael Wysocki).

   - Fix TBG parent setting for load levels in the armada-37xx cpufreq
     driver and drop the CPU PM clock .set_parent method for armada-37xx
     (Marek Behún).

   - Fix multiple issues in the armada-37xx cpufreq driver (Pali Rohár).

   - Fix handling of dev_pm_opp_of_cpumask_add_table() return values in
     cpufreq-dt to take the -EPROBE_DEFER one into acconut as
     appropriate (Quanyang Wang).

   - Fix format string in ia64-acpi-cpufreq (Sergei Trofimovich).

   - Drop the unused for_each_policy() macro from cpufreq (Shaokun
     Zhang).

   - Simplify computations in the schedutil cpufreq governor to avoid
     unnecessary overhead (Yue Hu).

   - Fix typos in the s5pv210 cpufreq driver (Bhaskar Chowdhury).

   - Fix cpufreq documentation links in Kconfig (Alexander Monakov).

   - Fix PCI device power state handling in pci_enable_device_flags() to
     avoid issuse in some cases when the device depends on an ACPI power
     resource (Rafael Wysocki).

   - Add missing documentation of pm_runtime_resume_and_get() (Alan
     Stern).

   - Add missing static inline stub for pm_runtime_has_no_callbacks() to
     pm_runtime.h and drop the unused try_to_freeze_nowarn() definition
     (YueHaibing).

   - Drop duplicate struct device declaration from pm.h and fix a
     structure type declaration in intel_rapl.h (Wan Jiabing).

   - Use dev_set_name() instead of an open-coded equivalent of it in the
     wakeup sources code and drop a redundant local variable
     initialization from it (Andy Shevchenko, Colin Ian King).

   - Use crc32 instead of md5 for e820 memory map integrity check during
     resume from hibernation on x86 (Chris von Recklinghausen).

   - Fix typos in comments in the system-wide and hibernation support
     code (Lu Jialin).

   - Modify the generic power domains (genpd) code to avoid resuming
     devices in the "prepare" phase of system-wide suspend and
     hibernation (Ulf Hansson).

   - Add Hygon Fam18h RAPL support to the intel_rapl power capping
     driver (Pu Wen).

   - Add MAINTAINERS entry for the dynamic thermal power management
     (DTPM) code (Daniel Lezcano).

   - Add devm variants of operating performance points (OPP) API
     functions and switch over some users of the OPP framework to the
     new resource-managed API (Yangtao Li and Dmitry Osipenko).

   - Update devfreq core:

      * Register devfreq devices as cooling devices on demand (Daniel
        Lezcano).

      * Add missing unlock opeation in devfreq_add_device() (Lukasz
        Luba).

      * Use the next frequency as resume_freq instead of the previous
        frequency when using the opp-suspend property (Dong Aisheng).

      * Check get_dev_status in devfreq_update_stats() (Dong Aisheng).

      * Fix set_freq path for the userspace governor in Kconfig (Dong
        Aisheng).

      * Remove invalid description of get_target_freq() (Dong Aisheng).

   - Update devfreq drivers:

      * imx8m-ddrc: Remove imx8m_ddrc_get_dev_status() and unneeded
        of_match_ptr() (Dong Aisheng, Fabio Estevam).

      * rk3399_dmc: dt-bindings: Add rockchip,pmu phandle and drop
        references to undefined symbols (Enric Balletbo i Serra, Gaël
        PORTAY).

      * rk3399_dmc: Use dev_err_probe() to simplify the code (Krzysztof
        Kozlowski).

      * imx-bus: Remove unneeded of_match_ptr() (Fabio Estevam).

   - Fix kernel-doc warnings in three places (Pierre-Louis Bossart).

   - Fix typo in the pm-graph utility code (Ricardo Ribalda)"

* tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
  PM: wakeup: remove redundant assignment to variable retval
  PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check
  cpufreq: Kconfig: fix documentation links
  PM: wakeup: use dev_set_name() directly
  PM: runtime: Add documentation for pm_runtime_resume_and_get()
  cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits()
  cpufreq: armada-37xx: Fix module unloading
  cpufreq: armada-37xx: Remove cur_frequency variable
  cpufreq: armada-37xx: Fix determining base CPU frequency
  cpufreq: armada-37xx: Fix driver cleanup when registration failed
  clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0
  clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz
  cpufreq: armada-37xx: Fix the AVS value for load L1
  clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock
  cpufreq: armada-37xx: Fix setting TBG parent for load levels
  cpuidle: Fix ARM_QCOM_SPM_CPUIDLE configuration
  cpuidle: tegra: Remove do_idle firmware call
  cpuidle: tegra: Fix C7 idling state on Tegra114
  PM: sleep: fix typos in comments
  cpufreq: Remove unused for_each_policy macro
  ...
2021-04-26 15:10:25 -07:00
Linus Torvalds
87dcebff92 Merge tag 'timers-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The time and timers updates contain:

  Core changes:

   - Allow runtime power management when the clocksource is changed.

   - A correctness fix for clock_adjtime32() so that the return value on
     success is not overwritten by the result of the copy to user.

   - Allow late installment of broadcast clockevent devices which was
     broken because nothing switched them over to oneshot mode. This
     went unnoticed so far because clockevent devices used to be built
     in, but now people started to make them modular.

   - Debugfs related simplifications

   - Small cleanups and improvements here and there

  Driver changes:

   - The usual set of device tree binding updates for a wide range of
     drivers/devices.

   - The usual updates and improvements for drivers all over the place
     but nothing outstanding.

   - No new clocksource/event drivers. They'll come back next time"

* tag 'timers-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  posix-timers: Preserve return value in clock_adjtime32()
  tick/broadcast: Allow late registered device to enter oneshot mode
  tick: Use tick_check_replacement() instead of open coding it
  time/timecounter: Mark 1st argument of timecounter_cyc2time() as const
  dt-bindings: timer: nuvoton,npcm7xx: Add wpcm450-timer
  clocksource/drivers/arm_arch_timer: Add __ro_after_init and __init
  clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
  clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue
  clocksource/drivers/dw_apb_timer_of: Add handling for potential memory leak
  clocksource/drivers/npcm: Add support for WPCM450
  clocksource/drivers/sh_cmt: Don't use CMTOUT_IE with R-Car Gen2/3
  clocksource/drivers/pistachio: Fix trivial typo
  clocksource/drivers/ingenic_ost: Fix return value check in ingenic_ost_probe()
  clocksource/drivers/timer-ti-dm: Add missing set_state_oneshot_stopped
  clocksource/drivers/timer-ti-dm: Fix posted mode status check order
  dt-bindings: timer: renesas,cmt: Document R8A77961
  dt-bindings: timer: renesas,cmt: Add r8a779a0 CMT support
  clocksource/drivers/ingenic-ost: Add support for the JZ4760B
  clocksource/drivers/ingenic: Add support for the JZ4760
  dt-bindings: timer: ingenic: Add compatible strings for JZ4760(B)
  ...
2021-04-26 09:54:03 -07:00
Rafael J. Wysocki
4c81cb7e64 tick/nohz: Improve tick_nohz_get_next_hrtimer() kerneldoc
Make the tick_nohz_get_next_hrtimer() kerneldoc comment state clearly
that the function may return negative numbers.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-04-07 19:26:44 +02:00
Ingo Molnar
4bf07f6562 timekeeping, clocksource: Fix various typos in comments
Fix ~56 single-word typos in timekeeping & clocksource code comments.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-kernel@vger.kernel.org
2021-03-22 23:06:48 +01:00
Thomas Gleixner
47c218dcae tick/sched: Prevent false positive softirq pending warnings on RT
On RT a task which has soft interrupts disabled can block on a lock and
schedule out to idle while soft interrupts are pending. This triggers the
warning in the NOHZ idle code which complains about going idle with pending
soft interrupts. But as the task is blocked soft interrupt processing is
temporarily blocked as well which means that such a warning is a false
positive.

To prevent that check the per CPU state which indicates that a scheduled
out task has soft interrupts disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309085727.527563866@linutronix.de
2021-03-17 16:34:11 +01:00
Quentin Perret
7587bc9dcf ANDROID: sched: time: Export symbols needed for schedutil module
Export symbols needed to allow building a schedutil-based vendor module
with GKI.

This is a small price to pay to give vendors the flexibility they need,
and avoids littering cpufreq_schedutil.c with many vendor hooks.

Bug: 170511085
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I8ff8bdb32df5d47124236819efba881c1a2a538d
(cherry picked from commit 34cd6916744b8b2d2107d2d5f10cbacb181e4f6c)
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-02-04 18:29:42 -08:00
Greg Kroah-Hartman
6a70670ac0 Merge v5.11-rc1 into android-mainline
Linux 5.11-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibcbb64f4f03a1e9c04fcfd59ce8e589666d77afc
2021-01-13 14:52:03 +01:00
Linus Torvalds
2eeefc60ad Merge tag 'timers-urgent-2020-12-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Update/fix two CPU sanity checks in the hotplug and the boot code, and
  fix a typo in the Kconfig help text.

  [ Context: the first two commits are the result of an ongoing
    annotation+review work of (intentional) tick_do_timer_cpu() data
    races reported by KCSAN, but the annotations aren't fully cooked
    yet ]"

* tag 'timers-urgent-2020-12-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Fix spelling mistake in Kconfig "fullfill" -> "fulfill"
  tick/sched: Remove bogus boot "safety" check
  tick: Remove pointless cpu valid check in hotplug code
2020-12-27 09:03:41 -08:00
Thomas Gleixner
ba8ea8e7dd tick/sched: Remove bogus boot "safety" check
can_stop_idle_tick() checks whether the do_timer() duty has been taken over
by a CPU on boot. That's silly because the boot CPU always takes over with
the initial clockevent device.

But even if no CPU would have installed a clockevent and taken over the
duty then the question whether the tick on the current CPU can be stopped
or not is moot. In that case the current CPU would have no clockevent
either, so there would be nothing to keep ticking.

Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20201206212002.725238293@linutronix.de
2020-12-16 11:26:27 +01:00
Quentin Perret
0dd08d5801 Merge adb35e8dc9 ("Merge tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mainline
Now that CPU pause is gone, this is a lot more manageable. The
remaining conflicts were caused mostly by vendor hooks and Android-specific
tweaks to the EAS topology code, but easily fixable by hand.

Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I3665a2d78cb0b8eca6ba5110e90dc7f72030805e
2020-12-16 09:09:52 +00:00
Greg Kroah-Hartman
482ed74e40 Merge 533369b145 ("Merge tag 'timers-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") into android-mailine
Steps on the way to 5.11-rc1

Resolves merge conflicts in:
	include/uapi/linux/prctl.h
	kernel/sys.c

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I85ea8cffcd22f93277b357872254fe21d68bd82c
2020-12-15 16:43:58 +01:00
Linus Torvalds
adb35e8dc9 Merge tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:

 - migrate_disable/enable() support which originates from the RT tree
   and is now a prerequisite for the new preemptible kmap_local() API
   which aims to replace kmap_atomic().

 - A fair amount of topology and NUMA related improvements

 - Improvements for the frequency invariant calculations

 - Enhanced robustness for the global CPU priority tracking and decision
   making

 - The usual small fixes and enhancements all over the place

* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
  sched/fair: Trivial correction of the newidle_balance() comment
  sched/fair: Clear SMT siblings after determining the core is not idle
  sched: Fix kernel-doc markup
  x86: Print ratio freq_max/freq_base used in frequency invariance calculations
  x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
  x86, sched: Calculate frequency invariance for AMD systems
  irq_work: Optimize irq_work_single()
  smp: Cleanup smp_call_function*()
  irq_work: Cleanup
  sched: Limit the amount of NUMA imbalance that can exist at fork time
  sched/numa: Allow a floating imbalance between NUMA nodes
  sched: Avoid unnecessary calculation of load imbalance at clone time
  sched/numa: Rename nr_running and break out the magic number
  sched: Make migrate_disable/enable() independent of RT
  sched/topology: Condition EAS enablement on FIE support
  arm64: Rebuild sched domains on invariance status changes
  sched/topology,schedutil: Wrap sched domains rebuild
  sched/uclamp: Allow to reset a task uclamp constraint value
  sched/core: Fix typos in comments
  Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
  ...
2020-12-14 18:29:11 -08:00
Thomas Gleixner
aa3b66f401 tick/sched: Make jiffies update quick check more robust
The quick check in tick_do_update_jiffies64() whether jiffies need to be
updated is not really correct under all circumstances and on all
architectures, especially not on 32bit systems.

The quick check does:

    if (now < READ_ONCE(tick_next_period))
    	return;

and the counterpart in the update is:

    WRITE_ONCE(tick_next_period, next_update_time);

This has two problems:

  1) On weakly ordered architectures there is no guarantee that the stores
     before the WRITE_ONCE() are visible which means that other CPUs can
     operate on a stale jiffies value.

  2) On 32bit the store of tick_next_period which is an u64 is split into
     two 32bit stores. If the first 32bit store advances tick_next_period
     far out and the second 32bit store is delayed (virt, NMI ...) then
     jiffies will become stale until the second 32bit store happens.

Address this by seperating the handling for 32bit and 64bit.

On 64bit problem #1 is addressed by replacing READ_ONCE() / WRITE_ONCE()
with smp_load_acquire() / smp_store_release().

On 32bit problem #2 is addressed by protecting the quick check with the
jiffies sequence counter. The load and stores can be plain because the
sequence count mechanics provides the required barriers already.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/87czzpc02w.fsf@nanos.tec.linutronix.de
2020-12-11 23:19:10 +01:00
Peter Zijlstra
7a9f50a058 irq_work: Cleanup
Get rid of the __call_single_node union and clean up the API a little
to avoid external code relying on the structure layout as much.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
2020-11-24 16:47:48 +01:00
Thomas Gleixner
b996544916 tick: Get rid of tick_period
The variable tick_period is initialized to NSEC_PER_TICK / HZ during boot
and never updated again.

If NSEC_PER_TICK is not an integer multiple of HZ this computation is less
accurate than TICK_NSEC which has proper rounding in place.

Aside of the inaccuracy there is no reason for having this variable at
all. It's just a pointless indirection and all usage sites can just use the
TICK_NSEC constant.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201117132006.766643526@linutronix.de
2020-11-19 10:48:29 +01:00
Yunfeng Ye
896b969e67 tick/sched: Release seqcount before invoking calc_load_global()
calc_load_global() does not need the sequence count protection.

[ tglx: Split it up properly and added comments ]

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201117132006.660902274@linutronix.de
2020-11-19 10:48:29 +01:00
Thomas Gleixner
7a35bf2a6a tick/sched: Optimize tick_do_update_jiffies64() further
Now that it's clear that there is always one tick to account, simplify the
calculations some more.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201117132006.565663056@linutronix.de
2020-11-19 10:48:29 +01:00
Yunfeng Ye
94ad2e3ced tick/sched: Reduce seqcount held scope in tick_do_update_jiffies64()
If jiffies are up to date already (caller lost the race against another
CPU) there is no point to change the sequence count. Doing that just forces
other CPUs into the seqcount retry loop in tick_nohz_next_event() for
nothing.

Just bail out early.

[ tglx: Rewrote most of it ]

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201117132006.462195901@linutronix.de
2020-11-19 10:48:29 +01:00
Thomas Gleixner
372acbbaa8 tick/sched: Use tick_next_period for lockless quick check
No point in doing calculations.

   tick_next_period = last_jiffies_update + tick_period

Just check whether now is before tick_next_period to figure out whether
jiffies need an update.

Add a comment why the intentional data race in the quick check is safe or
not so safe in a 32bit corner case and why we don't worry about it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201117132006.337366695@linutronix.de
2020-11-19 10:48:29 +01:00
Thomas Gleixner
c398960cd8 tick: Document protections for tick related data
The protection rules for tick_next_period and last_jiffies_update are blury
at best. Clarify this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201117132006.197713794@linutronix.de
2020-11-19 10:48:28 +01:00
Amir Vajid
9bdaa3fa87 ANDROID: vendor_hooks: Add hook for jiffies updates
Create a vendor hook for jiffies updates by the
tick_do_timer_cpu.

Bug: 148928265
Change-Id: Ia442e20d446b8ce4f2b3f2be76655e72919c76eb
Signed-off-by: Amir Vajid <avajid@codeaurora.org>
2020-11-10 21:50:56 +00:00
Greg Kroah-Hartman
b3dd1b5952 Merge 922a763ae1 ("Merge tag 'zonefs-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs") into android-mainline
Steps on the way to 5.10-rc1

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I520719ae5e0d992c3756e393cb299d77d650622e
2020-10-26 10:08:43 +01:00
Lina Iyer
4de53aef65 ANDROID: tick/nohz: export tick_nohz_get_sleep_length()
Export tick_nohz_get_sleep_length() so idle drivers may use this to
determine the available idle time before the next timer wakeup.

Bug: 169136276
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
Change-Id: I0d18638d63c032862ae048bc2c3d49fa1bd90291
2020-10-23 18:05:30 +00:00
Paul E. McKenney
bca37119c5 tick-sched: Clarify "NOHZ: local_softirq_pending" warning
Currently, can_stop_idle_tick() prints "NOHZ: local_softirq_pending HH"
(where "HH" is the hexadecimal softirq vector number) when one or more
non-RCU softirq handlers are still enabled when checking to stop the
scheduler-tick interrupt.  This message is not as enlightening as one
might hope, so this commit changes it to "NOHZ tick-stop error: Non-RCU
local softirq work is pending, handler #HH".

Reported-by: Andy Lutomirski <luto@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-08-24 18:38:32 -07:00
Frederic Weisbecker
3c8920e2db tick/nohz: Narrow down noise while setting current task's tick dependency
Setting a tick dependency on any task, including the case where a task
sets that dependency on itself, triggers an IPI to all CPUs.  That is
of course suboptimal but it had previously not been an issue because it
was only used by POSIX CPU timers on nohz_full, which apparently never
occurs in latency-sensitive workloads in production.  (Or users of such
systems are suffering in silence on the one hand or venting their ire
on the wrong people on the other.)

But RCU now sets a task tick dependency on the current task in order
to fix stall issues that can occur during RCU callback processing.
Thus, RCU callback processing triggers frequent system-wide IPIs from
nohz_full CPUs.  This is quite counter-productive, after all, avoiding
IPIs is what nohz_full is supposed to be all about.

This commit therefore optimizes tasks' self-setting of a task tick
dependency by using tick_nohz_full_kick() to avoid the system-wide IPI.
Instead, only the execution of the one task is disturbed, which is
acceptable given that this disturbance is well down into the noise
compared to the degree to which the RCU callback processing itself
disturbs execution.

Fixes: 6a949b7af8 (rcu: Force on tick when invoking lots of callbacks)
Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: stable@kernel.org
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-29 11:58:50 -07:00
Sebastian Andrzej Siewior
49915ac35c lockdep: Annotate irq_work
Mark irq_work items with IRQ_WORK_HARD_IRQ which should be invoked in
hardirq context even on PREEMPT_RT. IRQ_WORK without this flag will be
invoked in softirq context on PREEMPT_RT.

Set ->irq_config to 1 for the IRQ_WORK items which are invoked in softirq
context so lockdep knows that these can safely acquire a spinlock_t.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.643576700@linutronix.de
2020-03-21 16:00:24 +01:00
Thomas Gleixner
e5d4d1756b timekeeping: Split jiffies seqlock
seqlock consists of a sequence counter and a spinlock_t which is used to
serialize the writers. spinlock_t is substituted by a "sleeping" spinlock
on PREEMPT_RT enabled kernels which breaks the usage in the timekeeping
code as the writers are executed in hard interrupt and therefore
non-preemptible context even on PREEMPT_RT.

The spinlock in seqlock cannot be unconditionally replaced by a
raw_spinlock_t as many seqlock users have nesting spinlock sections or
other code which is not suitable to run in truly atomic context on RT.

Instead of providing a raw_seqlock API for a single use case, open code the
seqlock for the jiffies use case and implement it with a raw_spinlock_t and
a sequence counter.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.120587764@linutronix.de
2020-03-21 16:00:23 +01:00
Eric Dumazet
de95a991bb tick/sched: Annotate lockless access to last_jiffies_update
syzbot (KCSAN) reported a data-race in tick_do_update_jiffies64():

BUG: KCSAN: data-race in tick_do_update_jiffies64 / tick_do_update_jiffies64

write to 0xffffffff8603d008 of 8 bytes by interrupt on cpu 1:
 tick_do_update_jiffies64+0x100/0x250 kernel/time/tick-sched.c:73
 tick_sched_do_timer+0xd4/0xe0 kernel/time/tick-sched.c:138
 tick_sched_timer+0x43/0xe0 kernel/time/tick-sched.c:1292
 __run_hrtimer kernel/time/hrtimer.c:1514 [inline]
 __hrtimer_run_queues+0x274/0x5f0 kernel/time/hrtimer.c:1576
 hrtimer_interrupt+0x22a/0x480 kernel/time/hrtimer.c:1638
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1110 [inline]
 smp_apic_timer_interrupt+0xdc/0x280 arch/x86/kernel/apic/apic.c:1135
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
 arch_local_irq_restore arch/x86/include/asm/paravirt.h:756 [inline]
 kcsan_setup_watchpoint+0x1d4/0x460 kernel/kcsan/core.c:436
 check_access kernel/kcsan/core.c:466 [inline]
 __tsan_read1 kernel/kcsan/core.c:593 [inline]
 __tsan_read1+0xc2/0x100 kernel/kcsan/core.c:593
 kallsyms_expand_symbol.constprop.0+0x70/0x160 kernel/kallsyms.c:79
 kallsyms_lookup_name+0x7f/0x120 kernel/kallsyms.c:170
 insert_report_filterlist kernel/kcsan/debugfs.c:155 [inline]
 debugfs_write+0x14b/0x2d0 kernel/kcsan/debugfs.c:256
 full_proxy_write+0xbd/0x100 fs/debugfs/file.c:225
 __vfs_write+0x67/0xc0 fs/read_write.c:494
 vfs_write fs/read_write.c:558 [inline]
 vfs_write+0x18a/0x390 fs/read_write.c:542
 ksys_write+0xd5/0x1b0 fs/read_write.c:611
 __do_sys_write fs/read_write.c:623 [inline]
 __se_sys_write fs/read_write.c:620 [inline]
 __x64_sys_write+0x4c/0x60 fs/read_write.c:620
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

read to 0xffffffff8603d008 of 8 bytes by task 0 on cpu 0:
 tick_do_update_jiffies64+0x2b/0x250 kernel/time/tick-sched.c:62
 tick_nohz_update_jiffies kernel/time/tick-sched.c:505 [inline]
 tick_nohz_irq_enter kernel/time/tick-sched.c:1257 [inline]
 tick_irq_enter+0x139/0x1c0 kernel/time/tick-sched.c:1274
 irq_enter+0x4f/0x60 kernel/softirq.c:354
 entering_irq arch/x86/include/asm/apic.h:517 [inline]
 entering_ack_irq arch/x86/include/asm/apic.h:523 [inline]
 smp_apic_timer_interrupt+0x55/0x280 arch/x86/kernel/apic/apic.c:1133
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830
 native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60
 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
 cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 do_idle+0x1af/0x280 kernel/sched/idle.c:263
 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
 rest_init+0xec/0xf6 init/main.c:452
 arch_call_rest_init+0x17/0x37
 start_kernel+0x838/0x85e init/main.c:786
 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490
 x86_64_start_kernel+0x72/0x76 arch/x86/kernel/head64.c:471
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Use READ_ONCE() and WRITE_ONCE() to annotate this expected race.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191205045619.204946-1-edumazet@google.com
2020-01-15 10:54:12 +01:00
Linus Torvalds
1ae78780ed Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Dynamic tick (nohz) updates, perhaps most notably changes to force
     the tick on when needed due to lengthy in-kernel execution on CPUs
     on which RCU is waiting.

   - Linux-kernel memory consistency model updates.

   - Replace rcu_swap_protected() with rcu_prepace_pointer().

   - Torture-test updates.

   - Documentation updates.

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer()
  net/sched: Replace rcu_swap_protected() with rcu_replace_pointer()
  net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer()
  net/core: Replace rcu_swap_protected() with rcu_replace_pointer()
  bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer()
  fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer()
  drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer()
  drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer()
  x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer()
  rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()
  rcu: Suppress levelspread uninitialized messages
  rcu: Fix uninitialized variable in nocb_gp_wait()
  rcu: Update descriptions for rcu_future_grace_period tracepoint
  rcu: Update descriptions for rcu_nocb_wake tracepoint
  rcu: Remove obsolete descriptions for rcu_barrier tracepoint
  rcu: Ensure that ->rcu_urgent_qs is set before resched IPI
  workqueue: Convert for_each_wq to use built-in list check
  rcu: Several rcu_segcblist functions can be static
  rcu: Remove unused function hlist_bl_del_init_rcu()
  Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch()
  ...
2019-11-26 15:42:43 -08:00
Frederic Weisbecker
e44fcb4b7a sched/vtime: Rename vtime_accounting_cpu_enabled() to vtime_accounting_enabled_this_cpu()
Standardize the naming on top of the vtime_accounting_enabled_*() base.
Also make it clear we are checking the vtime state of the
*current* CPU with this function. We'll need to add an API to check that
state on remote CPUs as well, so we must disambiguate the naming.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Link: https://lkml.kernel.org/r/20191016025700.31277-9-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29 10:01:14 +01:00
Paul E. McKenney
ae9e557b5b time: Export tick start/stop functions for rcutorture
It turns out that rcutorture needs to ensure that the scheduling-clock
interrupt is enabled in CONFIG_NO_HZ_FULL kernels before starting on
CPU-bound in-kernel processing.  This commit therefore exports
tick_nohz_dep_set_task(), tick_nohz_dep_clear_task(), and
tick_nohz_full_setup() to GPL kernel modules.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 10:46:03 -07:00
Frederic Weisbecker
01b4c39901 nohz: Add TICK_DEP_BIT_RCU
If a nohz_full CPU is looping in the kernel, the scheduling-clock tick
might nevertheless remain disabled.  In !PREEMPT kernels, this can
prevent RCU's attempts to enlist the aid of that CPU's executions of
cond_resched(), which can in turn result in an arbitrarily delayed grace
period and thus an OOM.  RCU therefore needs a way to enable a holdout
nohz_full CPU's scheduler-clock interrupt.

This commit therefore provides a new TICK_DEP_BIT_RCU value which RCU can
pass to tick_dep_set_cpu() and friends to force on the scheduler-clock
interrupt for a specified CPU or task.  In some cases, rcutorture needs
to turn on the scheduler-clock tick, so this commit also exports the
relevant symbols to GPL-licensed modules.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2019-10-05 10:45:16 -07:00
Sebastian Andrzej Siewior
71fed982d6 tick: Mark sched_timer to expire in hard interrupt context
sched_timer must be initialized with the _HARD mode suffix to ensure expiry
in hard interrupt context on RT.

The previous conversion to HARD expiry mode missed on one instance in
tick_nohz_switch_to_nohz(). Fix it up.

Fixes: 902a9f9c50 ("tick: Mark tick related hrtimers to expiry in hard interrupt context")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190823113845.12125-3-bigeasy@linutronix.de
2019-08-28 13:01:26 +02:00
Sebastian Andrzej Siewior
902a9f9c50 tick: Mark tick related hrtimers to expiry in hard interrupt context
The tick related hrtimers, which drive the scheduler tick and hrtimer based
broadcasting are required to expire in hard interrupt context for obvious
reasons.

Mark them so PREEMPT_RT kernels wont move them to soft interrupt expiry.

Make the horribly formatted RCU_NONIDLE bracket maze readable while at it.

No functional change, 

[ tglx: Split out from larger combo patch. Add changelog ]

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20190726185753.459144407@linutronix.de
2019-08-01 20:51:21 +02:00