msm-5.15

Author	SHA1	Message	Date
Waiman Long	8e1716993b	cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp() commit 2685027fca387b602ae565bff17895188b803988 upstream. There are 3 places where the cpu and node masks of the top cpuset can be initialized in the order they are executed: 1) start_kernel -> cpuset_init() 2) start_kernel -> cgroup_init() -> cpuset_bind() 3) kernel_init_freeable() -> do_basic_setup() -> cpuset_init_smp() The first cpuset_init() call just sets all the bits in the masks. The second cpuset_bind() call sets cpus_allowed and mems_allowed to the default v2 values. The third cpuset_init_smp() call sets them back to v1 values. For systems with cgroup v2 setup, cpuset_bind() is called once. As a result, cpu and memory node hot add may fail to update the cpu and node masks of the top cpuset to include the newly added cpu or node in a cgroup v2 environment. For systems with cgroup v1 setup, cpuset_bind() is called again by rebind_subsystem() when the v1 cpuset filesystem is mounted as shown in the dmesg log below with an instrumented kernel. [ 2.609781] cpuset_bind() called - v2 = 1 [ 3.079473] cpuset_init_smp() called [ 7.103710] cpuset_bind() called - v2 = 0 smp_init() is called after the first two init functions. So we don't have a complete list of active cpus and memory nodes until later in cpuset_init_smp() which is the right time to set up effective_cpus and effective_mems. To fix this cgroup v2 mask setup problem, the potentially incorrect cpus_allowed & mems_allowed setting in cpuset_init_smp() are removed. For cgroup v2 systems, the initial cpuset_bind() call will set the masks correctly. For cgroup v1 systems, the second call to cpuset_bind() will do the right setup. cc: stable@vger.kernel.org Signed-off-by: Waiman Long <longman@redhat.com> Tested-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-18 10:26:56 +02:00
Greg Kroah-Hartman	2bde857bee	Merge 5.15.39 into android13-5.15 Changes in 5.15.39 MIPS: Fix CP0 counter erratum detection for R4k CPUs parisc: Merge model and model name into one line in /proc/cpuinfo ALSA: hda/realtek: Add quirk for Yoga Duet 7 13ITL6 speakers ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits mmc: core: Set HS clock speed before sending HS CMD13 gpiolib: of: fix bounds check for 'gpio-reserved-ranges' x86/fpu: Prevent FPU state corruption KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_id iommu/vt-d: Calculate mask for non-aligned flushes iommu/arm-smmu-v3: Fix size calculation in arm_smmu_mm_invalidate_range() drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT drm/amdgpu: do not use passthrough mode in Xen dom0 RISC-V: relocate DTB if it's outside memory region Revert "SUNRPC: attempt AF_LOCAL connect on setup" timekeeping: Mark NMI safe time accessors as notrace firewire: fix potential uaf in outbound_phy_packet_callback() firewire: remove check of list iterator against head past the loop body firewire: core: extend card->lock in fw_core_handle_bus_reset net: stmmac: disable Split Header (SPH) for Intel platforms genirq: Synchronize interrupt thread startup ASoC: da7219: Fix change notifications for tone generator frequency ASoC: wm8958: Fix change notifications for DSP controls ASoC: meson: Fix event generation for AUI ACODEC mux ASoC: meson: Fix event generation for G12A tohdmi mux ASoC: meson: Fix event generation for AUI CODEC mux s390/dasd: fix data corruption for ESE devices s390/dasd: prevent double format of tracks for ESE devices s390/dasd: Fix read for ESE with blksize < 4k s390/dasd: Fix read inconsistency for ESE DASD devices can: grcan: grcan_close(): fix deadlock can: isotp: remove re-binding of bound socket can: grcan: use ofdev->dev when allocating DMA memory can: grcan: grcan_probe(): fix broken system id check for errata workaround needs can: grcan: only use the NAPI poll budget for RX nfc: replace improper check device_is_registered() in netlink related functions nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs NFC: netlink: fix sleep in atomic bug when firmware download timeout gpio: visconti: Fix fwnode of GPIO IRQ gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set) hwmon: (adt7470) Fix warning on module removal hwmon: (pmbus) disable PEC if not enabled ASoC: dmaengine: Restore NULL prepare_slave_config() callback ASoC: soc-ops: fix error handling iommu/vt-d: Drop stop marker messages iommu/dart: check return value after calling platform_get_resource() net/mlx5e: Fix trust state reset in reload net/mlx5e: Don't match double-vlan packets if cvlan is not set net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release net/mlx5e: Fix the calling of update_buffer_lossy() API net/mlx5: Avoid double clear or set of sync reset requested net/mlx5: Fix deadlock in sync reset flow selftests/seccomp: Don't call read() on TTY from background pgrp SUNRPC release the transport of a relocated task with an assigned transport RDMA/siw: Fix a condition race issue in MPA request processing RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state RDMA/irdma: Reduce iWARP QP destroy time RDMA/irdma: Fix possible crash due to NULL netdev in notifier NFSv4: Don't invalidate inode attributes on delegation return net: ethernet: mediatek: add missing of_node_put() in mtk_sgmii_init() net: dsa: mt7530: add missing of_node_put() in mt7530_setup() net: stmmac: dwmac-sun8i: add missing of_node_put() in sun8i_dwmac_register_mdio_mux() net: mdio: Fix ENOMEM return value in BCM6368 mux bus controller net: cpsw: add missing of_node_put() in cpsw_probe_dt() net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() net: emaclite: Add error handling for of_address_to_resource() selftests/net: so_txtime: fix parsing of start time stamp on 32 bit systems selftests/net: so_txtime: usage(): fix documentation of default clock drm/msm/dp: remove fail safe mode related code btrfs: do not BUG_ON() on failure to update inode when setting xattr hinic: fix bug of wq out of bound access mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter() rxrpc: Enable IPv6 checksums on transport socket selftests: mirror_gre_bridge_1q: Avoid changing PVID while interface is operational bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag bnxt_en: Fix unnecessary dropping of RX packets selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer smsc911x: allow using IRQ0 btrfs: force v2 space cache usage for subpage mount btrfs: always log symlinks in full mode drm/amdgpu: unify BO evicting method in amdgpu_ttm drm/amdgpu: explicitly check for s0ix when evicting resources drm/amdgpu: don't set s3 and s0ix at the same time drm/amdgpu: Ensure HDA function is suspended before ASIC reset gpio: mvebu: drop pwm base assignment kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU fbdev: Make fb_release() return -ENODEV if fbdev was unregistered net/mlx5: Fix slab-out-of-bounds while reading resource dump menu net/mlx5e: Lag, Fix use-after-free in fib event handler net/mlx5e: Lag, Fix fib_info pointer assignment net/mlx5e: Lag, Don't skip fib events on current dst iommu/dart: Add missing module owner to ops structure kvm: selftests: do not use bitfields larger than 32-bits for PTEs KVM: selftests: Silence compiler warning in the kvm_page_table_test x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume KVM: x86: Do not change ICR on write to APIC_SELF_IPI KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs KVM: LAPIC: Enable timer posted-interrupt only when mwait/hlt is advertised selftest/vm: verify mmap addr in mremap_test selftest/vm: verify remap destination address in mremap_test mmc: rtsx: add 74 Clocks in power on flow Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized" rcu: Fix callbacks processing time limit retaining cond_resched() rcu: Apply callbacks processing time limit only on softirq PCI: pci-bridge-emul: Add description for class_revision field PCI: pci-bridge-emul: Add definitions for missing capabilities registers PCI: aardvark: Add support for DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2 registers on emulated bridge PCI: aardvark: Clear all MSIs at setup PCI: aardvark: Comment actions in driver remove method PCI: aardvark: Disable bus mastering when unbinding driver PCI: aardvark: Mask all interrupts when unbinding driver PCI: aardvark: Fix memory leak in driver unbind PCI: aardvark: Assert PERST# when unbinding driver PCI: aardvark: Disable link training when unbinding driver PCI: aardvark: Disable common PHY when unbinding driver PCI: aardvark: Replace custom PCIE_CORE_INT_* macros with PCI_INTERRUPT_* PCI: aardvark: Rewrite IRQ code to chained IRQ handler PCI: aardvark: Check return value of generic_handle_domain_irq() when processing INTx IRQ PCI: aardvark: Make MSI irq_chip structures static driver structures PCI: aardvark: Make msi_domain_info structure a static driver structure PCI: aardvark: Use dev_fwnode() instead of of_node_to_fwnode(dev->of_node) PCI: aardvark: Refactor unmasking summary MSI interrupt PCI: aardvark: Add support for masking MSI interrupts PCI: aardvark: Fix setting MSI address PCI: aardvark: Enable MSI-X support PCI: aardvark: Add support for ERR interrupt on emulated bridge PCI: aardvark: Optimize writing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME on emulated bridge PCI: aardvark: Add support for PME interrupts PCI: aardvark: Fix support for PME requester on emulated bridge PCI: aardvark: Use separate INTA interrupt for emulated root bridge PCI: aardvark: Remove irq_mask_ack() callback for INTx interrupts PCI: aardvark: Don't mask irq when mapping PCI: aardvark: Drop __maybe_unused from advk_pcie_disable_phy() PCI: aardvark: Update comment about link going down after link-up Linux 5.15.39 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ic51d6d05a0d99156c6fd844786e984aff8e7386a	2022-05-18 09:38:58 +02:00
Greg Kroah-Hartman	ecda2085fd	Revert 5.15.37 merge into android13-5.15 This reverts the merge of 5.15.37 into the android13-5.15 There are lots of ABI issues, and many of the commits are not needed in the Android tree at this time. Revert the merge (except for the Makefile change), so that future merges will continue to work, and the needed individual changes from this release will be manually added to the tree at a later point in time. Fixes: f7dace75d276 ("Merge 5.15.37 into android13-5.15") Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I0632858e5c0fb94fc14c0f4216997330eca260a7	2022-05-18 08:59:14 +02:00
Greg Kroah-Hartman	ef5fed3c1e	Merge 5.15.37 into android13-5.15 Changes in 5.15.37 floppy: disable FDRAWCMD by default bpf: Introduce composable reg, ret and arg types. bpf: Replace ARG_XXX_OR_NULL with ARG_XXX | PTR_MAYBE_NULL bpf: Replace RET_XXX_OR_NULL with RET_XXX | PTR_MAYBE_NULL bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL bpf: Introduce MEM_RDONLY flag bpf: Convert PTR_TO_MEM_OR_NULL to composable types. bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM. bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem. bpf/selftests: Test PTR_TO_RDONLY_MEM bpf: Fix crash due to out of bounds access into reg2btf_ids. spi: cadence-quadspi: fix write completion support ARM: dts: socfpga: change qspi to "intel,socfpga-qspi" mm: kfence: fix objcgs vector allocation gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable iov_iter: Introduce fault_in_iov_iter_writeable gfs2: Add wrapper for iomap_file_buffered_write gfs2: Clean up function may_grant gfs2: Introduce flag for glock holder auto-demotion gfs2: Move the inode glock locking to gfs2_file_buffered_write gfs2: Eliminate ip->i_gh gfs2: Fix mmap + page fault deadlocks for buffered I/O iomap: Fix iomap_dio_rw return value for user copies iomap: Support partial direct I/O on user copy failures iomap: Add done_before argument to iomap_dio_rw gup: Introduce FOLL_NOFAULT flag to disable page faults iov_iter: Introduce nofault flag to disable page faults gfs2: Fix mmap + page fault deadlocks for direct I/O btrfs: fix deadlock due to page faults during direct IO reads and writes btrfs: fallback to blocking mode when doing async dio over multiple extents mm: gup: make fault_in_safe_writeable() use fixup_user_fault() selftests/bpf: Add test for reg2btf_ids out of bounds access Linux 5.15.37 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ica39b8856d6e3928a82f4e34f8b401f1a5cba5ee	2022-05-18 08:59:02 +02:00
Greg Kroah-Hartman	e95cdba8e2	Merge 5.15.36 into android13-5.15 Changes in 5.15.36 fs: remove __sync_filesystem block: remove __sync_blockdev block: simplify the block device syncing code vfs: make sync_filesystem return errors from ->sync_fs xfs: return errors in xfs_fs_sync_fs dma-mapping: remove bogus test for pfn_valid from dma_map_resource arm64/mm: drop HAVE_ARCH_PFN_VALID etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead mm: page_alloc: fix building error on -Werror=array-compare perf tools: Fix segfault accessing sample_id xyarray mm, kfence: support kmem_dump_obj() for KFENCE objects gfs2: assign rgrp glock before compute_bitstructs scsi: ufs: core: scsi_get_lba() error fix net/sched: cls_u32: fix netns refcount changes in u32_change() ALSA: usb-audio: Clear MIDI port active flag after draining ALSA: hda/realtek: Add quirk for Clevo NP70PNP ASoC: atmel: Remove system clock tree configuration for at91sam9g20ek ASoC: topology: Correct error handling in soc_tplg_dapm_widget_create() ASoC: rk817: Use devm_clk_get() in rk817_platform_probe ASoC: msm8916-wcd-digital: Check failure for devm_snd_soc_register_component ASoC: codecs: wcd934x: do not switch off SIDO Buck when codec is in use dmaengine: idxd: fix device cleanup on disable dmaengine: imx-sdma: Fix error checking in sdma_event_remap dmaengine: mediatek:Fix PM usage reference leak of mtk_uart_apdma_alloc_chan_resources dmaengine: dw-edma: Fix unaligned 64bit access spi: spi-mtk-nor: initialize spi controller after resume esp: limit skb_page_frag_refill use to a single page spi: cadence-quadspi: fix incorrect supports_op() return value igc: Fix infinite loop in release_swfw_sync igc: Fix BUG: scheduling while atomic igc: Fix suspending when PTM is active ALSA: hda/hdmi: fix warning about PCM count when used with SOF rxrpc: Restore removed timer deletion net/smc: Fix sock leak when release after smc_shutdown() net/packet: fix packet_sock xmit return value checking ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit() ip6_gre: Fix skb_under_panic in __gre6_xmit() net: restore alpha order to Ethernet devices in config net/sched: cls_u32: fix possible leak in u32_init_knode() l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu ipv6: make ip6_rt_gc_expire an atomic_t can: isotp: stop timeout monitoring when no first frame was sent net: dsa: hellcreek: Calculate checksums in tagger net: mscc: ocelot: fix broken IP multicast flooding netlink: reset network and mac headers in netlink_dump() drm/i915/display/psr: Unset enable_psr2_sel_fetch if other checks in intel_psr2_config_valid() fails net: stmmac: Use readl_poll_timeout_atomic() in atomic state dmaengine: idxd: add RO check for wq max_batch_size write dmaengine: idxd: add RO check for wq max_transfer_size write dmaengine: idxd: skip clearing device context when device is read-only selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets arm64: mm: fix p?d_leaf() ARM: vexpress/spc: Avoid negative array index when !SMP reset: renesas: Check return value of reset_control_deassert() reset: tegra-bpmp: Restore Handle errors in BPMP response platform/x86: samsung-laptop: Fix an unsigned comparison which can never be negative ALSA: usb-audio: Fix undefined behavior due to shift overflowing the constant drm/msm/disp: check the return value of kzalloc() arm64: dts: imx: Fix imx8*-var-som touchscreen property sizes vxlan: fix error return code in vxlan_fdb_append cifs: Check the IOCB_DIRECT flag, not O_DIRECT net: atlantic: Avoid out-of-bounds indexing mt76: Fix undefined behavior due to shift overflowing the constant brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant dpaa_eth: Fix missing of_node_put in dpaa_get_ts_info() drm/msm/mdp5: check the return of kzalloc() net: macb: Restart tx only if queue pointer is lagging scsi: iscsi: Release endpoint ID when its freed scsi: iscsi: Merge suspend fields scsi: iscsi: Fix NOP handling during conn recovery scsi: qedi: Fix failed disconnect handling stat: fix inconsistency between struct stat and struct compat_stat VFS: filename_create(): fix incorrect intent. nvme: add a quirk to disable namespace identifiers nvme-pci: disable namespace identifiers for the MAXIO MAP1002/1202 nvme-pci: disable namespace identifiers for Qemu controllers EDAC/synopsys: Read the error count from the correct register mm/memory-failure.c: skip huge_zero_page in memory_failure() memcg: sync flush only if periodic flush is delayed mm, hugetlb: allow for "high" userspace addresses oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() ata: pata_marvell: Check the 'bmdma_addr' beforing reading dma: at_xdmac: fix a missing check on list iterator dmaengine: imx-sdma: fix init of uart scripts net: atlantic: invert deep par in pm functions, preventing null derefs Input: omap4-keypad - fix pm_runtime_get_sync() error checking scsi: sr: Do not leak information in ioctl sched/pelt: Fix attach_entity_load_avg() corner case perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare KVM: PPC: Fix TCE handling for VFIO drm/vc4: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage powerpc/perf: Fix power9 event alternatives powerpc/perf: Fix power10 event alternatives perf script: Always allow field 'data_src' for auxtrace perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event xtensa: patch_text: Fixup last cpu should be master xtensa: fix a7 clobbering in coprocessor context load/store openvswitch: fix OOB access in reserve_sfa_size() gpio: Request interrupts after IRQ is initialized ASoC: soc-dapm: fix two incorrect uses of list iterator e1000e: Fix possible overflow in LTR decoding ARC: entry: fix syscall_trace_exit argument arm_pmu: Validate single/group leader events KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race KVM: nVMX: Defer APICv updates while L2 is active until L1 is active KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs netfilter: conntrack: convert to refcount_t api netfilter: conntrack: avoid useless indirection during conntrack destruction ext4: fix fallocate to use file_modified to update permissions consistently ext4: fix symlink file size not match to file content ext4: fix use-after-free in ext4_search_dir ext4: limit length to bitmap_maxbytes - blocksize in punch_hole ext4, doc: fix incorrect h_reserved size ext4: fix overhead calculation to account for the reserved gdt blocks ext4: force overhead calculation if the s_overhead_cluster makes no sense netfilter: nft_ct: fix use after free when attaching zone template jbd2: fix a potential race while discarding reserved buffers after an abort spi: atmel-quadspi: Fix the buswidth adjustment between spi-mem and controller block/compat_ioctl: fix range check in BLKGETSIZE arm64: dts: qcom: add IPA qcom,qmp property Linux 5.15.36 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I44d3a4de9b6fa1d2016b4e063eb211e8373a1216	2022-05-18 08:55:59 +02:00
keystone-kernel-automerger	fed04ede18	Merge remote-tracking branch into HEAD * keystone/mirror-android13-5.15: FROMLIST: usb: dwc3: Fix ep0 handling when getting reset while doing control transfer ANDROID: ABI: Update symbols to unisoc whitelist for the 7st FROMLIST: media: Kconfig: Make DVB_CORE=m possible when MEDIA_SUPPORT=y ANDROID: abi_gki_aarch64_qcom: Update symbol list ANDROID: firmware_loader: Fix warning with firmware_param_path_set Revert "ANDROID: KVM: arm64: pkvm: Ensure that TLBs and I-cache are private to each vcpu" ANDROID: GCE: To build kernel image for gce cloud android. ANDROID: enable db845c kleaf build. Revert "ANDROID: Make file-backed vma teardown synchronous" ANDROID: GCE: To build kernel image for gce cloud android. FROMLIST: serial: qcom_geni_serial: Disable MMIO tracing for geni serial ANDROID: kernel: fix debug_kinfo set twice crash issue UPSTREAM: media: v4l2-ctrls: Add RGB color effects control ANDROID: abi_gki_aarch64_qcom: Update qcom abi symbol list ANDROID: usb: export tracepoint for dwc3_complete_trb ANDROID: cputime: seprate irq entry and exit tracehooks Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I1f3900dd336d9fb20a0437ebbb48e849f8133640	2022-05-16 06:18:07 +00:00
Steve Muckle	23ef56f65c	Revert "ANDROID: Make file-backed vma teardown synchronous" This reverts commit `fe25fc5375`. Reason for revert: test regressions Bug: 232427425 Bug: 232421416 Change-Id: If7006fe6c3f1a55361099ef24927d4bc7c821b9c Signed-off-by: Steve Muckle <smuckle@google.com>	2022-05-13 15:42:24 +00:00
Frederic Weisbecker	0060c7bd9e	rcu: Apply callbacks processing time limit only on softirq commit a554ba288845fd3f6f12311fd76a51694233458a upstream. Time limit only makes sense when callbacks are serviced in softirq mode because: _ In case we need to get back to the scheduler, cond_resched_tasks_rcu_qs() is called after each callback. _ In case some other softirq vector needs the CPU, the call to local_bh_enable() before cond_resched_tasks_rcu_qs() takes care about them via a call to do_softirq(). Therefore, make sure the time limit only applies to softirq mode. Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [UR: backport to 5.15-stable] Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-12 12:30:26 +02:00
Frederic Weisbecker	2c5029d652	rcu: Fix callbacks processing time limit retaining cond_resched() commit 3e61e95e2d095e308616cba4ffb640f95a480e01 upstream. The callbacks processing time limit makes sure we are not exceeding a given amount of time executing the queue. However its "continue" clause bypasses the cond_resched() call on rcuc and NOCB kthreads, delaying it until we reach the limit, which can be very long... Make sure the scheduler has a higher priority than the time limit. Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [UR: backport to 5.15-stable + commit update] Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-12 12:30:26 +02:00
Thomas Pfaff	61808e4089	genirq: Synchronize interrupt thread startup commit 8707898e22fd665bc1d7b18b809be4b56ce25bdd upstream. A kernel hang can be observed when running setserial in a loop on a kernel with force threaded interrupts. The sequence of events is: setserial open("/dev/ttyXXX") request_irq() do_stuff() -> serial interrupt -> wake(irq_thread) desc->threads_active++; close() free_irq() kthread_stop(irq_thread) synchronize_irq() <- hangs because desc->threads_active != 0 The thread is created in request_irq() and woken up, but does not get on a CPU to reach the actual thread function, which would handle the pending wake-up. kthread_stop() sets the should stop condition which makes the thread immediately exit, which in turn leaves the stale threads_active count around. This problem was introduced with commit `519cc8652b`, which addressed a interrupt sharing issue in the PCIe code. Before that commit free_irq() invoked synchronize_irq(), which waits for the hard interrupt handler and also for associated threads to complete. To address the PCIe issue synchronize_irq() was replaced with __synchronize_hardirq(), which only waits for the hard interrupt handler to complete, but not for threaded handlers. This was done under the assumption, that the interrupt thread already reached the thread function and waits for a wake-up, which is guaranteed to be handled before acting on the stop condition. The problematic case, that the thread would not reach the thread function, was obviously overlooked. Make sure that the interrupt thread is really started and reaches thread_fn() before returning from __setup_irq(). This utilizes the existing wait queue in the interrupt descriptor. The wait queue is unused for non-shared interrupts. For shared interrupts the usage might cause a spurious wake-up of a waiter in synchronize_irq() or the completion of a threaded handler might cause a spurious wake-up of the waiter for the ready flag. Both are harmless and have no functional impact. [ tglx: Amended changelog ] Fixes: `519cc8652b` ("genirq: Synchronize only with single thread on free_irq()") Signed-off-by: Thomas Pfaff <tpfaff@pcs.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-12 12:30:06 +02:00
Kurt Kanzenbach	07adb69545	timekeeping: Mark NMI safe time accessors as notrace commit 2c33d775ef4c25c0e1e1cc0fd5496d02f76bfa20 upstream. Mark the CLOCK_MONOTONIC fast time accessors as notrace. These functions are used in tracing to retrieve timestamps, so they should not recurse. Fixes: `4498e7467e` ("time: Parametrize all tk_fast_mono users") Fixes: `f09cb9a180` ("time: Introduce tk_fast_raw") Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220426175338.3807ca4f@gandalf.local.home/ Link: https://lore.kernel.org/r/20220428062432.61063-1-kurt@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-12 12:30:04 +02:00
keystone-kernel-automerger	82b7f3fbf1	Merge remote-tracking branch into HEAD * keystone/mirror-android13-5.15: (22 commits) ANDROID: ABI: Update pixel symbol list and ABI xml ANDROID: abi_gki_aarch64_qcom: Update symbol list ANDROID: Make file-backed vma teardown synchronous ANDROID: abi_gki_aarch64_qcom: Add icc_sync_state ANDROID: ABI: Update symbols to unisoc whitelist for the 6th ANDROID: Update symbol list for mtk ANDROID: abi_gki_aarch64_qcom: Update symbol list. UPSTREAM: printk: ringbuffer: Improve prb_next_seq() performance ANDROID: firmware_loader: Add support for customer firmware paths FROMLIST: remoteproc: Use unbounded workqueue for recovery work ANDROID: kbuild: mod: Move $(obj) prefix inside awk FROMGIT: kbuild: read .mod to get objects passed to $(LD) or $(AR) FROMGIT: kbuild: make .mod not depend on .o FROMGIT: kbuild: get rid of duplication in .mod files FROMGIT: kbuild: split the second line of .mod into .usyms FROMGIT: kbuild: reuse real-search to simplify cmd_mod FROMGIT: kbuild: make multi_depend work with targets in subdirectory FROMGIT: kbuild: reuse suffix-search to refactor multi_depend ANDROID: abi_gki_aarch64_qcom: Update symbol list ANDROID: ABI: Update symbols to unisoc whitelist for the 5th ... Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I6bc401372db3c8a5bb2714630b98a2ea326aa606	2022-05-12 06:18:03 +00:00
Shaleen Agrawal	aec40de3d7	ANDROID: cputime: seprate irq entry and exit tracehooks Currently the code has single hook for tracking irqs. However modules need to deduce start and end of the irq. Create separate hooks for irq start and end since the cputime has already figured it out. Bug: 231341763 Change-Id: Ie0dd503b283d83f69d01171ebd1cd6127c3bafd0 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> Signed-off-by: Ashay Jaiswal <quic_ashayj@quicinc.com>	2022-05-12 10:02:42 +05:30
Suren Baghdasaryan	fe25fc5375	ANDROID: Make file-backed vma teardown synchronous When a file-backed vma is being released, the userspace can have an expectation that the vma and the file it's pinning will be released synchronously. This does not happen when SPF is enabled because vma and associated file are released asynchronously after RCU grace period. This is done to prevent pagefault handler from stepping on a deleted object. Fix this issue by synchronously waiting for RCU grace period during file-backed vma tear-down. Fixes: `48e35d053f` "FROMLIST: mm: rcu safe vma->vm_file freeing" Bug: 231394031 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I9f672d5bd947763c7d180a8c1b1f964600d407f3	2022-05-11 17:08:16 +00:00
Petr Mladek	a327e05923	UPSTREAM: printk: ringbuffer: Improve prb_next_seq() performance prb_next_seq() always iterates from the first known sequence number. In the worst case, it might loop 8k times for 256kB buffer, 15k times for 512kB buffer, and 64k times for 2MB buffer. It was reported that polling and reading using syslog interface might occupy 50% of CPU. Speedup the search by storing @id of the last finalized descriptor. The loop is still needed because the @id is stored and read in the best effort way. An atomic variable is used to keep the @id consistent. But the stores and reads are not serialized against each other. The descriptor could get reused in the meantime. The related sequence number will be used only when it is still valid. An invalid value should be read _only_ when there is a flood of messages and the ringbuffer is rapidly reused. The performance is the least problem in this case. Bug: 216238044 (cherry picked from commit ef244b4dc53e520d4570b2610436aba0593ce6f55) Reported-by: Chunlei Wang <chunlei.wang@mediatek.com> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/1642770388-17327-1-git-send-email-quic_mojha@quicinc.com Link: https://lore.kernel.org/lkml/YXlddJxLh77DKfIO@alley/T/#m43062e8b2a17f8dbc8c6ccdb8851fb0dbaabbb14 Signed-off-by: Prasad Sodagudi <quic_psodagud@quicinc.com> Change-Id: Ie6b1276eca791a891e42d5635ca1f116ae7cadef Signed-off-by: Prasad Sodagudi <quic_psodagud@quicinc.com>	2022-05-10 23:41:39 +00:00
keystone-kernel-automerger	989c61a63c	Merge remote-tracking branch into HEAD * keystone/mirror-android13-5.15: ANDROID: abi_gki_aarch64_qcom: Update qcom abi symbol list FROMGIT: net: fix wrong network header length ANDROID: Update mtktv symbol list 3rd ANDROID: arm: Mark the recheduling IPI as raw interrupt ANDROID: arm64: Mark the recheduling IPI as raw interrupt ANDROID: genirq: Allow an interrupt to be marked as 'raw' ANDROID: vendor_hooks: Add hooks for ufs scheduler FROMLIST: soc: qcom: geni: Disable MMIO tracing for GENI SE ANDROID: ABI: Update symbols to unisoc whitelist for the 4st ANDROID: abi_gki_aarch64_qcom: Update qcom abi symbol list ANDROID: abi_gki_aarch64_qcom: Update symbol list ANDROID: abi_gki_aarch64_qcom: Sort symbol list Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I52045ea99a1a29b8bc2a089856e74517b637ed0e	2022-05-10 06:18:07 +00:00
Marc Zyngier	a879ad2ff0	ANDROID: genirq: Allow an interrupt to be marked as 'raw' Some interrupts (such as the rescheduling IPI) rely upon not going through the irq_enter()/irq_exit() calls. To distinguish such interrupts, add a new IRQ flag that allows the low-level handling code to sidestep the enter()/exit() calls. Only the architecture code is expected to use this. It will do the wrong thing on normal interrupts. Note that this is a band-aid until we can move to some more correct infrastructure (such as kernel/entry/common.c). Bug: 191808738 Link: https://lore.kernel.org/lkml/20201124141449.572446-3-maz@kernel.org/ Change-Id: I0609a8b689219ba9e769c8b9f7fcf1e77a0ff1ca Signed-off-by: Marc Zyngier <maz@kernel.org> [minor port to 5.10] Signed-off-by: Stephen Dickey <dickey@codeaurora.org> [minor port to 5.15] Signed-off-by: Abhijeet Dharmapurikar <quic_adharmap@quicinc.com>	2022-05-09 22:34:00 +00:00
keystone-kernel-automerger	9284fe85c5	Merge remote-tracking branch into HEAD * keystone/mirror-android13-5.15: (32 commits) ANDROID: GKI: 5/4/2022 KMI update ANDROID: GKI: add mem_section to pixel's symbol list ANDROID: net: introduce ip_local_unbindable_ports sysctl ANDROID: sched: Add flags parameter to enq/deq after tracehooks ANDROID: GKI: Remove pfn_valid symbol BACKPORT: arm64/mm: drop HAVE_ARCH_PFN_VALID BACKPORT: dma-mapping: remove bogus test for pfn_valid from dma_map_resource ANDROID: add kabi padding for structures for the android13 release ANDROID: GKI: device.h: add Android ABI padding to some structures ANDROID: GKI: elevator: add Android ABI padding to some structures ANDROID: GKI: scsi: add Android ABI padding to some structures ANDROID: GKI: workqueue.h: add Android ABI padding to some structures ANDROID: GKI: sched: add Android ABI padding to some structures ANDROID: GKI: phy: add Android ABI padding to some structures ANDROID: GKI: fs.h: add Android ABI padding to some structures ANDROID: GKI: dentry: add Android ABI padding to some structures ANDROID: GKI: bio: add Android ABI padding to some structures ANDROID: GKI: ufs: add Android ABI padding to some structures Revert "Revert "opp: Expose of-node's name in debugfs"" Revert "Revert "gpio: Restrict usage of GPIO chip irq members before initialization"" ... Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I75a673637b41f5cecaf691f86b2103f2ad3b5110	2022-05-05 06:18:26 +00:00
Shaleen Agrawal	7045a3408d	ANDROID: sched: Add flags parameter to enq/deq after tracehooks Currently, the enqueue and dequeue tracehooks pass in the flags parameter, however, the after tracehooks that follow do not. Bug: 226570047 Change-Id: I51cb50054562893271e5d3efd7c6bd028977622d Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>	2022-05-04 13:39:14 -07:00
Mike Rapoport	565ca36055	BACKPORT: dma-mapping: remove bogus test for pfn_valid from dma_map_resource dma_map_resource() uses pfn_valid() to ensure the range is not RAM. However, pfn_valid() only checks for availability of the memory map for a PFN but it does not ensure that the PFN is actually backed by RAM. As dma_map_resource() is the only method in DMA mapping APIs that has this check, simply drop the pfn_valid() test from dma_map_resource(). Link: https://lore.kernel.org/all/20210824173741.GC623@arm.com/ Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20210930013039.11260-2-rppt@kernel.org Signed-off-by: Will Deacon <will@kernel.org> Bug: 228454859 (cherry picked from commit a9c38c5d267cb94871dfa2de5539c92025c855d7) Change-Id: Iae9d77b018d728a0cfaf5f4fd7e14311aa463659 Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com>	2022-05-04 13:39:14 -07:00
Kumar Kartikeya Dwivedi	8c39925e98	bpf: Fix crash due to out of bounds access into reg2btf_ids. commit 45ce4b4f9009102cd9f581196d480a59208690c1 upstream When commit `e6ac2450d6` ("bpf: Support bpf program calling kernel function") added kfunc support, it defined reg2btf_ids as a cheap way to translate the verifier reg type to the appropriate btf_vmlinux BTF ID, however commit c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL") moved the __BPF_REG_TYPE_MAX from the last member of bpf_reg_type enum to after the base register types, and defined other variants using type flag composition. However, now, the direct usage of reg->type to index into reg2btf_ids may no longer fall into __BPF_REG_TYPE_MAX range, and hence lead to out of bounds access and kernel crash on dereference of bad pointer. [backport note: commit 3363bd0cfbb80 ("bpf: Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support") was introduced after 5.15 and contains an out of bound reg2btf_ids access. Since that commit hasn't been backported, this patch doesn't include fix to that access. If we backport that commit in future, we need to fix its faulting access as well.] Fixes: c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220216201943.624869-1-memxor@gmail.com Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:26 +02:00
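To make the out-of-bounds index concrete, here is a small user-space C model, a sketch rather than the verifier's code. It assumes base types occupy the low 8 bits and flags such as PTR_MAYBE_NULL sit above them, so a composed type is numerically larger than the count of base types that sizes the lookup table. The enum values, the table contents, and the `lookup_btf_id()` helper are hypothetical; only the names `base_type()`, `type_flag()` and `PTR_MAYBE_NULL` echo the kernel.

```c
#include <stdint.h>

/*
 * Simplified model of composable verifier types. Names mirror the kernel's,
 * but the enum values and table contents are made up for illustration.
 */
#define BASE_TYPE_BITS 8u
#define BASE_TYPE_MASK ((1u << BASE_TYPE_BITS) - 1u)

enum model_reg_type {
	NOT_INIT = 0,
	SCALAR_VALUE,
	PTR_TO_CTX,
	PTR_TO_SOCKET,
	PTR_TO_TCP_SOCK,
	MODEL_REG_TYPE_MAX, /* counts base types only, like __BPF_REG_TYPE_MAX */
};

/* Flags live above the base-type bits, so they can be OR-ed onto any base type. */
enum model_type_flag {
	PTR_MAYBE_NULL = 1u << BASE_TYPE_BITS,
};

static inline uint32_t base_type(uint32_t type) { return type & BASE_TYPE_MASK; }
static inline uint32_t type_flag(uint32_t type) { return type & ~BASE_TYPE_MASK; }

/* Sized by the number of base types, like reg2btf_ids[]; the IDs are made up. */
static const int btf_id_table[MODEL_REG_TYPE_MAX] = {
	[PTR_TO_SOCKET]   = 101,
	[PTR_TO_TCP_SOCK] = 102,
};

/*
 * Indexing btf_id_table[type] directly would read past the end for a composed
 * type such as PTR_TO_SOCKET | PTR_MAYBE_NULL (3 | 0x100 = 259). This sketch
 * indexes by the base type and treats any flagged type as "no fixed BTF ID".
 */
static int lookup_btf_id(uint32_t type)
{
	if (type_flag(type) || base_type(type) >= (uint32_t)MODEL_REG_TYPE_MAX)
		return 0;
	return btf_id_table[base_type(type)];
}
```

Rejecting flagged types, rather than just masking them off, keeps a maybe-NULL pointer from being treated like a known-valid object in this model.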
Hao Luo	2a77c58726	bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem. commit 216e3cd2f28dbbf1fe86848e0e29e6693b9f0a20 upstream. Some helper functions may modify its arguments, for example, bpf_d_path, bpf_get_stack etc. Previously, their argument types were marked as ARG_PTR_TO_MEM, which is compatible with read-only mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate, but technically incorrect, to modify a read-only memory by passing it into one of such helper functions. This patch tags the bpf_args compatible with immutable memory with MEM_RDONLY flag. The arguments that don't have this flag will be only compatible with mutable memory types, preventing the helper from modifying a read-only memory. The bpf_args that have MEM_RDONLY are compatible with both mutable memory and immutable memory. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:26 +02:00
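The compatibility rule described above can be sketched as a single predicate. This is a hedged model, not the verifier's check: `MODEL_MEM_RDONLY` and `mem_arg_compatible()` are hypothetical names, and the flag's bit position is arbitrary. It only encodes the stated rule that an argument tagged MEM_RDONLY accepts both mutable and read-only memory, while an untagged argument accepts only mutable memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for MEM_RDONLY; the value is arbitrary. */
#define MODEL_MEM_RDONLY (1u << 9)

/*
 * May a register whose memory carries reg_flags be passed to a helper
 * argument declared with arg_flags?
 * - argument tagged MEM_RDONLY: the helper promises not to write, so both
 *   mutable and read-only memory are accepted;
 * - argument without MEM_RDONLY: the helper may write, so read-only memory
 *   is rejected.
 */
static bool mem_arg_compatible(uint32_t arg_flags, uint32_t reg_flags)
{
	if (arg_flags & MODEL_MEM_RDONLY)
		return true;
	return !(reg_flags & MODEL_MEM_RDONLY);
}
```

Under this rule, a writing helper such as bpf_d_path would keep an untagged argument and so reject a buffer that the verifier marked read-only.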
Hao Luo	15166bb300	bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM. commit 34d3a78c681e8e7844b43d1a2f4671a04249c821 upstream. Tag the return type of {per, this}_cpu_ptr with RDONLY_MEM. The returned value of this pair of helpers is kernel object, which can not be updated by bpf programs. Previously these two helpers return PTR_OT_MEM for kernel objects of scalar type, which allows one to directly modify the memory. Now with RDONLY_MEM tagging, the verifier will reject programs that write into RDONLY_MEM. Fixes: `63d9b80dcf` ("bpf: Introducte bpf_this_cpu_ptr()") Fixes: `eaa6bcb71e` ("bpf: Introduce bpf_per_cpu_ptr()") Fixes: `4976b718c3` ("bpf: Introduce pseudo_btf_id") Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-8-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:25 +02:00
Hao Luo	b710f73704	bpf: Convert PTR_TO_MEM_OR_NULL to composable types. commit cf9f2f8d62eca810afbd1ee6cc0800202b000e57 upstream. Remove PTR_TO_MEM_OR_NULL and replace it with PTR_TO_MEM combined with flag PTR_MAYBE_NULL. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-7-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:25 +02:00
Hao Luo	b453361384	bpf: Introduce MEM_RDONLY flag commit 20b2aff4bc15bda809f994761d5719827d66c0b4 upstream. This patch introduce a flag MEM_RDONLY to tag a reg value pointing to read-only memory. It makes the following changes: 1. PTR_TO_RDWR_BUF -> PTR_TO_BUF 2. PTR_TO_RDONLY_BUF -> PTR_TO_BUF | MEM_RDONLY Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-6-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:24 +02:00
Hao Luo	8d38cde47a	bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL commit c25b2ae136039ffa820c26138ed4a5e5f3ab3841 upstream. We have introduced a new type to make bpf_reg composable, by allocating bits in the type to represent flags. One of the flags is PTR_MAYBE_NULL which indicates a pointer may be NULL. This patch switches the qualified reg_types to use this flag. The reg_types changed in this patch include: 1. PTR_TO_MAP_VALUE_OR_NULL 2. PTR_TO_SOCKET_OR_NULL 3. PTR_TO_SOCK_COMMON_OR_NULL 4. PTR_TO_TCP_SOCK_OR_NULL 5. PTR_TO_BTF_ID_OR_NULL 6. PTR_TO_MEM_OR_NULL 7. PTR_TO_RDONLY_BUF_OR_NULL 8. PTR_TO_RDWR_BUF_OR_NULL [haoluo: backport notes There was a reg_type_may_be_null() in adjust_ptr_min_max_vals() in 5.15.x, but didn't exist in the upstream commit. This backport converted that reg_type_may_be_null() to type_may_be_null() as well.] Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211217003152.48334-5-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:24 +02:00
Hao Luo	3c141c82b9	bpf: Replace RET_XXX_OR_NULL with RET_XXX | PTR_MAYBE_NULL commit 3c4807322660d4290ac9062c034aed6b87243861 upstream. We have introduced a new type to make bpf_ret composable, by reserving high bits to represent flags. One of the flag is PTR_MAYBE_NULL, which indicates a pointer may be NULL. When applying this flag to ret_types, it means the returned value could be a NULL pointer. This patch switches the qualified arg_types to use this flag. The ret_types changed in this patch include: 1. RET_PTR_TO_MAP_VALUE_OR_NULL 2. RET_PTR_TO_SOCKET_OR_NULL 3. RET_PTR_TO_TCP_SOCK_OR_NULL 4. RET_PTR_TO_SOCK_COMMON_OR_NULL 5. RET_PTR_TO_ALLOC_MEM_OR_NULL 6. RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL 7. RET_PTR_TO_BTF_ID_OR_NULL This patch doesn't eliminate the use of these names, instead it makes them aliases to 'RET_PTR_TO_XXX | PTR_MAYBE_NULL'. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-4-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:23 +02:00
Hao Luo	d58a396fa6	bpf: Replace ARG_XXX_OR_NULL with ARG_XXX | PTR_MAYBE_NULL commit 48946bd6a5d695c50b34546864b79c1f910a33c1 upstream. We have introduced a new type to make bpf_arg composable, by reserving high bits of bpf_arg to represent flags of a type. One of the flags is PTR_MAYBE_NULL which indicates a pointer may be NULL. When applying this flag to an arg_type, it means the arg can take NULL pointer. This patch switches the qualified arg_types to use this flag. The arg_types changed in this patch include: 1. ARG_PTR_TO_MAP_VALUE_OR_NULL 2. ARG_PTR_TO_MEM_OR_NULL 3. ARG_PTR_TO_CTX_OR_NULL 4. ARG_PTR_TO_SOCKET_OR_NULL 5. ARG_PTR_TO_ALLOC_MEM_OR_NULL 6. ARG_PTR_TO_STACK_OR_NULL This patch does not eliminate the use of these arg_types, instead it makes them an alias to the 'ARG_XXX | PTR_MAYBE_NULL'. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-3-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-05-01 17:22:23 +02:00
keystone-kernel-automerger	d0510ff5a4	Merge remote-tracking branch into HEAD * keystone/mirror-android13-5.15: ANDROID: sched/rt: Only enable RT sync for SMP targets ANDROID: sched/rt: Add support for rt sync wakeups ANDROID: irq: manage: Export irq_do_set_affinity symbol ANDROID: gic-v3: Update vendor hook to set affinity in GIC v3 FROMGIT: scsi: ufs: core: Exclude UECxx from SFR dump list ANDROID: GKI: add initial symbol list for Exynos Auto SoC ANDROID: arm64: Auto-enroll MMIO guard on protected vms ANDROID: mm: Export vmalloc_nr_pages ANDROID: mm: Export pcpu_nr_pages UPSTREAM: f2fs: should not truncate blocks during roll-forward recovery ANDROID: add gki_module headers to .gitignore file ANDROID: usb: add EXPORT_TRACE_SYMBOL to export tracepoint Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: Ib9828714fa460cc6f6e4519f1f2a2c970302763d	2022-04-28 06:18:07 +00:00
J. Avila	def3ff8c55	ANDROID: sched/rt: Only enable RT sync for SMP targets The rt sync wakeup support has a condition which relies on a field that exists only when CONFIG_SMP is defined, causing a compilation issue. Since sync wakeup has no real meaning on a non-SMP system, we can just drop the CONFIG_RT_GROUP_SCHED part of the #ifdef. Fixes: da5f3cd37802 ("ANDROID: sched/rt: Add support for rt sync wakeups") Signed-off-by: J. Avila <elavila@google.com> Change-Id: I9b95304408d323b0c1017bd33746ecfbb2b35808	2022-04-28 01:01:01 +00:00
J. Avila	a7133dd750	ANDROID: sched/rt: Add support for rt sync wakeups Some rt tasks undergo sync wakeup. Currently, these tasks will be placed on other, often sleeping or otherwise idle CPUs, which can lead to unnecessary power hits. Bug: 157906395 Change-Id: I48864d0847bbe4f7813c842032880ad3f3b8b06b Signed-off-by: J. Avila <elavila@google.com> [quic_dickey@quicinc.com: Port to 5.15] Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>	2022-04-28 01:00:53 +00:00
Neeraj Upadhyay	9f7014a6d2	ANDROID: irq: manage: Export irq_do_set_affinity symbol Vendor kernel modules may implement irq balancers, which could take irq desc lock of an irq and then based on current affinity mask or affinity hint, reconfigure the affinity of that irq. For example : For an irq, for which affinity is broken i.e. all the cpus in its affinity mask have gone offline. For such irqs, we might want to reset the affinity, when the original set of affined cpus, come back online. desc->affinity_hint can be used for figuring out the original affinity. So, the sequence for doing this becomes: desc = irq_to_desc(i); raw_spin_lock(&desc->lock); affinity = desc->affinity_hint; raw_spin_unlock(&desc->lock); irq_set_affinity_hint(i, affinity); Here, we need to release the desc lock before calling the exported api irq_set_affinity_hint(). This creates a window where, after unlocking desc lock and before calling irq_set_affinity_hint(), where this setting can race with other irq_set_affinity_hint() callers. So, export irq_do_set_affinity() symbol to provide an api, which can be called with desc lock held. Bug: 187157600 Change-Id: Ifad88bfaa1e7eec09c3fe5a9dd7d1d421362b41e Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> (cherry picked from commit d88c1e77fd574f694540c48a67e210aa6cfea01f)	2022-04-27 23:20:00 +00:00
Greg Kroah-Hartman	d7f3794b34	ANDROID: add gki_module headers to .gitignore file In commit `f8bd6cf70d` ("ANDROID: GKI: Add module load time protected symbol lookup") the kernel/gki_module_exported.h and kernel/gki_module_protected.h files are created, but these generated files are not added to the .gitignore file, making them show up as added files when building the tree. Resolve this by adding them to the proper .gitignore file Bug: 200082547 Fixes: `f8bd6cf70d` ("ANDROID: GKI: Add module load time protected symbol lookup") Cc: Ramji Jiyani <ramjiyani@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I906ddd24bfc54d62f572fba5491e2d2025325957	2022-04-27 15:34:29 +00:00
Zhipeng Xie	56637084e8	perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled [ Upstream commit 60490e7966659b26d74bf1fa4aa8693d9a94ca88 ] This problem can be reproduced with CONFIG_PERF_USE_VMALLOC enabled on both x86_64 and aarch64 arch when using sysdig -B(using ebpf)[1]. sysdig -B works fine after rebuilding the kernel with CONFIG_PERF_USE_VMALLOC disabled. I tracked it down to the if condition event->rb->nr_pages != nr_pages in perf_mmap is true when CONFIG_PERF_USE_VMALLOC is enabled where event->rb->nr_pages = 1 and nr_pages = 2048 resulting perf_mmap to return -EINVAL. This is because when CONFIG_PERF_USE_VMALLOC is enabled, rb->nr_pages is always equal to 1. Arch with CONFIG_PERF_USE_VMALLOC enabled by default: arc/arm/csky/mips/sh/sparc/xtensa Arch with CONFIG_PERF_USE_VMALLOC disabled by default: x86_64/aarch64/... Fix this problem by using data_page_nr() [1] https://github.com/draios/sysdig Fixes: `906010b213` ("perf_event: Provide vmalloc() based mmap() backing") Signed-off-by: Zhipeng Xie <xiezhipeng1@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220209145417.6495-1-xiezhipeng1@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-27 14:38:58 +02:00
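The size mismatch can be reproduced arithmetically. The sketch below assumes, as a modeling choice, that the vmalloc-backed buffer records `nr_pages = 1` and keeps its real size as a page order (2048 pages = order 11), while the page-backed buffer records `nr_pages = 2048` with order 0. The struct and helper names are hypothetical stand-ins; the fixed check mirrors the commit's use of data_page_nr() to compare the number of data pages rather than the raw field.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the perf ring buffer. */
struct model_rb {
	int nr_pages;   /* assumed 1 for a single vmalloc'ed area */
	int page_order; /* assumed log2 of the data pages in that area */
};

/* Number of data pages actually backing the buffer. */
static int model_data_page_nr(const struct model_rb *rb)
{
	return rb->nr_pages << rb->page_order;
}

/* Pre-fix check: compares the raw nr_pages field with the requested count. */
static bool size_matches_buggy(const struct model_rb *rb, int nr_pages)
{
	return rb->nr_pages == nr_pages;
}

/* Post-fix check: compares the real data page count. */
static bool size_matches_fixed(const struct model_rb *rb, int nr_pages)
{
	return model_data_page_nr(rb) == nr_pages;
}
```

With `{ .nr_pages = 1, .page_order = 11 }` and a request for 2048 pages, the raw comparison fails (the -EINVAL path in the report) while the data-page comparison succeeds; for the page-backed layout both checks agree.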
kuyo chang	b1b9294682	sched/pelt: Fix attach_entity_load_avg() corner case [ Upstream commit 40f5aa4c5eaebfeaca4566217cb9c468e28ed682 ] The warning in cfs_rq_is_decayed() triggered: SCHED_WARN_ON(cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg) There exists a corner case in attach_entity_load_avg() which will cause load_sum to be zero while load_avg will not be. Consider se_weight is 88761 as per the sched_prio_to_weight[] table. Further assume the get_pelt_divider() is 47742, this gives: se->avg.load_avg is 1. However, calculating load_sum: se->avg.load_sum = div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se)); se->avg.load_sum = 1*47742/88761 = 0. Then enqueue_load_avg() adds this to the cfs_rq totals: cfs_rq->avg.load_avg += se->avg.load_avg; cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; Resulting in load_avg being 1 with load_sum is 0, which will trigger the WARN. Fixes: `f207934fb7` ("sched/fair: Align PELT windows between cfs_rq and its se") Signed-off-by: kuyo chang <kuyo.chang@mediatek.com> [peterz: massage changelog] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lkml.kernel.org/r/20220414090229.342-1-kuyo.chang@mediatek.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-27 14:38:58 +02:00
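The corner case is plain integer truncation, and a few lines of C reproduce it with the numbers from the message (88761 is the nice -20 entry of sched_prio_to_weight[]; 47742 is the assumed divider). The `attach_load_sum_clamped()` variant is a hypothetical guard showing one way to keep load_sum non-zero; it is an illustration only and is not claimed to match the upstream patch line for line.

```c
#include <stdint.h>

/* Numbers taken from the commit message. */
#define SE_WEIGHT    88761u /* sched_prio_to_weight[] entry for nice -20 */
#define PELT_DIVIDER 47742u /* assumed get_pelt_divider() result */

struct model_avg {
	uint64_t load_avg;
	uint64_t load_sum;
};

/* Pre-fix: load_sum = load_avg * divider / weight, truncated toward zero. */
static uint64_t attach_load_sum(uint64_t load_avg)
{
	return load_avg * PELT_DIVIDER / SE_WEIGHT;
}

/* Hypothetical guard: fall back to 1 when the product does not exceed the weight. */
static uint64_t attach_load_sum_clamped(uint64_t load_avg)
{
	uint64_t sum = load_avg * PELT_DIVIDER;

	return SE_WEIGHT < sum ? sum / SE_WEIGHT : 1;
}

/* enqueue_load_avg(): add the entity's contribution to the cfs_rq totals. */
static void enqueue(struct model_avg *cfs, uint64_t se_load_avg, uint64_t se_load_sum)
{
	cfs->load_avg += se_load_avg;
	cfs->load_sum += (uint64_t)SE_WEIGHT * se_load_sum;
}
```

Enqueuing an entity with load_avg = 1 into an empty model cfs_rq leaves load_avg = 1 and load_sum = 0, which is exactly the state cfs_rq_is_decayed() warns about; with the clamped value, load_sum becomes 88761 instead.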
Mike Rapoport	c01430cf5b	dma-mapping: remove bogus test for pfn_valid from dma_map_resource commit a9c38c5d267cb94871dfa2de5539c92025c855d7 upstream. dma_map_resource() uses pfn_valid() to ensure the range is not RAM. However, pfn_valid() only checks for availability of the memory map for a PFN but it does not ensure that the PFN is actually backed by RAM. As dma_map_resource() is the only method in DMA mapping APIs that has this check, simply drop the pfn_valid() test from dma_map_resource(). Link: https://lore.kernel.org/all/20210824173741.GC623@arm.com/ Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20210930013039.11260-2-rppt@kernel.org Signed-off-by: Will Deacon <will@kernel.org> Fixes: `859a85ddf9` ("mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE") Link: https://lore.kernel.org/r/Yl0IZWT2nsiYtqBT@linux.ibm.com Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-27 14:38:50 +02:00
keystone-kernel-automerger	4fac540897	Merge remote-tracking branch into HEAD * keystone/mirror-android13-5.15: (471 commits) ANDROID: usb: add EXPORT_TRACE_SYMBOL to export tracepoint Revert "gpio: Restrict usage of GPIO chip irq members before initialization" Revert "opp: Expose of-node's name in debugfs" ANDROID: GKI: remove CONFIG_UBSAN_OBJECT_SIZE from gki_defconfig Linux 5.15.35 ax25: Fix UAF bugs in ax25 timers ax25: Fix NULL pointer dereferences in ax25 timers ax25: fix NPD bug in ax25_disconnect ax25: fix UAF bug in ax25_send_control() ax25: Fix refcount leaks caused by ax25_cb_del() ax25: fix UAF bugs of net_device caused by rebinding operation ax25: fix reference count leaks of ax25_dev ax25: add refcount in ax25_dev to avoid UAF bugs cpufreq: intel_pstate: ITMT support for overclocked system net: ipa: fix a build dependency soc: qcom: aoss: Fix missing put_device call in qmp_get cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state drm/i915: Sunset igpu legacy mmap support based on GRAPHICS_VER_FULL dma-direct: avoid redundant memory sync for swiotlb timers: Fix warning condition in __run_timers() ... Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I44c7de3b006aca8e2bf7653fbac1c875aaa80b99	2022-04-26 06:17:31 +00:00
Greg Kroah-Hartman	ec1a28c7c0	Merge 5.15.35 into android13-5.15 Changes in 5.15.35 drm/amd/display: Add pstate verification and recovery for DCN31 drm/amd/display: Fix p-state allow debug index on dcn31 hamradio: defer 6pack kfree after unregister_netdev hamradio: remove needs_free_netdev to avoid UAF cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function ACPI: processor idle: Check for architectural support for LPI ACPI: processor idle: Allow playing dead in C3 state ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40 btrfs: remove unused parameter nr_pages in add_ra_bio_pages() btrfs: remove no longer used counter when reading data page btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups() soc: qcom: aoss: Expose send for generic usecase dt-bindings: net: qcom,ipa: add optional qcom,qmp property net: ipa: request IPA register values be retained btrfs: release correct delalloc amount in direct IO write path ALSA: core: Add snd_card_free_on_error() helper ALSA: sis7019: Fix the missing error handling ALSA: ali5451: Fix the missing snd_card_free() call at probe error ALSA: als300: Fix the missing snd_card_free() call at probe error ALSA: als4000: Fix the missing snd_card_free() call at probe error ALSA: atiixp: Fix the missing snd_card_free() call at probe error ALSA: au88x0: Fix the missing snd_card_free() call at probe error ALSA: aw2: Fix the missing snd_card_free() call at probe error ALSA: azt3328: Fix the missing snd_card_free() call at probe error ALSA: bt87x: Fix the missing snd_card_free() call at probe error ALSA: ca0106: Fix the missing snd_card_free() call at probe error ALSA: cmipci: Fix the missing snd_card_free() call at probe error ALSA: cs4281: Fix the missing snd_card_free() call at probe error ALSA: cs5535audio: Fix the missing snd_card_free() call at probe error ALSA: echoaudio: Fix the missing snd_card_free() call at probe error ALSA: emu10k1x: Fix the missing snd_card_free() call at probe 
error ALSA: ens137x: Fix the missing snd_card_free() call at probe error ALSA: es1938: Fix the missing snd_card_free() call at probe error ALSA: es1968: Fix the missing snd_card_free() call at probe error ALSA: fm801: Fix the missing snd_card_free() call at probe error ALSA: galaxy: Fix the missing snd_card_free() call at probe error ALSA: hdsp: Fix the missing snd_card_free() call at probe error ALSA: hdspm: Fix the missing snd_card_free() call at probe error ALSA: ice1724: Fix the missing snd_card_free() call at probe error ALSA: intel8x0: Fix the missing snd_card_free() call at probe error ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error ALSA: korg1212: Fix the missing snd_card_free() call at probe error ALSA: lola: Fix the missing snd_card_free() call at probe error ALSA: lx6464es: Fix the missing snd_card_free() call at probe error ALSA: maestro3: Fix the missing snd_card_free() call at probe error ALSA: oxygen: Fix the missing snd_card_free() call at probe error ALSA: riptide: Fix the missing snd_card_free() call at probe error ALSA: rme32: Fix the missing snd_card_free() call at probe error ALSA: rme9652: Fix the missing snd_card_free() call at probe error ALSA: rme96: Fix the missing snd_card_free() call at probe error ALSA: sc6000: Fix the missing snd_card_free() call at probe error ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error ALSA: via82xx: Fix the missing snd_card_free() call at probe error ALSA: usb-audio: Cap upper limits of buffer/period bytes for implicit fb ALSA: nm256: Don't call card private_free at probe error path drm/msm: Add missing put_task_struct() in debugfs path firmware: arm_scmi: Remove clear channel call on the TX channel memory: atmel-ebi: Fix missing of_node_put in atmel_ebi_probe Revert "ath11k: mesh: add support for 256 bitmap in blockack frames in 11ax" firmware: arm_scmi: Fix sorting of retrieved clock rates media: rockchip/rga: do proper error checking in probe SUNRPC: Fix the 
svc_deferred_event trace class net/sched: flower: fix parsing of ethertype following VLAN header veth: Ensure eth header is in skb's linear part gpiolib: acpi: use correct format characters cifs: release cached dentries only if mount is complete net: mdio: don't defer probe forever if PHY IRQ provider is missing mlxsw: i2c: Fix initialization error flow net/sched: fix initialization order when updating chain 0 head net: dsa: felix: suppress -EPROBE_DEFER errors net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link net/sched: taprio: Check if socket flags are valid cfg80211: hold bss_lock while updating nontrans_list netfilter: nft_socket: make cgroup match work in input too drm/msm: Fix range size vs end confusion drm/msm/dsi: Use connector directly in msm_dsi_manager_connector_init() drm/msm/dp: add fail safe mode outside of event_mutex context net/smc: Fix NULL pointer dereference in smc_pnet_find_ib() scsi: pm80xx: Mask and unmask upper interrupt vectors 32-63 scsi: pm80xx: Enable upper inbound, outbound queues scsi: iscsi: Move iscsi_ep_disconnect() scsi: iscsi: Fix offload conn cleanup when iscsid restarts scsi: iscsi: Fix endpoint reuse regression scsi: iscsi: Fix conn cleanup and stop race during iscsid restart scsi: iscsi: Fix unbound endpoint error handling sctp: Initialize daddr on peeled off socket netfilter: nf_tables: nft_parse_register can return a negative value ALSA: ad1889: Fix the missing snd_card_free() call at probe error ALSA: mtpav: Don't call card private_free at probe error path io_uring: move io_uring_rsrc_update2 validation io_uring: verify that resv2 is 0 in io_uring_rsrc_update2 io_uring: verify pad field is 0 in io_get_ext_arg testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set ALSA: usb-audio: Increase max buffer size ALSA: usb-audio: Limit max buffer and period sizes per time perf tools: Fix misleading add event PMU debug message macvlan: Fix leaking skb in source mode with nodst option net: 
ftgmac100: access hardware register after clock ready nfc: nci: add flush_workqueue to prevent uaf cifs: potential buffer overflow in handling symlinks dm mpath: only use ktime_get_ns() in historical selector vfio/pci: Fix vf_token mechanism when device-specific VF drivers are used net: bcmgenet: Revert "Use stronger register read/writes to assure ordering" block: fix offset/size check in bio_trim() drm/amd: Add USBC connector ID btrfs: fix fallocate to use file_modified to update permissions consistently btrfs: do not warn for free space inode in cow_file_range drm/amdgpu: conduct a proper cleanup of PDB bo drm/amdgpu/gmc: use PCI BARs for APUs in passthrough drm/amd/display: fix audio format not updated after edid updated drm/amd/display: FEC check in timing validation drm/amd/display: Update VTEM Infopacket definition drm/amdkfd: Fix Incorrect VMIDs passed to HWS drm/amdgpu/vcn: improve vcn dpg stop procedure drm/amdkfd: Check for potential null return of kmalloc_array() Drivers: hv: vmbus: Deactivate sysctl_record_panic_msg by default in isolated guests PCI: hv: Propagate coherence from VMbus device to PCI device Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer scsi: target: tcmu: Fix possible page UAF scsi: lpfc: Fix queue failures when recovering from PCI parity error scsi: ibmvscsis: Increase INITIAL_SRP_LIMIT to 1024 net: micrel: fix KS8851_MLL Kconfig ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs gpu: ipu-v3: Fix dev_dbg frequency output regulator: wm8994: Add an off-on delay for WM8994 variant arm64: alternatives: mark patch_alternative() as `noinstr` tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entry net: axienet: setup mdio unconditionally Drivers: hv: balloon: Disable balloon and hot-add accordingly net: usb: aqc111: Fix out-of-bounds accesses in RX fixup myri10ge: fix an incorrect free for skb in myri10ge_sw_tso spi: cadence-quadspi: fix protocol setup for non-1-1-X operations drm/amd/display: Enable power 
gating before init_pipes drm/amd/display: Revert FEC check in validation drm/amd/display: Fix allocate_mst_payload assert on resume drbd: set QUEUE_FLAG_STABLE_WRITES scsi: mpt3sas: Fail reset operation if config request timed out scsi: mvsas: Add PCI ID of RocketRaid 2640 scsi: megaraid_sas: Target with invalid LUN ID is deleted during scan drivers: net: slip: fix NPD bug in sl_tx_timeout() io_uring: zero tag on rsrc removal io_uring: use nospec annotation for more indexes perf/imx_ddr: Fix undefined behavior due to shift overflowing the constant mm/secretmem: fix panic when growing a memfd_secret mm, page_alloc: fix build_zonerefs_node() mm: fix unexpected zeroed page mapping with zram swap mm: kmemleak: take a full lowmem check in kmemleak_*_phys() KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded SUNRPC: Fix NFSD's request deferral on RDMA transports memory: renesas-rpc-if: fix platform-device leak in error path gcc-plugins: latent_entropy: use /dev/urandom cifs: verify that tcon is valid before dereference in cifs_kill_sb ath9k: Properly clear TX status area before reporting to mac80211 ath9k: Fix usage of driver-private space in tx_info btrfs: fix root ref counts in error handling in btrfs_get_root_ref btrfs: mark resumed async balance as writing ALSA: hda/realtek: Add quirk for Clevo PD50PNT ALSA: hda/realtek: add quirk for Lenovo Thinkpad X12 speakers ALSA: pcm: Test for "silence" field in struct "pcm_format_data" nl80211: correctly check NL80211_ATTR_REG_ALPHA2 size ipv6: fix panic when forwarding a pkt with no in6 dev drm/amd/display: don't ignore alpha property on pre-multiplied mode drm/amdgpu: Enable gfxoff quirk on MacBook Pro x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits x86/tsx: Disable TSX development mode at boot genirq/affinity: Consider that CPUs on nodes can be unbalanced tick/nohz: Use WARN_ON_ONCE() to prevent console saturation ARM: davinci: da850-evm: Avoid NULL pointer dereference dm integrity: fix memory corruption when tag_size 
is less than digest size i2c: dev: check return value when calling dev_set_name() smp: Fix offline cpu check in flush_smp_call_function_queue() i2c: pasemi: Wait for write xfers to finish dt-bindings: net: snps: remove duplicate name timers: Fix warning condition in __run_timers() dma-direct: avoid redundant memory sync for swiotlb drm/i915: Sunset igpu legacy mmap support based on GRAPHICS_VER_FULL cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state soc: qcom: aoss: Fix missing put_device call in qmp_get net: ipa: fix a build dependency cpufreq: intel_pstate: ITMT support for overclocked system ax25: add refcount in ax25_dev to avoid UAF bugs ax25: fix reference count leaks of ax25_dev ax25: fix UAF bugs of net_device caused by rebinding operation ax25: Fix refcount leaks caused by ax25_cb_del() ax25: fix UAF bug in ax25_send_control() ax25: fix NPD bug in ax25_disconnect ax25: Fix NULL pointer dereferences in ax25 timers ax25: Fix UAF bugs in ax25 timers Linux 5.15.35 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I0dd9eaea7f977df42b0a5b9cb9043c879f62718b	2022-04-24 16:58:59 +02:00
Greg Kroah-Hartman	33f5d1daec	Merge 5.15.34 into android13-5.15 Changes in 5.15.34 lib/logic_iomem: correct fallback config references um: fix and optimize xor select template for CONFIG64 and timetravel mode rtc: wm8350: Handle error for wm8350_register_irq nbd: add error handling support for add_disk() nbd: Fix incorrect error handle when first_minor is illegal in nbd_dev_add nbd: Fix hungtask when nbd_config_put nbd: fix possible overflow on 'first_minor' in nbd_dev_add() kfence: count unexpectedly skipped allocations kfence: move saving stack trace of allocations into __kfence_alloc() kfence: limit currently covered allocations when pool nearly full KVM: x86/pmu: Use different raw event masks for AMD and Intel KVM: SVM: Fix kvm_cache_regs.h inclusions for is_guest_mode() KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs KVM: x86/pmu: Fix and isolate TSX-specific performance event logic KVM: x86/emulator: Emulate RDPID only if it is enabled in guest drm: Add orientation quirk for GPD Win Max ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111 drm/amd/display: Add signal type check when verify stream backends same drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence obj drm/amd/display: Fix memory leak drm/amd/display: Use PSR version selected during set_psr_caps usb: gadget: tegra-xudc: Do not program SPARAM usb: gadget: tegra-xudc: Fix control endpoint's definitions usb: cdnsp: fix cdnsp_decode_trb function to properly handle ret value ptp: replace snprintf with sysfs_emit drm/amdkfd: Don't take process mutex for svm ioctls powerpc: dts: t104xrdb: fix phy type for FMAN 4/5 ath11k: fix kernel panic during unload/load ath11k modules ath11k: pci: fix crash on suspend if board file is not found ath11k: mhi: use mhi_sync_power_up() net/smc: Send directly when TCP_CORK is cleared drm/bridge: Add missing pm_runtime_put_sync bpf: Make dst_port field in struct bpf_sock 16-bit wide scsi: mvsas: Replace snprintf() with sysfs_emit() scsi: bfa: Replace 
snprintf() with sysfs_emit() drm/v3d: fix missing unlock power: supply: axp20x_battery: properly report current when discharging mt76: mt7921: fix crash when startup fails. mt76: dma: initialize skip_unmap in mt76_dma_rx_fill cfg80211: don't add non transmitted BSS to 6GHz scanned channels libbpf: Fix build issue with llvm-readelf ipv6: make mc_forwarding atomic net: initialize init_net earlier powerpc: Set crashkernel offset to mid of RMA region drm/amdgpu: Fix recursive locking warning scsi: smartpqi: Fix kdump issue when controller is locked up PCI: aardvark: Fix support for MSI interrupts iommu/arm-smmu-v3: fix event handling soft lockup usb: ehci: add pci device support for Aspeed platforms PCI: endpoint: Fix alignment fault error in copy tests tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH. PCI: pciehp: Add Qualcomm quirk for Command Completed erratum scsi: mpi3mr: Fix reporting of actual data transfer size scsi: mpi3mr: Fix memory leaks powerpc/set_memory: Avoid spinlock recursion in change_page_attr() power: supply: axp288-charger: Set Vhold to 4.4V net/mlx5e: Disable TX queues before registering the netdev usb: dwc3: pci: Set the swnode from inside dwc3_pci_quirks() iwlwifi: mvm: Correctly set fragmented EBS iwlwifi: mvm: move only to an enabled channel drm/msm/dsi: Remove spurious IRQF_ONESHOT flag ipv4: Invalidate neighbour for broadcast address upon address addition dm ioctl: prevent potential spectre v1 gadget dm: requeue IO if mapping table not yet available drm/amdkfd: make CRAT table missing message informational only vfio/pci: Stub vfio_pci_vga_rw when !CONFIG_VFIO_PCI_VGA scsi: pm8001: Fix pm80xx_pci_mem_copy() interface scsi: pm8001: Fix pm8001_mpi_task_abort_resp() scsi: pm8001: Fix task leak in pm8001_send_abort_all() scsi: pm8001: Fix tag leaks on error scsi: pm8001: Fix memory leak in pm8001_chip_fw_flash_update_req() mt76: mt7915: fix injected MPDU transmission to not use HW A-MSDU powerpc/64s/hash: Make hash faults work in 
NMI context mt76: mt7615: Fix assigning negative values to unsigned variable scsi: aha152x: Fix aha152x_setup() __setup handler return value scsi: hisi_sas: Free irq vectors in order for v3 HW scsi: hisi_sas: Limit users changing debugfs BIST count value net/smc: correct settings of RMB window update limit mips: ralink: fix a refcount leak in ill_acc_of_setup() macvtap: advertise link netns via netlink tuntap: add sanity checks about msg_controllen in sendmsg Bluetooth: Fix not checking for valid hdev on bt_dev_{info,warn,err,dbg} Bluetooth: use memset avoid memory leaks bnxt_en: Eliminate unintended link toggle during FW reset PCI: endpoint: Fix misused goto label MIPS: fix fortify panic when copying asm exception handlers powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E powerpc/secvar: fix refcount leak in format_show() scsi: libfc: Fix use after free in fc_exch_abts_resp() can: isotp: set default value for N_As to 50 micro seconds can: etas_es58x: es58x_fd_rx_event_msg(): initialize rx_event_msg before calling es58x_check_msg_len() riscv: Fixed misaligned memory access. Fixed pointer comparison. 
net: account alternate interface name memory net: limit altnames to 64k total net/mlx5e: Remove overzealous validations in netlink EEPROM query net: sfp: add 2500base-X quirk for Lantech SFP module usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm mt76: fix monitor mode crash with sdio driver xtensa: fix DTC warning unit_address_format MIPS: ingenic: correct unit node address Bluetooth: Fix use after free in hci_send_acl netfilter: conntrack: revisit gc autotuning netlabel: fix out-of-bounds memory accesses ceph: fix inode reference leakage in ceph_get_snapdir() ceph: fix memory leak in ceph_readdir when note_last_dentry returns error lib/Kconfig.debug: add ARCH dependency for FUNCTION_ALIGN option init/main.c: return 1 from handled __setup() functions minix: fix bug when opening a file with O_DIRECT clk: si5341: fix reported clk_rate when output divider is 2 staging: vchiq_arm: Avoid NULL ptr deref in vchiq_dump_platform_instances staging: vchiq_core: handle NULL result of find_service_by_handle phy: amlogic: phy-meson-gxl-usb2: fix shared reset controller use phy: amlogic: meson8b-usb2: Use dev_err_probe() phy: amlogic: meson8b-usb2: fix shared reset control use clk: rockchip: drop CLK_SET_RATE_PARENT from dclk_vop* on rk3568 cpufreq: CPPC: Fix performance/frequency conversion opp: Expose of-node's name in debugfs staging: wfx: fix an error handling in wfx_init_common() w1: w1_therm: fixes w1_seq for ds28ea00 sensors NFSv4.2: fix reference count leaks in _nfs42_proc_copy_notify() NFSv4: Protect the state recovery thread against direct reclaim habanalabs: fix possible memory leak in MMU DR fini xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32 clk: ti: Preserve node in ti_dt_clocks_register() clk: Enforce that disjoints limits are invalid SUNRPC/call_alloc: async tasks mustn't block waiting for memory SUNRPC/xprt: async tasks mustn't block waiting for memory SUNRPC: remove scheduling boost for "SWAPPER" tasks. 
NFS: swap IO handling is slightly different for O_DIRECT IO NFS: swap-out must always use STABLE writes. x86: Annotate call_on_stack() x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy serial: samsung_tty: do not unlock port->lock for uart_write_wakeup() virtio_console: eliminate anonymous module_init & module_exit jfs: prevent NULL deref in diFree SUNRPC: Fix socket waits for write buffer space NFS: nfsiod should not block forever in mempool_alloc() NFS: Avoid writeback threads getting stuck in mempool_alloc() selftests: net: Add tls config dependency for tls selftests parisc: Fix CPU affinity for Lasi, WAX and Dino chips parisc: Fix patch code locking and flushing mm: fix race between MADV_FREE reclaim and blkdev direct IO read rtc: mc146818-lib: change return values of mc146818_get_time() rtc: Check return value from mc146818_get_time() rtc: mc146818-lib: fix RTC presence check drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire() Drivers: hv: vmbus: Fix potential crash on module unload Revert "NFSv4: Handle the special Linux file open access mode" NFSv4: fix open failure with O_ACCMODE flag scsi: sr: Fix typo in CDROM(CLOSETRAY|EJECT) handling scsi: core: Fix sbitmap depth in scsi_realloc_sdev_budget_map() scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one() vdpa/mlx5: Rename control VQ workqueue to vdpa wq vdpa/mlx5: Propagate link status from device to vdpa driver vdpa: mlx5: prevent cvq work from hogging CPU net: sfc: add missing xdp queue reinitialization net/tls: fix slab-out-of-bounds bug in decrypt_internal vrf: fix packet sniffing for traffic originating from ip tunnels skbuff: fix coalescing for page_pool fragment recycling ice: Clear default forwarding VSI during VSI release mctp: Fix check for dev_hard_header() result net: ipv4: fix route with nexthop object delete warning net: stmmac: Fix unset max_speed difference between DT and non-DT platforms drm/imx: imx-ldb: Check for null pointer after calling kmemdup drm/imx: Fix
memory leak in imx_pd_connector_get_modes drm/imx: dw_hdmi-imx: Fix bailout in error cases of probe regulator: rtq2134: Fix missing active_discharge_on setting regulator: atc260x: Fix missing active_discharge_on setting arch/arm64: Fix topology initialization for core scheduling bnxt_en: Synchronize tx when xdp redirects happen on same ring bnxt_en: reserve space inside receive page for skb_shared_info bnxt_en: Prevent XDP redirect from running when stopping TX queue sfc: Do not free an empty page_ring RDMA/mlx5: Don't remove cache MRs when a delay is needed RDMA/mlx5: Add a missing update of cache->last_add IB/cm: Cancel mad on the DREQ event when the state is MRA_REP_RCVD IB/rdmavt: add lock to call to rvt_error_qp to prevent a race condition sctp: count singleton chunks in assoc user stats dpaa2-ptp: Fix refcount leak in dpaa2_ptp_probe ice: Set txq_teid to ICE_INVAL_TEID on ring creation ice: Do not skip not enabled queues in ice_vc_dis_qs_msg ipv6: Fix stats accounting in ip6_pkt_drop ice: synchronize_rcu() when terminating rings ice: xsk: fix VSI state check in ice_xsk_wakeup() net: openvswitch: don't send internal clone attribute to the userspace. 
net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address() net: openvswitch: fix leak of nested actions rxrpc: fix a race in rxrpc_exit_net() net: sfc: fix using uninitialized xdp tx_queue net: phy: mscc-miim: reject clause 45 register accesses qede: confirm skb is allocated before using spi: bcm-qspi: fix MSPI only access with bcm_qspi_exec_mem_op() bpf: Support dual-stack sockets in bpf_tcp_check_syncookie drbd: Fix five use after free bugs in get_initial_state scsi: ufs: ufshpb: Fix a NULL check on list iterator io_uring: nospec index for tags on files update io_uring: don't touch scm_fp_list after queueing skb SUNRPC: Handle ENOMEM in call_transmit_status() SUNRPC: Handle low memory situations in call_status() SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec() iommu/omap: Fix regression in probe for NULL pointer dereference perf: arm-spe: Fix perf report --mem-mode perf tools: Fix perf's libperf_print callback perf session: Remap buf if there is no space for event arm64: Add part number for Arm Cortex-A78AE scsi: mpt3sas: Fix use after free in _scsih_expander_node_remove() scsi: ufs: ufs-pci: Add support for Intel MTL Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning" mmc: block: Check for errors after write on SPI mmc: mmci: stm32: correctly check all elements of sg list mmc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete mmc: core: Fixup support for writeback-cache for eMMC and SD lz4: fix LZ4_decompress_safe_partial read out of bound highmem: fix checks in __kmap_local_sched_{in,out} mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0) mm/mempolicy: fix mpol_new leak in shared_policy_replace io_uring: don't check req->file in io_fsync_prep() io_uring: defer splice/tee file validity check until command issue io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF io_uring: fix race between timeout flush and removal x86/pm: Save the MSR validity status at 
context setup x86/speculation: Restore speculation related MSRs during S3 resume perf/x86/intel: Update the FRONTEND MSR mask on Sapphire Rapids btrfs: fix qgroup reserve overflow the qgroup limit btrfs: prevent subvol with swapfile from being deleted spi: core: add dma_map_dev for __spi_unmap_msg() arm64: patch_text: Fixup last cpu should be master RDMA/hfi1: Fix use-after-free bug for mm struct gpio: Restrict usage of GPIO chip irq members before initialization x86/msi: Fix msi message data shadow struct x86/mm/tlb: Revert retpoline avoidance approach perf/x86/intel: Don't extend the pseudo-encoding to GP counters ata: sata_dwc_460ex: Fix crash due to OOB write perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator perf/core: Inherit event_caps irqchip/gic-v3: Fix GICR_CTLR.RWP polling fbdev: Fix unregistering of framebuffers without device amd/display: set backlight only if required SUNRPC: Prevent immediate close+reconnect drm/panel: ili9341: fix optional regulator handling drm/amdgpu/display: change pipe policy for DCN 2.1 drm/amdgpu/smu10: fix SoC/fclk units in auto mode drm/amdgpu/vcn: Fix the register setting for vcn1 drm/nouveau/pmu: Add missing callbacks for Tegra devices drm/amdkfd: Create file descriptor after client is added to smi_clients list drm/amdgpu: don't use BACO for reset in S3 KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255 net/smc: send directly on setting TCP_NODELAY Revert "selftests: net: Add tls config dependency for tls selftests" bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide selftests/bpf: Fix u8 narrow load checks for bpf_sk_lookup remote_port rtc: mc146818-lib: fix signedness bug in mc146818_get_time() SUNRPC: Don't call connect() more than once on a TCP socket Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()" perf build: Don't use -ffat-lto-objects in the python feature test when building with clang-13 perf python: Fix probing for some clang command line options tools 
build: Filter out options and warnings not supported by clang tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error" KVM: avoid NULL pointer dereference in kvm_dirty_ring_push Revert "net/mlx5: Accept devlink user input after driver initialization complete" ubsan: remove CONFIG_UBSAN_OBJECT_SIZE selftests: cgroup: Make cg_create() use 0755 for permission instead of 0644 selftests: cgroup: Test open-time credential usage for migration checks selftests: cgroup: Test open-time cgroup namespace usage for migration checks mm: don't skip swap entry even if zap_details specified Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb() x86/bug: Prevent shadowing in __WARN_FLAGS sched: Teach the forced-newidle balancer about CPU affinity limitation. x86,static_call: Fix __static_call_return0 for i386 irqchip/gic-v4: Wait for GICR_VPENDBASER.Dirty to clear before descheduling powerpc/64: Fix build failure with allyesconfig in book3s_64_entry.S irqchip/gic, gic-v3: Prevent GSI to SGI translations mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning static_call: Don't make __static_call_return0 static powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit stacktrace: move filter_irq_stacks() to kernel/stacktrace.c Linux 5.15.34 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I98049d0d8ebd427296418d31085bfde482ad30e7	2022-04-24 16:57:32 +02:00
keystone-kernel-automerger	3bb5834cb6	Merge remote-tracking branch into HEAD * keystone/mirror-android13-5.15: (946 commits) ANDROID: Suppress build.sh deprecation warnings. ANDROID: GKI: 4/20/2022 KMI update ANDROID: update is_cpu_allowed hook prototype UPSTREAM: scsi: ufs: mediatek: Support vops pre suspend to disable auto-hibern8 ANDROID: USB: Add vendor specified variables to phy.h ANDROID: GKI: build multi-gen LRU FROMLIST: mm: multi-gen LRU: design doc FROMLIST: mm: multi-gen LRU: admin guide FROMLIST: mm: multi-gen LRU: debugfs interface FROMLIST: mm: multi-gen LRU: thrashing prevention FROMLIST: mm: multi-gen LRU: kill switch FROMLIST: mm: multi-gen LRU: optimize multiple memcgs FROMLIST: mm: multi-gen LRU: support page table walks FROMLIST: mm: multi-gen LRU: exploit locality in rmap FROMLIST: mm: multi-gen LRU: minimal implementation FROMLIST: mm: multi-gen LRU: groundwork FROMLIST: Revert "include/linux/mm_inline.h: fold __update_lru_size() into its sole caller" FROMLIST: mm/vmscan.c: refactor shrink_node() FROMLIST: mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG FROMLIST: mm: x86, arm64: add arch_has_hw_pte_young() ... Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I1a76e8a01e786fad9f730637de92a88f2f5afb95	2022-04-21 06:17:17 +00:00
Stephen Dickey	457758d86c	ANDROID: update is_cpu_allowed hook prototype A vendor module needs access to the task structure through the is_cpu_allowed tracehook to better assess whether the chosen CPU can be allowed. Update the existing is_cpu_allowed tracehook to pass the task struct of the task currently being handled. Bug: 228392842 Change-Id: I882b593ccb77da0755e076c7e636db304ee74b42 Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>	2022-04-20 17:38:57 +00:00
Yu Zhao	76f7f07cbf	FROMLIST: mm: multi-gen LRU: kill switch Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that can be disabled include: 0x0001: the multi-gen LRU core 0x0002: walking page table, when arch_has_hw_pte_young() returns true 0x0004: clearing the accessed bit in non-leaf PMD entries, when CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y [yYnN]: apply to all the components above E.g., echo y >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled 0x0007 echo 5 >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled 0x0005 NB: the page table walks happen on the scale of seconds under heavy memory pressure, in which case the mmap_lock contention is a lesser concern, compared with the LRU lock contention and the I/O congestion. So far the only well-known case of the mmap_lock contention happens on Android, due to Scudo [1] which allocates several thousand VMAs for merely a few hundred MBs. The SPF and the Maple Tree also have provided their own assessments [2][3]. However, if walking page tables does worsen the mmap_lock contention, the kill switch can be used to disable it. In this case the multi-gen LRU will suffer a minor performance degradation, as shown previously. Clearing the accessed bit in non-leaf PMD entries can also be disabled, since this behavior was not tested on x86 varieties other than Intel and AMD. 
[1] https://source.android.com/devices/tech/debug/scudo [2] https://lore.kernel.org/lkml/20220128131006.67712-1-michel@lespinasse.org/ [3] https://lore.kernel.org/lkml/20220202024137.2516438-1-Liam.Howlett@oracle.com/ Link: https://lore.kernel.org/lkml/20220309021230.721028-11-yuzhao@google.com/ Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Bug: 227651406 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I71801d9470a2588cad8bfd14fbcfafc7b010aa03	2022-04-20 17:38:56 +00:00
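The following sketch models the kill-switch bitmask from the commit above in user space. It is not the kernel's sysfs handler. The macro names LRU_GEN_CORE, LRU_GEN_MM_WALK and LRU_GEN_NONLEAF_YOUNG and the helper parse_caps() are hypothetical. The sketch also assumes two behaviors: numeric input may be decimal or 0x-prefixed hex, and bits outside the three documented components are silently masked off.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical names for the three component bits listed in the commit. */
#define LRU_GEN_CORE          0x0001u /* the multi-gen LRU core */
#define LRU_GEN_MM_WALK       0x0002u /* walking page tables */
#define LRU_GEN_NONLEAF_YOUNG 0x0004u /* clearing the accessed bit in non-leaf PMDs */
#define LRU_GEN_ALL (LRU_GEN_CORE | LRU_GEN_MM_WALK | LRU_GEN_NONLEAF_YOUNG)

/*
 * Parse a string as it might be written to the kill switch.
 * "y"/"Y" enables every component, "n"/"N" disables every component,
 * anything else is parsed as a number (decimal or 0x-prefixed hex).
 * Returns the resulting bitmask, or -1 if the input is not understood.
 */
static long parse_caps(const char *buf)
{
    char *end;
    unsigned long val;

    switch (tolower((unsigned char)buf[0])) {
    case 'y':
        return LRU_GEN_ALL;
    case 'n':
        return 0;
    }

    val = strtoul(buf, &end, 0);
    if (end == buf)
        return -1;
    /* Assumption: bits beyond the known components are ignored. */
    return (long)(val & LRU_GEN_ALL);
}
```

In this model, writing `5` yields 0x0005. That value keeps the core and non-leaf PMD clearing enabled and turns page table walks off, which matches the `echo 5` example in the commit message.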
Yu Zhao	5280d76d38	FROMLIST: mm: multi-gen LRU: support page table walks To further exploit spatial locality, the aging prefers to walk page tables to search for young PTEs and promote hot pages. A kill switch will be added in the next patch to disable this behavior. When disabled, the aging relies on the rmap only. NB: this behavior has nothing in common with the page table scanning in the 2.4 kernel [1], which searches page tables for old PTEs, adds cold pages to swapcache and unmaps them. To avoid confusion, the term "iteration" specifically means the traversal of an entire mm_struct list; the term "walk" will be applied to page tables and the rmap, as usual. An mm_struct list is maintained for each memcg, and an mm_struct follows its owner task to the new memcg when this task is migrated. Given an lruvec, the aging iterates lruvec_memcg()->mm_list and calls walk_page_range() with each mm_struct on this list to promote hot pages before it increments max_seq. When multiple page table walkers iterate the same list, each of them gets a unique mm_struct; therefore they can run concurrently. Page table walkers ignore any misplaced pages, e.g., if an mm_struct was migrated, pages it left in the previous memcg will not be promoted when its current memcg is under reclaim. Similarly, page table walkers will not promote pages from nodes other than the one under reclaim. This patch uses the following optimizations when walking page tables: 1. It tracks the usage of mm_struct's between context switches so that page table walkers can skip processes that have been sleeping since the last iteration. 2. It uses generational Bloom filters to record populated branches so that page table walkers can reduce their search space based on the query results, e.g., to skip page tables containing mostly holes or misplaced pages. 3. It takes advantage of the accessed bit in non-leaf PMD entries when CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y. 4.
It does not zigzag between a PGD table and the same PMD table spanning multiple VMAs. IOW, it finishes all the VMAs within the range of the same PMD table before it returns to a PGD table. This improves the cache performance for workloads that have large numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5. Server benchmark results: Single workload: fio (buffered I/O): no change Single workload: memcached (anon): +[5.5, 7.5]% Ops/sec KB/sec patch1-7: 1014393.57 39455.42 patch1-8: 1078507.59 41949.15 Configurations: no change Client benchmark results: kswapd profiles: patch1-7 45.54% lzo1x_1_do_compress (real work) 9.56% page_vma_mapped_walk 6.70% _raw_spin_unlock_irq 2.78% ptep_clear_flush 2.47% do_raw_spin_lock 2.22% __zram_bvec_write 1.87% lru_gen_look_around 1.78% memmove 1.77% obj_malloc 1.44% free_unref_page_list patch1-8 47.02% lzo1x_1_do_compress (real work) 6.73% page_vma_mapped_walk 6.14% _raw_spin_unlock_irq 3.39% walk_pte_range 2.63% ptep_clear_flush 2.29% __zram_bvec_write 2.10% do_raw_spin_lock 1.81% memmove 1.73% obj_malloc 1.53% free_unref_page_list Configurations: no change [1] https://lwn.net/Articles/23732/ [2] https://source.android.com/devices/tech/debug/scudo Link: https://lore.kernel.org/lkml/20220309021230.721028-9-yuzhao@google.com/ Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Bug: 227651406 Signed-off-by: Kalesh Singh 
<kaleshsingh@google.com> Change-Id: I5a3c97cf8ebf8d65d5f9528cd979a637c190053e	2022-04-20 17:38:55 +00:00
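Optimization 2 in the page-table-walk commit above relies on generational Bloom filters. What follows is a generic sketch of that idea, not the kernel's implementation. The filter size, the multiplicative hash, the two-filter rotation and all names (struct gen_bloom, bloom_add(), bloom_test(), bloom_next_iteration()) are assumptions.

During each iteration, a walker records every populated branch (keyed here by an address) in the filter reserved for the next iteration. It also consults the filter that was filled during the previous iteration. A negative answer means the branch can be skipped. A positive answer may be a false positive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 1024u /* assumed filter size in bits (power of two) */
#define NR_FILTERS 2u    /* one filter is read while the other is filled */

struct gen_bloom {
    uint64_t bits[NR_FILTERS][BLOOM_BITS / 64];
    unsigned long seq; /* index of the current iteration, starting at 0 */
};

/* Derive two bit positions from one key with a multiplicative hash (an assumption). */
static void bloom_hashes(uintptr_t key, unsigned int h[2])
{
    uint64_t x = (uint64_t)key * 0x9E3779B97F4A7C15ull;

    h[0] = (unsigned int)(x >> 32) & (BLOOM_BITS - 1);
    h[1] = (unsigned int)(x >> 16) & (BLOOM_BITS - 1);
}

static bool bit_is_set(const uint64_t *bits, unsigned int pos)
{
    return (bits[pos / 64] >> (pos % 64)) & 1u;
}

/* Record a populated branch for use during the next iteration. */
static void bloom_add(struct gen_bloom *f, uintptr_t key)
{
    uint64_t *next = f->bits[(f->seq + 1) % NR_FILTERS];
    unsigned int h[2];

    bloom_hashes(key, h);
    next[h[0] / 64] |= 1ull << (h[0] % 64);
    next[h[1] / 64] |= 1ull << (h[1] % 64);
}

/*
 * Ask whether a branch was populated during the previous iteration.
 * false: definitely not recorded, so the walker may skip it.
 * true: possibly recorded (false positives are allowed).
 * With no history yet (first iteration), answer true so nothing is skipped.
 */
static bool bloom_test(const struct gen_bloom *f, uintptr_t key)
{
    const uint64_t *cur = f->bits[f->seq % NR_FILTERS];
    unsigned int h[2];

    if (f->seq == 0)
        return true;
    bloom_hashes(key, h);
    return bit_is_set(cur, h[0]) && bit_is_set(cur, h[1]);
}

/*
 * Start a new iteration: the previous iteration's additions become
 * queryable, and the filter that will receive new additions is cleared.
 */
static void bloom_next_iteration(struct gen_bloom *f)
{
    f->seq++;
    memset(f->bits[(f->seq + 1) % NR_FILTERS], 0, sizeof(f->bits[0]));
}
```

With two filters, a branch that was populated in one iteration is visible only during the following iteration. If the branch is not seen again, it is forgotten, so stale entries age out without any explicit deletion.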
Yu Zhao	a1537a68c5	FROMLIST: mm: multi-gen LRU: minimal implementation To avoid confusion, the terms "promotion" and "demotion" will be applied to the multi-gen LRU, as a new convention; the terms "activation" and "deactivation" will be applied to the active/inactive LRU, as usual. The aging produces young generations. Given an lruvec, it increments max_seq when max_seq-min_seq+1 approaches MIN_NR_GENS. The aging promotes hot pages to the youngest generation when it finds them accessed through page tables; the demotion of cold pages happens consequently when it increments max_seq. The aging has the complexity O(nr_hot_pages), since it is only interested in hot pages. Promotion in the aging path does not require any LRU list operations, only the updates of the gen counter and lrugen->nr_pages[]; demotion, unless as the result of the increment of max_seq, requires LRU list operations, e.g., lru_deactivate_fn(). The eviction consumes old generations. Given an lruvec, it increments min_seq when the lists indexed by min_seq%MAX_NR_GENS become empty. A feedback loop modeled after the PID controller monitors refaults over anon and file types and decides which type to evict when both types are available from the same generation. Each generation is divided into multiple tiers. Tiers represent different ranges of numbers of accesses through file descriptors. A page accessed N times through file descriptors is in tier order_base_2(N). Tiers do not have dedicated lrugen->lists[], only bits in page->flags. In contrast to moving across generations, which requires the LRU lock, moving across tiers only involves operations on page->flags. The feedback loop also monitors refaults over all tiers and decides when to protect pages in which tiers (N>1), using the first tier (N=0,1) as a baseline. The first tier contains single-use unmapped clean pages, which are most likely the best choices. 
The eviction moves a page to the next generation, i.e., min_seq+1, if the feedback loop decides so. This approach has the following advantages: 1. It removes the cost of activation in the buffered access path by inferring whether pages accessed multiple times through file descriptors are statistically hot and thus worth protecting in the eviction path. 2. It takes pages accessed through page tables into account and avoids overprotecting pages accessed multiple times through file descriptors. (Pages accessed through page tables are in the first tier, since N=0.) 3. More tiers provide better protection for pages accessed more than twice through file descriptors, when under heavy buffered I/O workloads. Server benchmark results: Single workload: fio (buffered I/O): +[38, 40]% IOPS BW 5.18-ed4643521e6a: 2547k 9989MiB/s patch1-6: 3540k 13.5GiB/s Single workload: memcached (anon): +[103, 107]% Ops/sec KB/sec 5.18-ed4643521e6a: 469048.66 18243.91 patch1-6: 964656.80 37520.88 Configurations: CPU: two Xeon 6154 Mem: total 256G Node 1 was only used as a ram disk to reduce the variance in the results. 
patch drivers/block/brd.c <<EOF 99,100c99,100 < gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM; < page = alloc_page(gfp_flags); --- > gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM | __GFP_THISNODE; > page = alloc_pages_node(1, gfp_flags, 0); EOF cat >>/etc/systemd/system.conf <<EOF CPUAffinity=numa NUMAPolicy=bind NUMAMask=0 EOF cat >>/etc/memcached.conf <<EOF -m 184320 -s /var/run/memcached/memcached.sock -a 0766 -t 36 -B binary EOF cat fio.sh modprobe brd rd_nr=1 rd_size=113246208 swapoff -a mkfs.ext4 /dev/ram0 mount -t ext4 /dev/ram0 /mnt mkdir /sys/fs/cgroup/user.slice/test echo 38654705664 >/sys/fs/cgroup/user.slice/test/memory.max echo $$ >/sys/fs/cgroup/user.slice/test/cgroup.procs fio -name=mglru --numjobs=72 --directory=/mnt --size=1408m \ --buffered=1 --ioengine=io_uring --iodepth=128 \ --iodepth_batch_submit=32 --iodepth_batch_complete=32 \ --rw=randread --random_distribution=random --norandommap \ --time_based --ramp_time=10m --runtime=5m --group_reporting cat memcached.sh modprobe brd rd_nr=1 rd_size=113246208 swapoff -a mkswap /dev/ram0 swapon /dev/ram0 memtier_benchmark -S /var/run/memcached/memcached.sock \ -P memcache_binary -n allkeys --key-minimum=1 \ --key-maximum=65000000 --key-pattern=P:P -c 1 -t 36 \ --ratio 1:0 --pipeline 8 -d 2000 memtier_benchmark -S /var/run/memcached/memcached.sock \ -P memcache_binary -n allkeys --key-minimum=1 \ --key-maximum=65000000 --key-pattern=R:R -c 1 -t 36 \ --ratio 0:1 --pipeline 8 --randomize --distinct-client-seed Client benchmark results: kswapd profiles: 5.18-ed4643521e6a 39.56% page_vma_mapped_walk 19.32% lzo1x_1_do_compress (real work) 7.18% do_raw_spin_lock 4.23% _raw_spin_unlock_irq 2.26% vma_interval_tree_subtree_search 2.12% vma_interval_tree_iter_next 2.11% folio_referenced_one 1.90% anon_vma_interval_tree_iter_first 1.47% ptep_clear_flush 0.97% __anon_vma_interval_tree_subtree_search patch1-6 36.13% lzo1x_1_do_compress (real work) 19.16% page_vma_mapped_walk 6.55% _raw_spin_unlock_irq
4.02% do_raw_spin_lock 2.32% anon_vma_interval_tree_iter_first 2.11% ptep_clear_flush 1.76% __zram_bvec_write 1.64% folio_referenced_one 1.40% memmove 1.35% obj_malloc Configurations: CPU: single Snapdragon 7c Mem: total 4G Chrome OS MemoryPressure [1] [1] https://chromium.googlesource.com/chromiumos/platform/tast-tests/ Link: https://lore.kernel.org/lkml/20220309021230.721028-7-yuzhao@google.com/ Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Bug: 227651406 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I3fe4850006d7984cd9f4fd46134b826609dc2f86	2022-04-20 17:38:55 +00:00
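The minimal-implementation commit above states that a page accessed N times through file descriptors sits in tier order_base_2(N). That rule reduces to a ceiling log2. The sketch below assumes the kernel's definition of order_base_2(), which is 0 for N <= 1 and ceil(log2(N)) otherwise. tier_from_refs() is a hypothetical helper name.

```c
#include <assert.h>

/*
 * order_base_2(n) following the kernel's definition:
 * 0 for n <= 1, otherwise ceil(log2(n)), computed as the bit length of n - 1.
 */
static unsigned int order_base_2(unsigned long n)
{
    unsigned int order = 0;

    if (n <= 1)
        return 0;
    for (n -= 1; n; n >>= 1)
        order++;
    return order;
}

/* Hypothetical helper: tier of a page accessed n times through file descriptors. */
static unsigned int tier_from_refs(unsigned long n)
{
    return order_base_2(n);
}
```

Under this rule, N=0 and N=1 both land in tier 0, which is the baseline the feedback loop compares against. N=2 lands in tier 1, N=3 and N=4 in tier 2, and N=5 through N=8 in tier 3. Each higher tier therefore covers roughly twice as many access counts as the one below it.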
Yu Zhao	f88ed5a3d3	FROMLIST: mm: multi-gen LRU: groundwork Evictable pages are divided into multiple generations for each lruvec. The youngest generation number is stored in lrugen->max_seq for both anon and file types as they are aged on an equal footing. The oldest generation numbers are stored in lrugen->min_seq[] separately for anon and file types as clean file pages can be evicted regardless of swap constraints. These three variables are monotonically increasing. Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits in order to fit into the gen counter in page->flags. Each truncated generation number is an index to lrugen->lists[]. The sliding window technique is used to track at least MIN_NR_GENS and at most MAX_NR_GENS generations. The gen counter stores a value within [1, MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it stores 0. There are two conceptually independent procedures: "the aging", which produces young generations, and "the eviction", which consumes old generations. They form a closed-loop system, i.e., "the page reclaim". Both procedures can be invoked from userspace for the purposes of working set estimation and proactive reclaim. These features are required to optimize job scheduling (bin packing) in data centers. The variable size of the sliding window is designed for such use cases [1][2]. To avoid confusion, the terms "hot" and "cold" will be applied to the multi-gen LRU, as a new convention; the terms "active" and "inactive" will be applied to the active/inactive LRU, as usual. The protection of hot pages and the selection of cold pages are based on page access channels and patterns. There are two access channels: one through page tables and the other through file descriptors. The protection of the former channel is by design stronger because: 1. The uncertainty in determining the access patterns of the former channel is higher due to the approximation of the accessed bit. 2. 
The cost of evicting the former channel is higher due to the TLB flushes required and the likelihood of encountering the dirty bit. 3. The penalty of underprotecting the former channel is higher because applications usually do not prepare themselves for major page faults like they do for blocked I/O. E.g., GUI applications commonly use dedicated I/O threads to avoid blocking the rendering threads. There are also two access patterns: one with temporal locality and the other without. For the reasons listed above, the former channel is assumed to follow the former pattern unless VM_SEQ_READ or VM_RAND_READ is present; the latter channel is assumed to follow the latter pattern unless outlying refaults have been observed [3][4]. The next patch will address the "outlying refaults". Three macros, i.e., LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are added in this patch to make the entire patchset less diffy. A page is added to the youngest generation on faulting. The aging needs to check the accessed bit at least twice before handing this page over to the eviction. The first check takes care of the accessed bit set on the initial fault; the second check makes sure this page has not been used since then. This protocol, AKA second chance, requires a minimum of two generations, hence MIN_NR_GENS. 
[1] https://dl.acm.org/doi/10.1145/3297858.3304053 [2] https://dl.acm.org/doi/10.1145/3503222.3507731 [3] https://lwn.net/Articles/495543/ [4] https://lwn.net/Articles/815342/ Link: https://lore.kernel.org/lkml/20220309021230.721028-6-yuzhao@google.com/ Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Bug: 227651406 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I333ec6a1d2abfa60d93d6adc190ed3eefe441512	2022-04-20 17:38:55 +00:00
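The generation bookkeeping described in the multi-gen LRU groundwork commit can be sketched in plain C. This is a simplified, single-type model under assumed values (MAX_NR_GENS = 4, MIN_NR_GENS = 2, hence a 3-bit counter); all `sketch_*` names are hypothetical and are not the kernel's lruvec/lrugen code. It shows monotonically increasing sequence numbers truncated to list indices, a counter where 0 means "not on any list", and a sliding window bounded by MIN_NR_GENS and MAX_NR_GENS.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration only. */
#define SKETCH_MIN_NR_GENS 2UL
#define SKETCH_MAX_NR_GENS 4UL
/* order_base_2(MAX_NR_GENS + 1) = order_base_2(5) = 3 bits. */
#define SKETCH_GEN_WIDTH 3

_Static_assert(SKETCH_MAX_NR_GENS < (1UL << SKETCH_GEN_WIDTH),
	       "counter must hold values 0..MAX_NR_GENS");

/* Hypothetical window for one page type (the patch keeps min_seq per type). */
struct sketch_gen_window {
	unsigned long max_seq;	/* youngest generation, only ever increases */
	unsigned long min_seq;	/* oldest generation, only ever increases */
};

/* Truncate a sequence number to an index into the generation lists. */
static unsigned long sketch_gen_from_seq(unsigned long seq)
{
	return seq % SKETCH_MAX_NR_GENS;
}

/* Counter kept in the page flags: gen + 1 while on a list, 0 otherwise. */
static unsigned long sketch_counter_encode(unsigned long gen, bool on_list)
{
	return on_list ? gen + 1 : 0;
}

/* Recover the list index, or -1 if the page is on no list. */
static long sketch_counter_decode(unsigned long counter)
{
	return (long)counter - 1;
}

static unsigned long sketch_nr_gens(const struct sketch_gen_window *w)
{
	return w->max_seq - w->min_seq + 1;
}

/* "The aging": produce a younger generation if the window has room. */
static bool sketch_inc_max_seq(struct sketch_gen_window *w)
{
	if (sketch_nr_gens(w) >= SKETCH_MAX_NR_GENS)
		return false;
	w->max_seq++;
	return true;
}

/* "The eviction": retire the oldest generation, keeping MIN_NR_GENS. */
static bool sketch_inc_min_seq(struct sketch_gen_window *w)
{
	if (sketch_nr_gens(w) <= SKETCH_MIN_NR_GENS)
		return false;
	w->min_seq++;
	return true;
}
```

Starting from a two-generation window, the model lets the aging grow it to four generations and then refuses; the eviction can shrink it again but never below two, the minimum that the second-chance protocol needs.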
Steven Price	d712aea3cd	cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state commit b7ba6d8dc3569e49800ef0136799f26f43e237e8 upstream. Currently the setting of the 'cpu' member of struct cpuhp_cpu_state in cpuhp_create() is too late as it is used earlier in _cpu_up(). If kzalloc_node() in __smpboot_create_thread() fails then the rollback will be done with st->cpu==0 causing CPU0 to be erroneously set to be dying, causing the scheduler to get mightily confused and throw its toys out of the pram. However the cpu number is actually available directly, so simply remove the 'cpu' member and avoid the problem in the first place. Fixes: `2ea46c6fc9` ("cpumask/hotplug: Fix cpu_dying() state tracking") Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220411152233.474129-2-steven.price@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-20 09:34:21 +02:00
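The rollback bug in the cpu/hotplug commit can be illustrated with a simplified user-space model: the per-CPU state is zero-initialised, so a rollback that reads `st->cpu` before it has been assigned marks CPU 0 instead of the CPU being brought up. All `sketch_*` names and the four-CPU array are hypothetical stand-ins, not the kernel's `cpuhp_cpu_state` code. The fixed path mirrors the commit's approach of using the CPU number the caller already has.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count */

/* Hypothetical per-CPU hotplug state; cpu is only set by a late step. */
struct sketch_hp_state {
	unsigned int cpu;
	int state;
};

/* Static storage is zero-initialised, so every st->cpu starts as 0. */
static struct sketch_hp_state sketch_states[SKETCH_NR_CPUS];
static bool sketch_cpu_dying[SKETCH_NR_CPUS];

/* Old rollback: trusts st->cpu, which may not have been assigned yet. */
static void sketch_rollback_old(const struct sketch_hp_state *st)
{
	sketch_cpu_dying[st->cpu] = true;
}

/* Fixed rollback: uses the CPU number the caller already holds. */
static void sketch_rollback_fixed(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	sketch_cpu_dying[cpu] = true;
}

/* Bring-up that fails (e.g. an allocation) before st->cpu is assigned. */
static void sketch_cpu_up_fails_early(unsigned int cpu, bool fixed)
{
	struct sketch_hp_state *st = &sketch_states[cpu];

	/* ... per-CPU thread creation fails here; st->cpu is still 0 ... */
	if (fixed)
		sketch_rollback_fixed(cpu);
	else
		sketch_rollback_old(st);
}
```

With the old path, a failed bring-up of CPU 2 marks CPU 0 as dying; with the fixed path, only CPU 2 is marked.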
Chao Gao	4ef9951d02	dma-direct: avoid redundant memory sync for swiotlb commit 9e02977bfad006af328add9434c8bffa40e053bb upstream. When we looked into FIO performance with swiotlb enabled in a VM, we found that swiotlb_bounce() is always called one more time than expected for each DMA read request. It turns out that the bounce buffer is copied to the original DMA buffer twice after the completion of a DMA request (once in dma_direct_sync_single_for_cpu() and once in swiotlb_tbl_unmap_single()). But the content of the bounce buffer does not change between the two copies, so one of them is redundant. Pass the DMA_ATTR_SKIP_CPU_SYNC flag to swiotlb_tbl_unmap_single() to skip the memory copy in it. This fix increases FIO 64KB sequential read throughput in a guest with swiotlb=force by 5.6%. Fixes: `55897af630` ("dma-direct: merge swiotlb_dma_ops into the dma_direct code") Reported-by: Wang Zhaoyang1 <zhaoyang1.wang@intel.com> Reported-by: Gao Liang <liang.gao@intel.com> Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-20 09:34:21 +02:00
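The redundant copy described in the dma-direct commit can be modelled by counting bounce-to-original copies on the unmap path. This sketch is hypothetical: `sketch_mapping`, the helper names and the `SKETCH_ATTR_SKIP_CPU_SYNC` bit stand in for the dma-direct and swiotlb internals and are not their real definitions. With `fixed` set, the bounce-slot unmap is told to skip its CPU sync, because the preceding sync-for-CPU step has already copied the data back.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical attribute bit standing in for DMA_ATTR_SKIP_CPU_SYNC. */
#define SKETCH_ATTR_SKIP_CPU_SYNC (1UL << 0)

struct sketch_mapping {
	char orig[64];		/* driver's buffer */
	char bounce[64];	/* bounce slot the device actually wrote */
	int nr_copies;		/* bounce -> orig copies performed */
};

static void sketch_bounce_to_orig(struct sketch_mapping *m)
{
	memcpy(m->orig, m->bounce, sizeof(m->orig));
	m->nr_copies++;
}

/* Analogue of the sync-for-CPU step: always copies back. */
static void sketch_sync_for_cpu(struct sketch_mapping *m)
{
	sketch_bounce_to_orig(m);
}

/* Analogue of the bounce-slot unmap: copies back unless told to skip. */
static void sketch_unmap_bounce(struct sketch_mapping *m, unsigned long attrs)
{
	if (!(attrs & SKETCH_ATTR_SKIP_CPU_SYNC))
		sketch_bounce_to_orig(m);
	/* ... the bounce slot would be released here ... */
}

/* Unmap after a DMA read; @fixed adds the skip flag for the second step. */
static void sketch_unmap(struct sketch_mapping *m, unsigned long attrs,
			 bool fixed)
{
	if (!(attrs & SKETCH_ATTR_SKIP_CPU_SYNC))
		sketch_sync_for_cpu(m);
	sketch_unmap_bounce(m, fixed ? attrs | SKETCH_ATTR_SKIP_CPU_SYNC : attrs);
}
```

In this model the unfixed unmap copies the same bytes twice, while the fixed one copies once and still leaves the device's data in the original buffer.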
Anna-Maria Behnsen	111becd63e	timers: Fix warning condition in __run_timers() commit c54bc0fc84214b203f7a0ebfd1bd308ce2abe920 upstream. When the timer base is empty, base::next_expiry is set to base::clk + NEXT_TIMER_MAX_DELTA and base::next_expiry_recalc is false. When no timer is queued until jiffies reaches the base::next_expiry value, __run_timers() finds no expired timer while base::next_expiry_recalc is false, and its warning for that case triggers. To prevent the warning from triggering in this valid scenario, base::timers_pending needs to be added to the warning condition. Fixes: `31cd0e119d` ("timers: Recalculate next timer interrupt only when necessary") Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20220405191732.7438-3-anna-maria@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-20 09:34:21 +02:00
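The change to the warning condition in the timers commit can be expressed as two predicates over the `timer_base` fields the commit names. This is a standalone sketch: the struct is a hypothetical subset, `levels` stands for the number of timer-wheel levels that yielded expired timers, and the value of `SKETCH_NEXT_TIMER_MAX_DELTA` is an assumption rather than a quote of the kernel's constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value; the kernel defines its own NEXT_TIMER_MAX_DELTA. */
#define SKETCH_NEXT_TIMER_MAX_DELTA ((1UL << 30) - 1)

/* Only the timer_base fields named in the commit message. */
struct sketch_timer_base {
	unsigned long clk;
	unsigned long next_expiry;
	bool next_expiry_recalc;
	bool timers_pending;
};

/* State of a base with no queued timers, as the commit describes it. */
static struct sketch_timer_base sketch_empty_base(unsigned long clk)
{
	struct sketch_timer_base b = {
		.clk = clk,
		.next_expiry = clk + SKETCH_NEXT_TIMER_MAX_DELTA,
		.next_expiry_recalc = false,
		.timers_pending = false,
	};
	return b;
}

/* Old condition: no expired timer found and no recalculation pending. */
static bool sketch_warn_old(const struct sketch_timer_base *b, int levels)
{
	return !levels && !b->next_expiry_recalc;
}

/* New condition: additionally require that timers are actually pending. */
static bool sketch_warn_new(const struct sketch_timer_base *b, int levels)
{
	return !levels && !b->next_expiry_recalc && b->timers_pending;
}
```

For an empty base reaching `next_expiry`, the old predicate fires while the new one stays quiet; it still fires when timers are pending but none were found.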
Nadav Amit	44981e4cde	smp: Fix offline cpu check in flush_smp_call_function_queue() commit 9e949a3886356fe9112c6f6f34a6e23d1d35407f upstream. The check in flush_smp_call_function_queue() for callbacks that are sent to offline CPUs currently checks whether the queue is empty. However, flush_smp_call_function_queue() has just deleted all the callbacks from the queue and moved all the entries into a local list. This check would only be positive if some callbacks were added in the short time after llist_del_all() was called. This does not seem to be the intention of this check. Change the check to look at the local list to which the entries were moved instead of the queue from which all the callbacks were just removed. Fixes: `8d056c48e4` ("CPU hotplug, smp: flush any pending IPI callbacks before CPU offline") Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220319072015.1495036-1-namit@vmware.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-20 09:34:21 +02:00
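Why checking the queue after `llist_del_all()` almost never fires can be shown with a small lock-free list built on C11 atomics. This is a hypothetical model, not the kernel's `llist` implementation: `sketch_llist_del_all()` detaches every entry with one atomic exchange, so the queue is empty immediately afterwards and only the detached local list still reflects the pending callbacks. Only the offline-CPU path is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an llist node embedded in a callback entry. */
struct sketch_node {
	struct sketch_node *next;
};

/* Hypothetical lock-free singly linked list head (per-CPU call queue). */
struct sketch_llist {
	_Atomic(struct sketch_node *) first;
};

/* Push one entry, as a sender queuing a cross-CPU callback would. */
static void sketch_llist_add(struct sketch_llist *head, struct sketch_node *n)
{
	struct sketch_node *old = atomic_load(&head->first);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&head->first, &old, n));
}

/* Detach every queued entry at once, in the spirit of llist_del_all(). */
static struct sketch_node *sketch_llist_del_all(struct sketch_llist *head)
{
	return atomic_exchange(&head->first, NULL);
}

/*
 * Returns true if the "callbacks queued to an offline CPU" warning would
 * fire. @check_local selects the fixed behaviour.
 */
static bool sketch_flush_queue(struct sketch_llist *queue, bool check_local)
{
	struct sketch_node *entries = sketch_llist_del_all(queue);
	bool warn;

	if (check_local)
		warn = entries != NULL;	/* fixed: inspect what was moved */
	else
		warn = atomic_load(&queue->first) != NULL; /* old: just emptied */

	/* ... the callbacks in @entries would be run here ... */
	return warn;
}
```

With two callbacks queued, the old check reports nothing because the queue was emptied a moment earlier; the fixed check sees the detached entries.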
Paul Gortmaker	68a38b07f1	tick/nohz: Use WARN_ON_ONCE() to prevent console saturation commit 40e97e42961f8c6cc7bd5fe67cc18417e02d78f1 upstream. While running some testing on code that happened to allow the variable tick_nohz_full_running to get set but with no "possible" NOHZ cores to back up that setting, this warning triggered: if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE)) WARN_ON(tick_nohz_full_running); The console was overwhelmed with an endless stream of one WARN per tick per core and there was no way to even see what was going on w/o using a serial console to capture it and then trace it back to this. Change it to WARN_ON_ONCE(). Fixes: `08ae95f4fd` ("nohz_full: Allow the boot CPU to be nohz_full") Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211206145950.10927-3-paul.gortmaker@windriver.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-04-20 09:34:20 +02:00
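The difference between `WARN_ON()` and `WARN_ON_ONCE()` that the tick/nohz commit relies on can be sketched with a per-call-site static flag. The macros below are named `SKETCH_*` to make clear they are not the kernel's definitions; they assume GNU C statement expressions (supported by GCC and Clang, and also used by the kernel) so that, like the originals, they evaluate to the condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_emitted;	/* demo counter of printed warnings */

static void report_warning(const char *file, int line)
{
	warnings_emitted++;
	fprintf(stderr, "WARNING at %s:%d\n", file, line);
}

/* Reports every time the condition holds. */
#define SKETCH_WARN_ON(cond) ({					\
	bool warn_cond_ = !!(cond);				\
	if (warn_cond_)						\
		report_warning(__FILE__, __LINE__);		\
	warn_cond_;						\
})

/* A static flag per expansion site suppresses every report after the first. */
#define SKETCH_WARN_ON_ONCE(cond) ({				\
	static bool warned_once_;				\
	bool warn_cond_ = !!(cond);				\
	if (warn_cond_ && !warned_once_) {			\
		warned_once_ = true;				\
		report_warning(__FILE__, __LINE__);		\
	}							\
	warn_cond_;						\
})

/* Simulated per-tick checks while the bad state persists. */
static void tick_with_warn_on(bool nohz_full_running)
{
	SKETCH_WARN_ON(nohz_full_running);
}

static void tick_with_warn_on_once(bool nohz_full_running)
{
	SKETCH_WARN_ON_ONCE(nohz_full_running);
}
```

Calling `tick_with_warn_on(true)` on every simulated tick prints one line per tick, which is the console flood the commit describes; `tick_with_warn_on_once(true)` prints a single line no matter how many ticks follow.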

38873 Commits