94de3b405c8dee0ffc8de5c06b32fbf00fc4e8f9
38873 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
8e1716993b |
cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
commit 2685027fca387b602ae565bff17895188b803988 upstream. There are 3 places where the cpu and node masks of the top cpuset can be initialized in the order they are executed: 1) start_kernel -> cpuset_init() 2) start_kernel -> cgroup_init() -> cpuset_bind() 3) kernel_init_freeable() -> do_basic_setup() -> cpuset_init_smp() The first cpuset_init() call just sets all the bits in the masks. The second cpuset_bind() call sets cpus_allowed and mems_allowed to the default v2 values. The third cpuset_init_smp() call sets them back to v1 values. For systems with cgroup v2 setup, cpuset_bind() is called once. As a result, cpu and memory node hot add may fail to update the cpu and node masks of the top cpuset to include the newly added cpu or node in a cgroup v2 environment. For systems with cgroup v1 setup, cpuset_bind() is called again by rebind_subsystem() when the v1 cpuset filesystem is mounted as shown in the dmesg log below with an instrumented kernel. [ 2.609781] cpuset_bind() called - v2 = 1 [ 3.079473] cpuset_init_smp() called [ 7.103710] cpuset_bind() called - v2 = 0 smp_init() is called after the first two init functions. So we don't have a complete list of active cpus and memory nodes until later in cpuset_init_smp() which is the right time to set up effective_cpus and effective_mems. To fix this cgroup v2 mask setup problem, the potentially incorrect cpus_allowed & mems_allowed setting in cpuset_init_smp() are removed. For cgroup v2 systems, the initial cpuset_bind() call will set the masks correctly. For cgroup v1 systems, the second call to cpuset_bind() will do the right setup. cc: stable@vger.kernel.org Signed-off-by: Waiman Long <longman@redhat.com> Tested-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
2bde857bee |
Merge 5.15.39 into android13-5.15
Changes in 5.15.39 MIPS: Fix CP0 counter erratum detection for R4k CPUs parisc: Merge model and model name into one line in /proc/cpuinfo ALSA: hda/realtek: Add quirk for Yoga Duet 7 13ITL6 speakers ALSA: fireworks: fix wrong return count shorter than expected by 4 bytes mmc: sdhci-msm: Reset GCC_SDCC_BCR register for SDHC mmc: sunxi-mmc: Fix DMA descriptors allocated above 32 bits mmc: core: Set HS clock speed before sending HS CMD13 gpiolib: of: fix bounds check for 'gpio-reserved-ranges' x86/fpu: Prevent FPU state corruption KVM: x86/svm: Account for family 17h event renumberings in amd_pmc_perf_hw_id iommu/vt-d: Calculate mask for non-aligned flushes iommu/arm-smmu-v3: Fix size calculation in arm_smmu_mm_invalidate_range() drm/amd/display: Avoid reading audio pattern past AUDIO_CHANNELS_COUNT drm/amdgpu: do not use passthrough mode in Xen dom0 RISC-V: relocate DTB if it's outside memory region Revert "SUNRPC: attempt AF_LOCAL connect on setup" timekeeping: Mark NMI safe time accessors as notrace firewire: fix potential uaf in outbound_phy_packet_callback() firewire: remove check of list iterator against head past the loop body firewire: core: extend card->lock in fw_core_handle_bus_reset net: stmmac: disable Split Header (SPH) for Intel platforms genirq: Synchronize interrupt thread startup ASoC: da7219: Fix change notifications for tone generator frequency ASoC: wm8958: Fix change notifications for DSP controls ASoC: meson: Fix event generation for AUI ACODEC mux ASoC: meson: Fix event generation for G12A tohdmi mux ASoC: meson: Fix event generation for AUI CODEC mux s390/dasd: fix data corruption for ESE devices s390/dasd: prevent double format of tracks for ESE devices s390/dasd: Fix read for ESE with blksize < 4k s390/dasd: Fix read inconsistency for ESE DASD devices can: grcan: grcan_close(): fix deadlock can: isotp: remove re-binding of bound socket can: grcan: use ofdev->dev when allocating DMA memory can: grcan: grcan_probe(): fix broken system id check for errata workaround needs can: grcan: only use the NAPI poll budget for RX nfc: replace improper check device_is_registered() in netlink related functions nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs NFC: netlink: fix sleep in atomic bug when firmware download timeout gpio: visconti: Fix fwnode of GPIO IRQ gpio: pca953x: fix irq_stat not updated when irq is disabled (irq_mask not set) hwmon: (adt7470) Fix warning on module removal hwmon: (pmbus) disable PEC if not enabled ASoC: dmaengine: Restore NULL prepare_slave_config() callback ASoC: soc-ops: fix error handling iommu/vt-d: Drop stop marker messages iommu/dart: check return value after calling platform_get_resource() net/mlx5e: Fix trust state reset in reload net/mlx5e: Don't match double-vlan packets if cvlan is not set net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release net/mlx5e: Fix the calling of update_buffer_lossy() API net/mlx5: Avoid double clear or set of sync reset requested net/mlx5: Fix deadlock in sync reset flow selftests/seccomp: Don't call read() on TTY from background pgrp SUNRPC release the transport of a relocated task with an assigned transport RDMA/siw: Fix a condition race issue in MPA request processing RDMA/irdma: Flush iWARP QP if modified to ERR from RTR state RDMA/irdma: Reduce iWARP QP destroy time RDMA/irdma: Fix possible crash due to NULL netdev in notifier NFSv4: Don't invalidate inode attributes on delegation return net: ethernet: mediatek: add missing of_node_put() in mtk_sgmii_init() net: dsa: mt7530: add missing of_node_put() in mt7530_setup() net: stmmac: dwmac-sun8i: add missing of_node_put() in sun8i_dwmac_register_mdio_mux() net: mdio: Fix ENOMEM return value in BCM6368 mux bus controller net: cpsw: add missing of_node_put() in cpsw_probe_dt() net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() net: emaclite: Add error handling for of_address_to_resource() selftests/net: so_txtime: fix parsing of start time stamp on 32 bit systems selftests/net: so_txtime: usage(): fix documentation of default clock drm/msm/dp: remove fail safe mode related code btrfs: do not BUG_ON() on failure to update inode when setting xattr hinic: fix bug of wq out of bound access mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter() rxrpc: Enable IPv6 checksums on transport socket selftests: mirror_gre_bridge_1q: Avoid changing PVID while interface is operational bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag bnxt_en: Fix unnecessary dropping of RX packets selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer smsc911x: allow using IRQ0 btrfs: force v2 space cache usage for subpage mount btrfs: always log symlinks in full mode drm/amdgpu: unify BO evicting method in amdgpu_ttm drm/amdgpu: explicitly check for s0ix when evicting resources drm/amdgpu: don't set s3 and s0ix at the same time drm/amdgpu: Ensure HDA function is suspended before ASIC reset gpio: mvebu: drop pwm base assignment kvm: x86/cpuid: Only provide CPUID leaf 0xA if host has architectural PMU fbdev: Make fb_release() return -ENODEV if fbdev was unregistered net/mlx5: Fix slab-out-of-bounds while reading resource dump menu net/mlx5e: Lag, Fix use-after-free in fib event handler net/mlx5e: Lag, Fix fib_info pointer assignment net/mlx5e: Lag, Don't skip fib events on current dst iommu/dart: Add missing module owner to ops structure kvm: selftests: do not use bitfields larger than 32-bits for PTEs KVM: selftests: Silence compiler warning in the kvm_page_table_test x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume KVM: x86: Do not change ICR on write to APIC_SELF_IPI KVM: x86/mmu: avoid NULL-pointer dereference on page freeing bugs KVM: LAPIC: Enable timer posted-interrupt only when mwait/hlt is advertised selftest/vm: verify mmap addr in mremap_test selftest/vm: verify remap destination address in mremap_test mmc: rtsx: add 74 Clocks in power on flow Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized" rcu: Fix callbacks processing time limit retaining cond_resched() rcu: Apply callbacks processing time limit only on softirq PCI: pci-bridge-emul: Add description for class_revision field PCI: pci-bridge-emul: Add definitions for missing capabilities registers PCI: aardvark: Add support for DEVCAP2, DEVCTL2, LNKCAP2 and LNKCTL2 registers on emulated bridge PCI: aardvark: Clear all MSIs at setup PCI: aardvark: Comment actions in driver remove method PCI: aardvark: Disable bus mastering when unbinding driver PCI: aardvark: Mask all interrupts when unbinding driver PCI: aardvark: Fix memory leak in driver unbind PCI: aardvark: Assert PERST# when unbinding driver PCI: aardvark: Disable link training when unbinding driver PCI: aardvark: Disable common PHY when unbinding driver PCI: aardvark: Replace custom PCIE_CORE_INT_* macros with PCI_INTERRUPT_* PCI: aardvark: Rewrite IRQ code to chained IRQ handler PCI: aardvark: Check return value of generic_handle_domain_irq() when processing INTx IRQ PCI: aardvark: Make MSI irq_chip structures static driver structures PCI: aardvark: Make msi_domain_info structure a static driver structure PCI: aardvark: Use dev_fwnode() instead of of_node_to_fwnode(dev->of_node) PCI: aardvark: Refactor unmasking summary MSI interrupt PCI: aardvark: Add support for masking MSI interrupts PCI: aardvark: Fix setting MSI address PCI: aardvark: Enable MSI-X support PCI: aardvark: Add support for ERR interrupt on emulated bridge PCI: aardvark: Optimize writing PCI_EXP_RTCTL_PMEIE and PCI_EXP_RTSTA_PME on emulated bridge PCI: aardvark: Add support for PME interrupts PCI: aardvark: Fix support for PME requester on emulated bridge PCI: aardvark: Use separate INTA interrupt for emulated root bridge PCI: aardvark: Remove irq_mask_ack() callback for INTx interrupts PCI: aardvark: Don't mask irq when mapping PCI: aardvark: Drop __maybe_unused from advk_pcie_disable_phy() PCI: aardvark: Update comment about link going down after link-up Linux 5.15.39 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ic51d6d05a0d99156c6fd844786e984aff8e7386a |
||
|
|
ecda2085fd |
Revert 5.15.37 merge into android13-5.15
This reverts the merge of 5.15.37 into the android13-5.15
There are lots of ABI issues, and many of the commits are not needed in
the Android tree at this time. Revert the merge (except for the
Makefile change), so that future merges will continue to work, and the
needed individual changes from this release will be manually added to
the tree at a later point in time.
Fixes: f7dace75d276 ("Merge 5.15.37 into android13-5.15")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0632858e5c0fb94fc14c0f4216997330eca260a7
|
||
|
|
ef5fed3c1e |
Merge 5.15.37 into android13-5.15
Changes in 5.15.37
floppy: disable FDRAWCMD by default
bpf: Introduce composable reg, ret and arg types.
bpf: Replace ARG_XXX_OR_NULL with ARG_XXX | PTR_MAYBE_NULL
bpf: Replace RET_XXX_OR_NULL with RET_XXX | PTR_MAYBE_NULL
bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL
bpf: Introduce MEM_RDONLY flag
bpf: Convert PTR_TO_MEM_OR_NULL to composable types.
bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.
bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.
bpf/selftests: Test PTR_TO_RDONLY_MEM
bpf: Fix crash due to out of bounds access into reg2btf_ids.
spi: cadence-quadspi: fix write completion support
ARM: dts: socfpga: change qspi to "intel,socfpga-qspi"
mm: kfence: fix objcgs vector allocation
gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable}
iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable
iov_iter: Introduce fault_in_iov_iter_writeable
gfs2: Add wrapper for iomap_file_buffered_write
gfs2: Clean up function may_grant
gfs2: Introduce flag for glock holder auto-demotion
gfs2: Move the inode glock locking to gfs2_file_buffered_write
gfs2: Eliminate ip->i_gh
gfs2: Fix mmap + page fault deadlocks for buffered I/O
iomap: Fix iomap_dio_rw return value for user copies
iomap: Support partial direct I/O on user copy failures
iomap: Add done_before argument to iomap_dio_rw
gup: Introduce FOLL_NOFAULT flag to disable page faults
iov_iter: Introduce nofault flag to disable page faults
gfs2: Fix mmap + page fault deadlocks for direct I/O
btrfs: fix deadlock due to page faults during direct IO reads and writes
btrfs: fallback to blocking mode when doing async dio over multiple extents
mm: gup: make fault_in_safe_writeable() use fixup_user_fault()
selftests/bpf: Add test for reg2btf_ids out of bounds access
Linux 5.15.37
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ica39b8856d6e3928a82f4e34f8b401f1a5cba5ee
|
||
|
|
e95cdba8e2 |
Merge 5.15.36 into android13-5.15
Changes in 5.15.36 fs: remove __sync_filesystem block: remove __sync_blockdev block: simplify the block device syncing code vfs: make sync_filesystem return errors from ->sync_fs xfs: return errors in xfs_fs_sync_fs dma-mapping: remove bogus test for pfn_valid from dma_map_resource arm64/mm: drop HAVE_ARCH_PFN_VALID etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead mm: page_alloc: fix building error on -Werror=array-compare perf tools: Fix segfault accessing sample_id xyarray mm, kfence: support kmem_dump_obj() for KFENCE objects gfs2: assign rgrp glock before compute_bitstructs scsi: ufs: core: scsi_get_lba() error fix net/sched: cls_u32: fix netns refcount changes in u32_change() ALSA: usb-audio: Clear MIDI port active flag after draining ALSA: hda/realtek: Add quirk for Clevo NP70PNP ASoC: atmel: Remove system clock tree configuration for at91sam9g20ek ASoC: topology: Correct error handling in soc_tplg_dapm_widget_create() ASoC: rk817: Use devm_clk_get() in rk817_platform_probe ASoC: msm8916-wcd-digital: Check failure for devm_snd_soc_register_component ASoC: codecs: wcd934x: do not switch off SIDO Buck when codec is in use dmaengine: idxd: fix device cleanup on disable dmaengine: imx-sdma: Fix error checking in sdma_event_remap dmaengine: mediatek:Fix PM usage reference leak of mtk_uart_apdma_alloc_chan_resources dmaengine: dw-edma: Fix unaligned 64bit access spi: spi-mtk-nor: initialize spi controller after resume esp: limit skb_page_frag_refill use to a single page spi: cadence-quadspi: fix incorrect supports_op() return value igc: Fix infinite loop in release_swfw_sync igc: Fix BUG: scheduling while atomic igc: Fix suspending when PTM is active ALSA: hda/hdmi: fix warning about PCM count when used with SOF rxrpc: Restore removed timer deletion net/smc: Fix sock leak when release after smc_shutdown() net/packet: fix packet_sock xmit return value checking ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit() ip6_gre: Fix skb_under_panic in __gre6_xmit() net: restore alpha order to Ethernet devices in config net/sched: cls_u32: fix possible leak in u32_init_knode() l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu ipv6: make ip6_rt_gc_expire an atomic_t can: isotp: stop timeout monitoring when no first frame was sent net: dsa: hellcreek: Calculate checksums in tagger net: mscc: ocelot: fix broken IP multicast flooding netlink: reset network and mac headers in netlink_dump() drm/i915/display/psr: Unset enable_psr2_sel_fetch if other checks in intel_psr2_config_valid() fails net: stmmac: Use readl_poll_timeout_atomic() in atomic state dmaengine: idxd: add RO check for wq max_batch_size write dmaengine: idxd: add RO check for wq max_transfer_size write dmaengine: idxd: skip clearing device context when device is read-only selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets arm64: mm: fix p?d_leaf() ARM: vexpress/spc: Avoid negative array index when !SMP reset: renesas: Check return value of reset_control_deassert() reset: tegra-bpmp: Restore Handle errors in BPMP response platform/x86: samsung-laptop: Fix an unsigned comparison which can never be negative ALSA: usb-audio: Fix undefined behavior due to shift overflowing the constant drm/msm/disp: check the return value of kzalloc() arm64: dts: imx: Fix imx8*-var-som touchscreen property sizes vxlan: fix error return code in vxlan_fdb_append cifs: Check the IOCB_DIRECT flag, not O_DIRECT net: atlantic: Avoid out-of-bounds indexing mt76: Fix undefined behavior due to shift overflowing the constant brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant dpaa_eth: Fix missing of_node_put in dpaa_get_ts_info() drm/msm/mdp5: check the return of kzalloc() net: macb: Restart tx only if queue pointer is lagging scsi: iscsi: Release endpoint ID when its freed scsi: iscsi: Merge suspend fields scsi: iscsi: Fix NOP handling during conn recovery scsi: qedi: Fix failed disconnect handling stat: fix inconsistency between struct stat and struct compat_stat VFS: filename_create(): fix incorrect intent. nvme: add a quirk to disable namespace identifiers nvme-pci: disable namespace identifiers for the MAXIO MAP1002/1202 nvme-pci: disable namespace identifiers for Qemu controllers EDAC/synopsys: Read the error count from the correct register mm/memory-failure.c: skip huge_zero_page in memory_failure() memcg: sync flush only if periodic flush is delayed mm, hugetlb: allow for "high" userspace addresses oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() ata: pata_marvell: Check the 'bmdma_addr' beforing reading dma: at_xdmac: fix a missing check on list iterator dmaengine: imx-sdma: fix init of uart scripts net: atlantic: invert deep par in pm functions, preventing null derefs Input: omap4-keypad - fix pm_runtime_get_sync() error checking scsi: sr: Do not leak information in ioctl sched/pelt: Fix attach_entity_load_avg() corner case perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare KVM: PPC: Fix TCE handling for VFIO drm/vc4: Use pm_runtime_resume_and_get to fix pm_runtime_get_sync() usage powerpc/perf: Fix power9 event alternatives powerpc/perf: Fix power10 event alternatives perf script: Always allow field 'data_src' for auxtrace perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event xtensa: patch_text: Fixup last cpu should be master xtensa: fix a7 clobbering in coprocessor context load/store openvswitch: fix OOB access in reserve_sfa_size() gpio: Request interrupts after IRQ is initialized ASoC: soc-dapm: fix two incorrect uses of list iterator e1000e: Fix possible overflow in LTR decoding ARC: entry: fix syscall_trace_exit argument arm_pmu: Validate single/group leader events KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race KVM: nVMX: Defer APICv updates while L2 is active until L1 is active KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs netfilter: conntrack: convert to refcount_t api netfilter: conntrack: avoid useless indirection during conntrack destruction ext4: fix fallocate to use file_modified to update permissions consistently ext4: fix symlink file size not match to file content ext4: fix use-after-free in ext4_search_dir ext4: limit length to bitmap_maxbytes - blocksize in punch_hole ext4, doc: fix incorrect h_reserved size ext4: fix overhead calculation to account for the reserved gdt blocks ext4: force overhead calculation if the s_overhead_cluster makes no sense netfilter: nft_ct: fix use after free when attaching zone template jbd2: fix a potential race while discarding reserved buffers after an abort spi: atmel-quadspi: Fix the buswidth adjustment between spi-mem and controller block/compat_ioctl: fix range check in BLKGETSIZE arm64: dts: qcom: add IPA qcom,qmp property Linux 5.15.36 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I44d3a4de9b6fa1d2016b4e063eb211e8373a1216 |
||
|
|
fed04ede18 |
Merge remote-tracking branch into HEAD
* keystone/mirror-android13-5.15: FROMLIST: usb: dwc3: Fix ep0 handling when getting reset while doing control transfer ANDROID: ABI: Update symbols to unisoc whitelist for the 7st FROMLIST: media: Kconfig: Make DVB_CORE=m possible when MEDIA_SUPPORT=y ANDROID: abi_gki_aarch64_qcom: Update symbol list ANDROID: firmware_loader: Fix warning with firmware_param_path_set Revert "ANDROID: KVM: arm64: pkvm: Ensure that TLBs and I-cache are private to each vcpu" ANDROID: GCE: To build kernel image for gce cloud android. ANDROID: enable db845c kleaf build. Revert "ANDROID: Make file-backed vma teardown synchronous" ANDROID: GCE: To build kernel image for gce cloud android. FROMLIST: serial: qcom_geni_serial: Disable MMIO tracing for geni serial ANDROID: kernel: fix debug_kinfo set twice crash issue UPSTREAM: media: v4l2-ctrls: Add RGB color effects control ANDROID: abi_gki_aarch64_qcom: Update qcom abi symbol list ANDROID: usb: export tracepoint for dwc3_complete_trb ANDROID: cputime: seprate irq entry and exit tracehooks Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I1f3900dd336d9fb20a0437ebbb48e849f8133640 |
||
|
|
23ef56f65c |
Revert "ANDROID: Make file-backed vma teardown synchronous"
This reverts commit
|
||
|
|
0060c7bd9e |
rcu: Apply callbacks processing time limit only on softirq
commit a554ba288845fd3f6f12311fd76a51694233458a upstream. Time limit only makes sense when callbacks are serviced in softirq mode because: _ In case we need to get back to the scheduler, cond_resched_tasks_rcu_qs() is called after each callback. _ In case some other softirq vector needs the CPU, the call to local_bh_enable() before cond_resched_tasks_rcu_qs() takes care about them via a call to do_softirq(). Therefore, make sure the time limit only applies to softirq mode. Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [UR: backport to 5.15-stable] Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
2c5029d652 |
rcu: Fix callbacks processing time limit retaining cond_resched()
commit 3e61e95e2d095e308616cba4ffb640f95a480e01 upstream. The callbacks processing time limit makes sure we are not exceeding a given amount of time executing the queue. However its "continue" clause bypasses the cond_resched() call on rcuc and NOCB kthreads, delaying it until we reach the limit, which can be very long... Make sure the scheduler has a higher priority than the time limit. Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [UR: backport to 5.15-stable + commit update] Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
61808e4089 |
genirq: Synchronize interrupt thread startup
commit 8707898e22fd665bc1d7b18b809be4b56ce25bdd upstream.
A kernel hang can be observed when running setserial in a loop on a kernel
with force threaded interrupts. The sequence of events is:
setserial
open("/dev/ttyXXX")
request_irq()
do_stuff()
-> serial interrupt
-> wake(irq_thread)
desc->threads_active++;
close()
free_irq()
kthread_stop(irq_thread)
synchronize_irq() <- hangs because desc->threads_active != 0
The thread is created in request_irq() and woken up, but does not get on a
CPU to reach the actual thread function, which would handle the pending
wake-up. kthread_stop() sets the should stop condition which makes the
thread immediately exit, which in turn leaves the stale threads_active
count around.
This problem was introduced with commit
|
||
|
|
07adb69545 |
timekeeping: Mark NMI safe time accessors as notrace
commit 2c33d775ef4c25c0e1e1cc0fd5496d02f76bfa20 upstream. Mark the CLOCK_MONOTONIC fast time accessors as notrace. These functions are used in tracing to retrieve timestamps, so they should not recurse. Fixes: |
||
|
|
82b7f3fbf1 |
Merge remote-tracking branch into HEAD
* keystone/mirror-android13-5.15: (22 commits) ANDROID: ABI: Update pixel symbol list and ABI xml ANDROID: abi_gki_aarch64_qcom: Update symbol list ANDROID: Make file-backed vma teardown synchronous ANDROID: abi_gki_aarch64_qcom: Add icc_sync_state ANDROID: ABI: Update symbols to unisoc whitelist for the 6th ANDROID: Update symbol list for mtk ANDROID: abi_gki_aarch64_qcom: Update symbol list. UPSTREAM: printk: ringbuffer: Improve prb_next_seq() performance ANDROID: firmware_loader: Add support for customer firmware paths FROMLIST: remoteproc: Use unbounded workqueue for recovery work ANDROID: kbuild: mod: Move $(obj) prefix inside awk FROMGIT: kbuild: read *.mod to get objects passed to $(LD) or $(AR) FROMGIT: kbuild: make *.mod not depend on *.o FROMGIT: kbuild: get rid of duplication in *.mod files FROMGIT: kbuild: split the second line of *.mod into *.usyms FROMGIT: kbuild: reuse real-search to simplify cmd_mod FROMGIT: kbuild: make multi_depend work with targets in subdirectory FROMGIT: kbuild: reuse suffix-search to refactor multi_depend ANDROID: abi_gki_aarch64_qcom: Update symbol list ANDROID: ABI: Update symbols to unisoc whitelist for the 5th ... Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I6bc401372db3c8a5bb2714630b98a2ea326aa606 |
||
|
|
aec40de3d7 |
ANDROID: cputime: seprate irq entry and exit tracehooks
Currently the code has single hook for tracking irqs. However modules need to deduce start and end of the irq. Create separate hooks for irq start and end since the cputime has already figured it out. Bug: 231341763 Change-Id: Ie0dd503b283d83f69d01171ebd1cd6127c3bafd0 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> Signed-off-by: Ashay Jaiswal <quic_ashayj@quicinc.com> |
||
|
|
fe25fc5375 |
ANDROID: Make file-backed vma teardown synchronous
When a file-backed vma is being released, the userspace can have an
expectation that the vma and the file it's pinning will be released
synchronously. This does not happen when SPF is enabled because vma
and associated file are released asynchronously after RCU grace
period. This is done to prevent pagefault handler from stepping on
a deleted object. Fix this issue by synchronously waiting for RCU
grace period during file-backed vma tear-down.
Fixes:
|
||
|
|
a327e05923 |
UPSTREAM: printk: ringbuffer: Improve prb_next_seq() performance
prb_next_seq() always iterates from the first known sequence number. In the worst case, it might loop 8k times for 256kB buffer, 15k times for 512kB buffer, and 64k times for 2MB buffer. It was reported that polling and reading using syslog interface might occupy 50% of CPU. Speedup the search by storing @id of the last finalized descriptor. The loop is still needed because the @id is stored and read in the best effort way. An atomic variable is used to keep the @id consistent. But the stores and reads are not serialized against each other. The descriptor could get reused in the meantime. The related sequence number will be used only when it is still valid. An invalid value should be read _only_ when there is a flood of messages and the ringbuffer is rapidly reused. The performance is the least problem in this case. Bug: 216238044 (cherry picked from commit ef244b4dc53e520d4570b2610436aba0593ce6f55) Reported-by: Chunlei Wang <chunlei.wang@mediatek.com> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/1642770388-17327-1-git-send-email-quic_mojha@quicinc.com Link: https://lore.kernel.org/lkml/YXlddJxLh77DKfIO@alley/T/#m43062e8b2a17f8dbc8c6ccdb8851fb0dbaabbb14 Signed-off-by: Prasad Sodagudi <quic_psodagud@quicinc.com> Change-Id: Ie6b1276eca791a891e42d5635ca1f116ae7cadef Signed-off-by: Prasad Sodagudi <quic_psodagud@quicinc.com> |
||
|
|
989c61a63c |
Merge remote-tracking branch into HEAD
* keystone/mirror-android13-5.15: ANDROID: abi_gki_aarch64_qcom: Update qcom abi symbol list FROMGIT: net: fix wrong network header length ANDROID: Update mtktv symbol list 3rd ANDROID: arm: Mark the recheduling IPI as raw interrupt ANDROID: arm64: Mark the recheduling IPI as raw interrupt ANDROID: genirq: Allow an interrupt to be marked as 'raw' ANDROID: vendor_hooks: Add hooks for ufs scheduler FROMLIST: soc: qcom: geni: Disable MMIO tracing for GENI SE ANDROID: ABI: Update symbols to unisoc whitelist for the 4st ANDROID: abi_gki_aarch64_qcom: Update qcom abi symbol list ANDROID: abi_gki_aarch64_qcom: Update symbol list ANDROID: abi_gki_aarch64_qcom: Sort symbol list Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I52045ea99a1a29b8bc2a089856e74517b637ed0e |
||
|
|
a879ad2ff0 |
ANDROID: genirq: Allow an interrupt to be marked as 'raw'
Some interrupts (such as the rescheduling IPI) rely upon not going through the irq_enter()/irq_exit() calls. To distinguish such interrupts, add a new IRQ flag that allows the low-level handling code to sidestep the enter()/exit() calls. Only the architecture code is expected to use this. It will do the wrong thing on normal interrupts. Note that this is a band-aid until we can move to some more correct infrastructure (such as kernel/entry/common.c). Bug: 191808738 Link: https://lore.kernel.org/lkml/20201124141449.572446-3-maz@kernel.org/ Change-Id: I0609a8b689219ba9e769c8b9f7fcf1e77a0ff1ca Signed-off-by: Marc Zyngier <maz@kernel.org> [minor port to 5.10] Signed-off-by: Stephen Dickey <dickey@codeaurora.org> [minor port to 5.15] Signed-off-by: Abhijeet Dharmapurikar <quic_adharmap@quicinc.com> |
||
|
|
9284fe85c5 |
Merge remote-tracking branch into HEAD
* keystone/mirror-android13-5.15: (32 commits) ANDROID: GKI: 5/4/2022 KMI update ANDROID: GKI: add mem_section to pixel's symbol list ANDROID: net: introduce ip_local_unbindable_ports sysctl ANDROID: sched: Add flags parameter to enq/deq after tracehooks ANDROID: GKI: Remove pfn_valid symbol BACKPORT: arm64/mm: drop HAVE_ARCH_PFN_VALID BACKPORT: dma-mapping: remove bogus test for pfn_valid from dma_map_resource ANDROID: add kabi padding for structures for the android13 release ANDROID: GKI: device.h: add Android ABI padding to some structures ANDROID: GKI: elevator: add Android ABI padding to some structures ANDROID: GKI: scsi: add Android ABI padding to some structures ANDROID: GKI: workqueue.h: add Android ABI padding to some structures ANDROID: GKI: sched: add Android ABI padding to some structures ANDROID: GKI: phy: add Android ABI padding to some structures ANDROID: GKI: fs.h: add Android ABI padding to some structures ANDROID: GKI: dentry: add Android ABI padding to some structures ANDROID: GKI: bio: add Android ABI padding to some structures ANDROID: GKI: ufs: add Android ABI padding to some structures Revert "Revert "opp: Expose of-node's name in debugfs"" Revert "Revert "gpio: Restrict usage of GPIO chip irq members before initialization"" ... Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I75a673637b41f5cecaf691f86b2103f2ad3b5110 |
||
|
|
7045a3408d |
ANDROID: sched: Add flags parameter to enq/deq after tracehooks
Currently, the enqueue and dequeue tracehooks pass in the flags parameter, however, the after tracehooks that follow do not. Bug: 226570047 Change-Id: I51cb50054562893271e5d3efd7c6bd028977622d Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
|
565ca36055 |
BACKPORT: dma-mapping: remove bogus test for pfn_valid from dma_map_resource
dma_map_resource() uses pfn_valid() to ensure the range is not RAM. However, pfn_valid() only checks for availability of the memory map for a PFN but it does not ensure that the PFN is actually backed by RAM. As dma_map_resource() is the only method in DMA mapping APIs that has this check, simply drop the pfn_valid() test from dma_map_resource(). Link: https://lore.kernel.org/all/20210824173741.GC623@arm.com/ Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20210930013039.11260-2-rppt@kernel.org Signed-off-by: Will Deacon <will@kernel.org> Bug: 228454859 (cherry picked from commit a9c38c5d267cb94871dfa2de5539c92025c855d7) Change-Id: Iae9d77b018d728a0cfaf5f4fd7e14311aa463659 Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> |
||
|
|
8c39925e98 |
bpf: Fix crash due to out of bounds access into reg2btf_ids.
commit 45ce4b4f9009102cd9f581196d480a59208690c1 upstream
When commit
|
||
|
|
2a77c58726 |
bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.
commit 216e3cd2f28dbbf1fe86848e0e29e6693b9f0a20 upstream. Some helper functions may modify its arguments, for example, bpf_d_path, bpf_get_stack etc. Previously, their argument types were marked as ARG_PTR_TO_MEM, which is compatible with read-only mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate, but technically incorrect, to modify a read-only memory by passing it into one of such helper functions. This patch tags the bpf_args compatible with immutable memory with MEM_RDONLY flag. The arguments that don't have this flag will be only compatible with mutable memory types, preventing the helper from modifying a read-only memory. The bpf_args that have MEM_RDONLY are compatible with both mutable memory and immutable memory. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
15166bb300 |
bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.
commit 34d3a78c681e8e7844b43d1a2f4671a04249c821 upstream.
Tag the return type of {per, this}_cpu_ptr with RDONLY_MEM. The
returned value of this pair of helpers is kernel object, which
can not be updated by bpf programs. Previously these two helpers
return PTR_OT_MEM for kernel objects of scalar type, which allows
one to directly modify the memory. Now with RDONLY_MEM tagging,
the verifier will reject programs that write into RDONLY_MEM.
Fixes:
|
||
|
|
b710f73704 |
bpf: Convert PTR_TO_MEM_OR_NULL to composable types.
commit cf9f2f8d62eca810afbd1ee6cc0800202b000e57 upstream. Remove PTR_TO_MEM_OR_NULL and replace it with PTR_TO_MEM combined with flag PTR_MAYBE_NULL. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-7-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
b453361384 |
bpf: Introduce MEM_RDONLY flag
commit 20b2aff4bc15bda809f994761d5719827d66c0b4 upstream. This patch introduce a flag MEM_RDONLY to tag a reg value pointing to read-only memory. It makes the following changes: 1. PTR_TO_RDWR_BUF -> PTR_TO_BUF 2. PTR_TO_RDONLY_BUF -> PTR_TO_BUF | MEM_RDONLY Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-6-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
8d38cde47a |
bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL
commit c25b2ae136039ffa820c26138ed4a5e5f3ab3841 upstream. We have introduced a new type to make bpf_reg composable, by allocating bits in the type to represent flags. One of the flags is PTR_MAYBE_NULL which indicates a pointer may be NULL. This patch switches the qualified reg_types to use this flag. The reg_types changed in this patch include: 1. PTR_TO_MAP_VALUE_OR_NULL 2. PTR_TO_SOCKET_OR_NULL 3. PTR_TO_SOCK_COMMON_OR_NULL 4. PTR_TO_TCP_SOCK_OR_NULL 5. PTR_TO_BTF_ID_OR_NULL 6. PTR_TO_MEM_OR_NULL 7. PTR_TO_RDONLY_BUF_OR_NULL 8. PTR_TO_RDWR_BUF_OR_NULL [haoluo: backport notes There was a reg_type_may_be_null() in adjust_ptr_min_max_vals() in 5.15.x, but didn't exist in the upstream commit. This backport converted that reg_type_may_be_null() to type_may_be_null() as well.] Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/r/20211217003152.48334-5-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
3c141c82b9 |
bpf: Replace RET_XXX_OR_NULL with RET_XXX | PTR_MAYBE_NULL
commit 3c4807322660d4290ac9062c034aed6b87243861 upstream. We have introduced a new type to make bpf_ret composable, by reserving high bits to represent flags. One of the flag is PTR_MAYBE_NULL, which indicates a pointer may be NULL. When applying this flag to ret_types, it means the returned value could be a NULL pointer. This patch switches the qualified arg_types to use this flag. The ret_types changed in this patch include: 1. RET_PTR_TO_MAP_VALUE_OR_NULL 2. RET_PTR_TO_SOCKET_OR_NULL 3. RET_PTR_TO_TCP_SOCK_OR_NULL 4. RET_PTR_TO_SOCK_COMMON_OR_NULL 5. RET_PTR_TO_ALLOC_MEM_OR_NULL 6. RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL 7. RET_PTR_TO_BTF_ID_OR_NULL This patch doesn't eliminate the use of these names, instead it makes them aliases to 'RET_PTR_TO_XXX | PTR_MAYBE_NULL'. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-4-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
d58a396fa6 |
bpf: Replace ARG_XXX_OR_NULL with ARG_XXX | PTR_MAYBE_NULL
commit 48946bd6a5d695c50b34546864b79c1f910a33c1 upstream. We have introduced a new type to make bpf_arg composable, by reserving high bits of bpf_arg to represent flags of a type. One of the flags is PTR_MAYBE_NULL which indicates a pointer may be NULL. When applying this flag to an arg_type, it means the arg can take NULL pointer. This patch switches the qualified arg_types to use this flag. The arg_types changed in this patch include: 1. ARG_PTR_TO_MAP_VALUE_OR_NULL 2. ARG_PTR_TO_MEM_OR_NULL 3. ARG_PTR_TO_CTX_OR_NULL 4. ARG_PTR_TO_SOCKET_OR_NULL 5. ARG_PTR_TO_ALLOC_MEM_OR_NULL 6. ARG_PTR_TO_STACK_OR_NULL This patch does not eliminate the use of these arg_types, instead it makes them an alias to the 'ARG_XXX | PTR_MAYBE_NULL'. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211217003152.48334-3-haoluo@google.com Cc: stable@vger.kernel.org # 5.15.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
d0510ff5a4 |
Merge remote-tracking branch into HEAD
* keystone/mirror-android13-5.15: ANDROID: sched/rt: Only enable RT sync for SMP targets ANDROID: sched/rt: Add support for rt sync wakeups ANDROID: irq: manage: Export irq_do_set_affinity symbol ANDROID: gic-v3: Update vendor hook to set affinity in GIC v3 FROMGIT: scsi: ufs: core: Exclude UECxx from SFR dump list ANDROID: GKI: add initial symbol list for Exynos Auto SoC ANDROID: arm64: Auto-enroll MMIO guard on protected vms ANDROID: mm: Export vmalloc_nr_pages ANDROID: mm: Export pcpu_nr_pages UPSTREAM: f2fs: should not truncate blocks during roll-forward recovery ANDROID: add gki_module headers to .gitignore file ANDROID: usb: add EXPORT_TRACE_SYMBOL to export tracepoint Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: Ib9828714fa460cc6f6e4519f1f2a2c970302763d |
||
|
|
def3ff8c55 |
ANDROID: sched/rt: Only enable RT sync for SMP targets
The rt sync wakeup support has a condition which relies on a field that
exists only when CONFIG_SMP is defined, causing a compilation issue.
Since sync wakeup has no real meaning on a non-SMP system, we can just
drop the CONFIG_RT_GROUP_SCHED part of the #ifdef.
Fixes: da5f3cd37802 ("ANDROID: sched/rt: Add support for rt sync wakeups")
Signed-off-by: J. Avila <elavila@google.com>
Change-Id: I9b95304408d323b0c1017bd33746ecfbb2b35808
|
||
|
|
a7133dd750 |
ANDROID: sched/rt: Add support for rt sync wakeups
Some rt tasks undergo sync wakeup. Currently, these tasks will be placed on other, often sleeping or otherwise idle CPUs, which can lead to unnecessary power hits. Bug: 157906395 Change-Id: I48864d0847bbe4f7813c842032880ad3f3b8b06b Signed-off-by: J. Avila <elavila@google.com> [quic_dickey@quicinc.com: Port to 5.15] Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> |
||
|
|
9f7014a6d2 |
ANDROID: irq: manage: Export irq_do_set_affinity symbol
Vendor kernel modules may implement irq balancers, which could take irq desc lock of an irq and then based on current affinity mask or affinity hint, reconfigure the affinity of that irq. For example : For an irq, for which affinity is broken i.e. all the cpus in its affinity mask have gone offline. For such irqs, we might want to reset the affinity, when the original set of affined cpus, come back online. desc->affinity_hint can be used for figuring out the original affinity. So, the sequence for doing this becomes: desc = irq_to_desc(i); raw_spin_lock(&desc->lock); affinity = desc->affinity_hint; raw_spin_unlock(&desc->lock); irq_set_affinity_hint(i, affinity); Here, we need to release the desc lock before calling the exported api irq_set_affinity_hint(). This creates a window where, after unlocking desc lock and before calling irq_set_affinity_hint(), where this setting can race with other irq_set_affinity_hint() callers. So, export irq_do_set_affinity() symbol to provide an api, which can be called with desc lock held. Bug: 187157600 Change-Id: Ifad88bfaa1e7eec09c3fe5a9dd7d1d421362b41e Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> (cherry picked from commit d88c1e77fd574f694540c48a67e210aa6cfea01f) |
||
|
|
d7f3794b34 |
ANDROID: add gki_module headers to .gitignore file
In commit |
||
|
|
56637084e8 |
perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled
[ Upstream commit 60490e7966659b26d74bf1fa4aa8693d9a94ca88 ]
This problem can be reproduced with CONFIG_PERF_USE_VMALLOC enabled on
both x86_64 and aarch64 arch when using sysdig -B(using ebpf)[1].
sysdig -B works fine after rebuilding the kernel with
CONFIG_PERF_USE_VMALLOC disabled.
I tracked it down to the if condition event->rb->nr_pages != nr_pages
in perf_mmap is true when CONFIG_PERF_USE_VMALLOC is enabled where
event->rb->nr_pages = 1 and nr_pages = 2048 resulting perf_mmap to
return -EINVAL. This is because when CONFIG_PERF_USE_VMALLOC is
enabled, rb->nr_pages is always equal to 1.
Arch with CONFIG_PERF_USE_VMALLOC enabled by default:
arc/arm/csky/mips/sh/sparc/xtensa
Arch with CONFIG_PERF_USE_VMALLOC disabled by default:
x86_64/aarch64/...
Fix this problem by using data_page_nr()
[1] https://github.com/draios/sysdig
Fixes:
|
||
|
|
b1b9294682 |
sched/pelt: Fix attach_entity_load_avg() corner case
[ Upstream commit 40f5aa4c5eaebfeaca4566217cb9c468e28ed682 ]
The warning in cfs_rq_is_decayed() triggered:
SCHED_WARN_ON(cfs_rq->avg.load_avg ||
cfs_rq->avg.util_avg ||
cfs_rq->avg.runnable_avg)
There exists a corner case in attach_entity_load_avg() which will
cause load_sum to be zero while load_avg will not be.
Consider se_weight is 88761 as per the sched_prio_to_weight[] table.
Further assume the get_pelt_divider() is 47742, this gives:
se->avg.load_avg is 1.
However, calculating load_sum:
se->avg.load_sum = div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se));
se->avg.load_sum = 1*47742/88761 = 0.
Then enqueue_load_avg() adds this to the cfs_rq totals:
cfs_rq->avg.load_avg += se->avg.load_avg;
cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum;
Resulting in load_avg being 1 with load_sum is 0, which will trigger
the WARN.
Fixes:
|
||
|
|
c01430cf5b |
dma-mapping: remove bogus test for pfn_valid from dma_map_resource
commit a9c38c5d267cb94871dfa2de5539c92025c855d7 upstream.
dma_map_resource() uses pfn_valid() to ensure the range is not RAM.
However, pfn_valid() only checks for availability of the memory map for a
PFN but it does not ensure that the PFN is actually backed by RAM.
As dma_map_resource() is the only method in DMA mapping APIs that has this
check, simply drop the pfn_valid() test from dma_map_resource().
Link: https://lore.kernel.org/all/20210824173741.GC623@arm.com/
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20210930013039.11260-2-rppt@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Fixes:
|
||
|
|
4fac540897 |
Merge remote-tracking branch into HEAD
* keystone/mirror-android13-5.15: (471 commits) ANDROID: usb: add EXPORT_TRACE_SYMBOL to export tracepoint Revert "gpio: Restrict usage of GPIO chip irq members before initialization" Revert "opp: Expose of-node's name in debugfs" ANDROID: GKI: remove CONFIG_UBSAN_OBJECT_SIZE from gki_defconfig Linux 5.15.35 ax25: Fix UAF bugs in ax25 timers ax25: Fix NULL pointer dereferences in ax25 timers ax25: fix NPD bug in ax25_disconnect ax25: fix UAF bug in ax25_send_control() ax25: Fix refcount leaks caused by ax25_cb_del() ax25: fix UAF bugs of net_device caused by rebinding operation ax25: fix reference count leaks of ax25_dev ax25: add refcount in ax25_dev to avoid UAF bugs cpufreq: intel_pstate: ITMT support for overclocked system net: ipa: fix a build dependency soc: qcom: aoss: Fix missing put_device call in qmp_get cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state drm/i915: Sunset igpu legacy mmap support based on GRAPHICS_VER_FULL dma-direct: avoid redundant memory sync for swiotlb timers: Fix warning condition in __run_timers() ... Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I44c7de3b006aca8e2bf7653fbac1c875aaa80b99 |
||
|
|
ec1a28c7c0 |
Merge 5.15.35 into android13-5.15
Changes in 5.15.35
drm/amd/display: Add pstate verification and recovery for DCN31
drm/amd/display: Fix p-state allow debug index on dcn31
hamradio: defer 6pack kfree after unregister_netdev
hamradio: remove needs_free_netdev to avoid UAF
cpuidle: PSCI: Move the `has_lpi` check to the beginning of the function
ACPI: processor idle: Check for architectural support for LPI
ACPI: processor idle: Allow playing dead in C3 state
ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40
btrfs: remove unused parameter nr_pages in add_ra_bio_pages()
btrfs: remove no longer used counter when reading data page
btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
soc: qcom: aoss: Expose send for generic usecase
dt-bindings: net: qcom,ipa: add optional qcom,qmp property
net: ipa: request IPA register values be retained
btrfs: release correct delalloc amount in direct IO write path
ALSA: core: Add snd_card_free_on_error() helper
ALSA: sis7019: Fix the missing error handling
ALSA: ali5451: Fix the missing snd_card_free() call at probe error
ALSA: als300: Fix the missing snd_card_free() call at probe error
ALSA: als4000: Fix the missing snd_card_free() call at probe error
ALSA: atiixp: Fix the missing snd_card_free() call at probe error
ALSA: au88x0: Fix the missing snd_card_free() call at probe error
ALSA: aw2: Fix the missing snd_card_free() call at probe error
ALSA: azt3328: Fix the missing snd_card_free() call at probe error
ALSA: bt87x: Fix the missing snd_card_free() call at probe error
ALSA: ca0106: Fix the missing snd_card_free() call at probe error
ALSA: cmipci: Fix the missing snd_card_free() call at probe error
ALSA: cs4281: Fix the missing snd_card_free() call at probe error
ALSA: cs5535audio: Fix the missing snd_card_free() call at probe error
ALSA: echoaudio: Fix the missing snd_card_free() call at probe error
ALSA: emu10k1x: Fix the missing snd_card_free() call at probe error
ALSA: ens137x: Fix the missing snd_card_free() call at probe error
ALSA: es1938: Fix the missing snd_card_free() call at probe error
ALSA: es1968: Fix the missing snd_card_free() call at probe error
ALSA: fm801: Fix the missing snd_card_free() call at probe error
ALSA: galaxy: Fix the missing snd_card_free() call at probe error
ALSA: hdsp: Fix the missing snd_card_free() call at probe error
ALSA: hdspm: Fix the missing snd_card_free() call at probe error
ALSA: ice1724: Fix the missing snd_card_free() call at probe error
ALSA: intel8x0: Fix the missing snd_card_free() call at probe error
ALSA: intel_hdmi: Fix the missing snd_card_free() call at probe error
ALSA: korg1212: Fix the missing snd_card_free() call at probe error
ALSA: lola: Fix the missing snd_card_free() call at probe error
ALSA: lx6464es: Fix the missing snd_card_free() call at probe error
ALSA: maestro3: Fix the missing snd_card_free() call at probe error
ALSA: oxygen: Fix the missing snd_card_free() call at probe error
ALSA: riptide: Fix the missing snd_card_free() call at probe error
ALSA: rme32: Fix the missing snd_card_free() call at probe error
ALSA: rme9652: Fix the missing snd_card_free() call at probe error
ALSA: rme96: Fix the missing snd_card_free() call at probe error
ALSA: sc6000: Fix the missing snd_card_free() call at probe error
ALSA: sonicvibes: Fix the missing snd_card_free() call at probe error
ALSA: via82xx: Fix the missing snd_card_free() call at probe error
ALSA: usb-audio: Cap upper limits of buffer/period bytes for implicit fb
ALSA: nm256: Don't call card private_free at probe error path
drm/msm: Add missing put_task_struct() in debugfs path
firmware: arm_scmi: Remove clear channel call on the TX channel
memory: atmel-ebi: Fix missing of_node_put in atmel_ebi_probe
Revert "ath11k: mesh: add support for 256 bitmap in blockack frames in 11ax"
firmware: arm_scmi: Fix sorting of retrieved clock rates
media: rockchip/rga: do proper error checking in probe
SUNRPC: Fix the svc_deferred_event trace class
net/sched: flower: fix parsing of ethertype following VLAN header
veth: Ensure eth header is in skb's linear part
gpiolib: acpi: use correct format characters
cifs: release cached dentries only if mount is complete
net: mdio: don't defer probe forever if PHY IRQ provider is missing
mlxsw: i2c: Fix initialization error flow
net/sched: fix initialization order when updating chain 0 head
net: dsa: felix: suppress -EPROBE_DEFER errors
net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link
net/sched: taprio: Check if socket flags are valid
cfg80211: hold bss_lock while updating nontrans_list
netfilter: nft_socket: make cgroup match work in input too
drm/msm: Fix range size vs end confusion
drm/msm/dsi: Use connector directly in msm_dsi_manager_connector_init()
drm/msm/dp: add fail safe mode outside of event_mutex context
net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()
scsi: pm80xx: Mask and unmask upper interrupt vectors 32-63
scsi: pm80xx: Enable upper inbound, outbound queues
scsi: iscsi: Move iscsi_ep_disconnect()
scsi: iscsi: Fix offload conn cleanup when iscsid restarts
scsi: iscsi: Fix endpoint reuse regression
scsi: iscsi: Fix conn cleanup and stop race during iscsid restart
scsi: iscsi: Fix unbound endpoint error handling
sctp: Initialize daddr on peeled off socket
netfilter: nf_tables: nft_parse_register can return a negative value
ALSA: ad1889: Fix the missing snd_card_free() call at probe error
ALSA: mtpav: Don't call card private_free at probe error path
io_uring: move io_uring_rsrc_update2 validation
io_uring: verify that resv2 is 0 in io_uring_rsrc_update2
io_uring: verify pad field is 0 in io_get_ext_arg
testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set
ALSA: usb-audio: Increase max buffer size
ALSA: usb-audio: Limit max buffer and period sizes per time
perf tools: Fix misleading add event PMU debug message
macvlan: Fix leaking skb in source mode with nodst option
net: ftgmac100: access hardware register after clock ready
nfc: nci: add flush_workqueue to prevent uaf
cifs: potential buffer overflow in handling symlinks
dm mpath: only use ktime_get_ns() in historical selector
vfio/pci: Fix vf_token mechanism when device-specific VF drivers are used
net: bcmgenet: Revert "Use stronger register read/writes to assure ordering"
block: fix offset/size check in bio_trim()
drm/amd: Add USBC connector ID
btrfs: fix fallocate to use file_modified to update permissions consistently
btrfs: do not warn for free space inode in cow_file_range
drm/amdgpu: conduct a proper cleanup of PDB bo
drm/amdgpu/gmc: use PCI BARs for APUs in passthrough
drm/amd/display: fix audio format not updated after edid updated
drm/amd/display: FEC check in timing validation
drm/amd/display: Update VTEM Infopacket definition
drm/amdkfd: Fix Incorrect VMIDs passed to HWS
drm/amdgpu/vcn: improve vcn dpg stop procedure
drm/amdkfd: Check for potential null return of kmalloc_array()
Drivers: hv: vmbus: Deactivate sysctl_record_panic_msg by default in isolated guests
PCI: hv: Propagate coherence from VMbus device to PCI device
Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer
scsi: target: tcmu: Fix possible page UAF
scsi: lpfc: Fix queue failures when recovering from PCI parity error
scsi: ibmvscsis: Increase INITIAL_SRP_LIMIT to 1024
net: micrel: fix KS8851_MLL Kconfig
ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs
gpu: ipu-v3: Fix dev_dbg frequency output
regulator: wm8994: Add an off-on delay for WM8994 variant
arm64: alternatives: mark patch_alternative() as `noinstr`
tlb: hugetlb: Add more sizes to tlb_remove_huge_tlb_entry
net: axienet: setup mdio unconditionally
Drivers: hv: balloon: Disable balloon and hot-add accordingly
net: usb: aqc111: Fix out-of-bounds accesses in RX fixup
myri10ge: fix an incorrect free for skb in myri10ge_sw_tso
spi: cadence-quadspi: fix protocol setup for non-1-1-X operations
drm/amd/display: Enable power gating before init_pipes
drm/amd/display: Revert FEC check in validation
drm/amd/display: Fix allocate_mst_payload assert on resume
drbd: set QUEUE_FLAG_STABLE_WRITES
scsi: mpt3sas: Fail reset operation if config request timed out
scsi: mvsas: Add PCI ID of RocketRaid 2640
scsi: megaraid_sas: Target with invalid LUN ID is deleted during scan
drivers: net: slip: fix NPD bug in sl_tx_timeout()
io_uring: zero tag on rsrc removal
io_uring: use nospec annotation for more indexes
perf/imx_ddr: Fix undefined behavior due to shift overflowing the constant
mm/secretmem: fix panic when growing a memfd_secret
mm, page_alloc: fix build_zonerefs_node()
mm: fix unexpected zeroed page mapping with zram swap
mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded
SUNRPC: Fix NFSD's request deferral on RDMA transports
memory: renesas-rpc-if: fix platform-device leak in error path
gcc-plugins: latent_entropy: use /dev/urandom
cifs: verify that tcon is valid before dereference in cifs_kill_sb
ath9k: Properly clear TX status area before reporting to mac80211
ath9k: Fix usage of driver-private space in tx_info
btrfs: fix root ref counts in error handling in btrfs_get_root_ref
btrfs: mark resumed async balance as writing
ALSA: hda/realtek: Add quirk for Clevo PD50PNT
ALSA: hda/realtek: add quirk for Lenovo Thinkpad X12 speakers
ALSA: pcm: Test for "silence" field in struct "pcm_format_data"
nl80211: correctly check NL80211_ATTR_REG_ALPHA2 size
ipv6: fix panic when forwarding a pkt with no in6 dev
drm/amd/display: don't ignore alpha property on pre-multiplied mode
drm/amdgpu: Enable gfxoff quirk on MacBook Pro
x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
x86/tsx: Disable TSX development mode at boot
genirq/affinity: Consider that CPUs on nodes can be unbalanced
tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
ARM: davinci: da850-evm: Avoid NULL pointer dereference
dm integrity: fix memory corruption when tag_size is less than digest size
i2c: dev: check return value when calling dev_set_name()
smp: Fix offline cpu check in flush_smp_call_function_queue()
i2c: pasemi: Wait for write xfers to finish
dt-bindings: net: snps: remove duplicate name
timers: Fix warning condition in __run_timers()
dma-direct: avoid redundant memory sync for swiotlb
drm/i915: Sunset igpu legacy mmap support based on GRAPHICS_VER_FULL
cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state
soc: qcom: aoss: Fix missing put_device call in qmp_get
net: ipa: fix a build dependency
cpufreq: intel_pstate: ITMT support for overclocked system
ax25: add refcount in ax25_dev to avoid UAF bugs
ax25: fix reference count leaks of ax25_dev
ax25: fix UAF bugs of net_device caused by rebinding operation
ax25: Fix refcount leaks caused by ax25_cb_del()
ax25: fix UAF bug in ax25_send_control()
ax25: fix NPD bug in ax25_disconnect
ax25: Fix NULL pointer dereferences in ax25 timers
ax25: Fix UAF bugs in ax25 timers
Linux 5.15.35
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0dd9eaea7f977df42b0a5b9cb9043c879f62718b
|
||
|
|
33f5d1daec |
Merge 5.15.34 into android13-5.15
Changes in 5.15.34
lib/logic_iomem: correct fallback config references
um: fix and optimize xor select template for CONFIG64 and timetravel mode
rtc: wm8350: Handle error for wm8350_register_irq
nbd: add error handling support for add_disk()
nbd: Fix incorrect error handle when first_minor is illegal in nbd_dev_add
nbd: Fix hungtask when nbd_config_put
nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
kfence: count unexpectedly skipped allocations
kfence: move saving stack trace of allocations into __kfence_alloc()
kfence: limit currently covered allocations when pool nearly full
KVM: x86/pmu: Use different raw event masks for AMD and Intel
KVM: SVM: Fix kvm_cache_regs.h inclusions for is_guest_mode()
KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
drm: Add orientation quirk for GPD Win Max
ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111
drm/amd/display: Add signal type check when verify stream backends same
drm/amd/amdgpu/amdgpu_cs: fix refcount leak of a dma_fence obj
drm/amd/display: Fix memory leak
drm/amd/display: Use PSR version selected during set_psr_caps
usb: gadget: tegra-xudc: Do not program SPARAM
usb: gadget: tegra-xudc: Fix control endpoint's definitions
usb: cdnsp: fix cdnsp_decode_trb function to properly handle ret value
ptp: replace snprintf with sysfs_emit
drm/amdkfd: Don't take process mutex for svm ioctls
powerpc: dts: t104xrdb: fix phy type for FMAN 4/5
ath11k: fix kernel panic during unload/load ath11k modules
ath11k: pci: fix crash on suspend if board file is not found
ath11k: mhi: use mhi_sync_power_up()
net/smc: Send directly when TCP_CORK is cleared
drm/bridge: Add missing pm_runtime_put_sync
bpf: Make dst_port field in struct bpf_sock 16-bit wide
scsi: mvsas: Replace snprintf() with sysfs_emit()
scsi: bfa: Replace snprintf() with sysfs_emit()
drm/v3d: fix missing unlock
power: supply: axp20x_battery: properly report current when discharging
mt76: mt7921: fix crash when startup fails.
mt76: dma: initialize skip_unmap in mt76_dma_rx_fill
cfg80211: don't add non transmitted BSS to 6GHz scanned channels
libbpf: Fix build issue with llvm-readelf
ipv6: make mc_forwarding atomic
net: initialize init_net earlier
powerpc: Set crashkernel offset to mid of RMA region
drm/amdgpu: Fix recursive locking warning
scsi: smartpqi: Fix kdump issue when controller is locked up
PCI: aardvark: Fix support for MSI interrupts
iommu/arm-smmu-v3: fix event handling soft lockup
usb: ehci: add pci device support for Aspeed platforms
PCI: endpoint: Fix alignment fault error in copy tests
tcp: Don't acquire inet_listen_hashbucket::lock with disabled BH.
PCI: pciehp: Add Qualcomm quirk for Command Completed erratum
scsi: mpi3mr: Fix reporting of actual data transfer size
scsi: mpi3mr: Fix memory leaks
powerpc/set_memory: Avoid spinlock recursion in change_page_attr()
power: supply: axp288-charger: Set Vhold to 4.4V
net/mlx5e: Disable TX queues before registering the netdev
usb: dwc3: pci: Set the swnode from inside dwc3_pci_quirks()
iwlwifi: mvm: Correctly set fragmented EBS
iwlwifi: mvm: move only to an enabled channel
drm/msm/dsi: Remove spurious IRQF_ONESHOT flag
ipv4: Invalidate neighbour for broadcast address upon address addition
dm ioctl: prevent potential spectre v1 gadget
dm: requeue IO if mapping table not yet available
drm/amdkfd: make CRAT table missing message informational only
vfio/pci: Stub vfio_pci_vga_rw when !CONFIG_VFIO_PCI_VGA
scsi: pm8001: Fix pm80xx_pci_mem_copy() interface
scsi: pm8001: Fix pm8001_mpi_task_abort_resp()
scsi: pm8001: Fix task leak in pm8001_send_abort_all()
scsi: pm8001: Fix tag leaks on error
scsi: pm8001: Fix memory leak in pm8001_chip_fw_flash_update_req()
mt76: mt7915: fix injected MPDU transmission to not use HW A-MSDU
powerpc/64s/hash: Make hash faults work in NMI context
mt76: mt7615: Fix assigning negative values to unsigned variable
scsi: aha152x: Fix aha152x_setup() __setup handler return value
scsi: hisi_sas: Free irq vectors in order for v3 HW
scsi: hisi_sas: Limit users changing debugfs BIST count value
net/smc: correct settings of RMB window update limit
mips: ralink: fix a refcount leak in ill_acc_of_setup()
macvtap: advertise link netns via netlink
tuntap: add sanity checks about msg_controllen in sendmsg
Bluetooth: Fix not checking for valid hdev on bt_dev_{info,warn,err,dbg}
Bluetooth: use memset avoid memory leaks
bnxt_en: Eliminate unintended link toggle during FW reset
PCI: endpoint: Fix misused goto label
MIPS: fix fortify panic when copying asm exception handlers
powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E
powerpc/secvar: fix refcount leak in format_show()
scsi: libfc: Fix use after free in fc_exch_abts_resp()
can: isotp: set default value for N_As to 50 micro seconds
can: etas_es58x: es58x_fd_rx_event_msg(): initialize rx_event_msg before calling es58x_check_msg_len()
riscv: Fixed misaligned memory access. Fixed pointer comparison.
net: account alternate interface name memory
net: limit altnames to 64k total
net/mlx5e: Remove overzealous validations in netlink EEPROM query
net: sfp: add 2500base-X quirk for Lantech SFP module
usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm
mt76: fix monitor mode crash with sdio driver
xtensa: fix DTC warning unit_address_format
MIPS: ingenic: correct unit node address
Bluetooth: Fix use after free in hci_send_acl
netfilter: conntrack: revisit gc autotuning
netlabel: fix out-of-bounds memory accesses
ceph: fix inode reference leakage in ceph_get_snapdir()
ceph: fix memory leak in ceph_readdir when note_last_dentry returns error
lib/Kconfig.debug: add ARCH dependency for FUNCTION_ALIGN option
init/main.c: return 1 from handled __setup() functions
minix: fix bug when opening a file with O_DIRECT
clk: si5341: fix reported clk_rate when output divider is 2
staging: vchiq_arm: Avoid NULL ptr deref in vchiq_dump_platform_instances
staging: vchiq_core: handle NULL result of find_service_by_handle
phy: amlogic: phy-meson-gxl-usb2: fix shared reset controller use
phy: amlogic: meson8b-usb2: Use dev_err_probe()
phy: amlogic: meson8b-usb2: fix shared reset control use
clk: rockchip: drop CLK_SET_RATE_PARENT from dclk_vop* on rk3568
cpufreq: CPPC: Fix performance/frequency conversion
opp: Expose of-node's name in debugfs
staging: wfx: fix an error handling in wfx_init_common()
w1: w1_therm: fixes w1_seq for ds28ea00 sensors
NFSv4.2: fix reference count leaks in _nfs42_proc_copy_notify()
NFSv4: Protect the state recovery thread against direct reclaim
habanalabs: fix possible memory leak in MMU DR fini
xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
clk: ti: Preserve node in ti_dt_clocks_register()
clk: Enforce that disjoints limits are invalid
SUNRPC/call_alloc: async tasks mustn't block waiting for memory
SUNRPC/xprt: async tasks mustn't block waiting for memory
SUNRPC: remove scheduling boost for "SWAPPER" tasks.
NFS: swap IO handling is slightly different for O_DIRECT IO
NFS: swap-out must always use STABLE writes.
x86: Annotate call_on_stack()
x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
serial: samsung_tty: do not unlock port->lock for uart_write_wakeup()
virtio_console: eliminate anonymous module_init & module_exit
jfs: prevent NULL deref in diFree
SUNRPC: Fix socket waits for write buffer space
NFS: nfsiod should not block forever in mempool_alloc()
NFS: Avoid writeback threads getting stuck in mempool_alloc()
selftests: net: Add tls config dependency for tls selftests
parisc: Fix CPU affinity for Lasi, WAX and Dino chips
parisc: Fix patch code locking and flushing
mm: fix race between MADV_FREE reclaim and blkdev direct IO read
rtc: mc146818-lib: change return values of mc146818_get_time()
rtc: Check return value from mc146818_get_time()
rtc: mc146818-lib: fix RTC presence check
drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire()
Drivers: hv: vmbus: Fix potential crash on module unload
Revert "NFSv4: Handle the special Linux file open access mode"
NFSv4: fix open failure with O_ACCMODE flag
scsi: sr: Fix typo in CDROM(CLOSETRAY|EJECT) handling
scsi: core: Fix sbitmap depth in scsi_realloc_sdev_budget_map()
scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one()
vdpa/mlx5: Rename control VQ workqueue to vdpa wq
vdpa/mlx5: Propagate link status from device to vdpa driver
vdpa: mlx5: prevent cvq work from hogging CPU
net: sfc: add missing xdp queue reinitialization
net/tls: fix slab-out-of-bounds bug in decrypt_internal
vrf: fix packet sniffing for traffic originating from ip tunnels
skbuff: fix coalescing for page_pool fragment recycling
ice: Clear default forwarding VSI during VSI release
mctp: Fix check for dev_hard_header() result
net: ipv4: fix route with nexthop object delete warning
net: stmmac: Fix unset max_speed difference between DT and non-DT platforms
drm/imx: imx-ldb: Check for null pointer after calling kmemdup
drm/imx: Fix memory leak in imx_pd_connector_get_modes
drm/imx: dw_hdmi-imx: Fix bailout in error cases of probe
regulator: rtq2134: Fix missing active_discharge_on setting
regulator: atc260x: Fix missing active_discharge_on setting
arch/arm64: Fix topology initialization for core scheduling
bnxt_en: Synchronize tx when xdp redirects happen on same ring
bnxt_en: reserve space inside receive page for skb_shared_info
bnxt_en: Prevent XDP redirect from running when stopping TX queue
sfc: Do not free an empty page_ring
RDMA/mlx5: Don't remove cache MRs when a delay is needed
RDMA/mlx5: Add a missing update of cache->last_add
IB/cm: Cancel mad on the DREQ event when the state is MRA_REP_RCVD
IB/rdmavt: add lock to call to rvt_error_qp to prevent a race condition
sctp: count singleton chunks in assoc user stats
dpaa2-ptp: Fix refcount leak in dpaa2_ptp_probe
ice: Set txq_teid to ICE_INVAL_TEID on ring creation
ice: Do not skip not enabled queues in ice_vc_dis_qs_msg
ipv6: Fix stats accounting in ip6_pkt_drop
ice: synchronize_rcu() when terminating rings
ice: xsk: fix VSI state check in ice_xsk_wakeup()
net: openvswitch: don't send internal clone attribute to the userspace.
net: ethernet: mv643xx: Fix over zealous checking of_get_mac_address()
net: openvswitch: fix leak of nested actions
rxrpc: fix a race in rxrpc_exit_net()
net: sfc: fix using uninitialized xdp tx_queue
net: phy: mscc-miim: reject clause 45 register accesses
qede: confirm skb is allocated before using
spi: bcm-qspi: fix MSPI only access with bcm_qspi_exec_mem_op()
bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
drbd: Fix five use after free bugs in get_initial_state
scsi: ufs: ufshpb: Fix a NULL check on list iterator
io_uring: nospec index for tags on files update
io_uring: don't touch scm_fp_list after queueing skb
SUNRPC: Handle ENOMEM in call_transmit_status()
SUNRPC: Handle low memory situations in call_status()
SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec()
iommu/omap: Fix regression in probe for NULL pointer dereference
perf: arm-spe: Fix perf report --mem-mode
perf tools: Fix perf's libperf_print callback
perf session: Remap buf if there is no space for event
arm64: Add part number for Arm Cortex-A78AE
scsi: mpt3sas: Fix use after free in _scsih_expander_node_remove()
scsi: ufs: ufs-pci: Add support for Intel MTL
Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning"
mmc: block: Check for errors after write on SPI
mmc: mmci: stm32: correctly check all elements of sg list
mmc: renesas_sdhi: don't overwrite TAP settings when HS400 tuning is complete
mmc: core: Fixup support for writeback-cache for eMMC and SD
lz4: fix LZ4_decompress_safe_partial read out of bound
highmem: fix checks in __kmap_local_sched_{in,out}
mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0)
mm/mempolicy: fix mpol_new leak in shared_policy_replace
io_uring: don't check req->file in io_fsync_prep()
io_uring: defer splice/tee file validity check until command issue
io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF
io_uring: fix race between timeout flush and removal
x86/pm: Save the MSR validity status at context setup
x86/speculation: Restore speculation related MSRs during S3 resume
perf/x86/intel: Update the FRONTEND MSR mask on Sapphire Rapids
btrfs: fix qgroup reserve overflow the qgroup limit
btrfs: prevent subvol with swapfile from being deleted
spi: core: add dma_map_dev for __spi_unmap_msg()
arm64: patch_text: Fixup last cpu should be master
RDMA/hfi1: Fix use-after-free bug for mm struct
gpio: Restrict usage of GPIO chip irq members before initialization
x86/msi: Fix msi message data shadow struct
x86/mm/tlb: Revert retpoline avoidance approach
perf/x86/intel: Don't extend the pseudo-encoding to GP counters
ata: sata_dwc_460ex: Fix crash due to OOB write
perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator
perf/core: Inherit event_caps
irqchip/gic-v3: Fix GICR_CTLR.RWP polling
fbdev: Fix unregistering of framebuffers without device
amd/display: set backlight only if required
SUNRPC: Prevent immediate close+reconnect
drm/panel: ili9341: fix optional regulator handling
drm/amdgpu/display: change pipe policy for DCN 2.1
drm/amdgpu/smu10: fix SoC/fclk units in auto mode
drm/amdgpu/vcn: Fix the register setting for vcn1
drm/nouveau/pmu: Add missing callbacks for Tegra devices
drm/amdkfd: Create file descriptor after client is added to smi_clients list
drm/amdgpu: don't use BACO for reset in S3
KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255
net/smc: send directly on setting TCP_NODELAY
Revert "selftests: net: Add tls config dependency for tls selftests"
bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
selftests/bpf: Fix u8 narrow load checks for bpf_sk_lookup remote_port
rtc: mc146818-lib: fix signedness bug in mc146818_get_time()
SUNRPC: Don't call connect() more than once on a TCP socket
Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()"
perf build: Don't use -ffat-lto-objects in the python feature test when building with clang-13
perf python: Fix probing for some clang command line options
tools build: Filter out options and warnings not supported by clang
tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts
dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error"
KVM: avoid NULL pointer dereference in kvm_dirty_ring_push
Revert "net/mlx5: Accept devlink user input after driver initialization complete"
ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
selftests: cgroup: Make cg_create() use 0755 for permission instead of 0644
selftests: cgroup: Test open-time credential usage for migration checks
selftests: cgroup: Test open-time cgroup namespace usage for migration checks
mm: don't skip swap entry even if zap_details specified
Drivers: hv: vmbus: Replace smp_store_mb() with virt_store_mb()
x86/bug: Prevent shadowing in __WARN_FLAGS
sched: Teach the forced-newidle balancer about CPU affinity limitation.
x86,static_call: Fix __static_call_return0 for i386
irqchip/gic-v4: Wait for GICR_VPENDBASER.Dirty to clear before descheduling
powerpc/64: Fix build failure with allyesconfig in book3s_64_entry.S
irqchip/gic, gic-v3: Prevent GSI to SGI translations
mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning
static_call: Don't make __static_call_return0 static
powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit
stacktrace: move filter_irq_stacks() to kernel/stacktrace.c
Linux 5.15.34
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I98049d0d8ebd427296418d31085bfde482ad30e7
|
||
|
|
3bb5834cb6 |
Merge remote-tracking branch into HEAD
* keystone/mirror-android13-5.15: (946 commits) ANDROID: Suppress build.sh deprecation warnings. ANDROID: GKI: 4/20/2022 KMI update ANDROID: update is_cpu_allowed hook prototype UPSTREAM: scsi: ufs: mediatek: Support vops pre suspend to disable auto-hibern8 ANDROID: USB: Add vendor specified variables to phy.h ANDROID: GKI: build multi-gen LRU FROMLIST: mm: multi-gen LRU: design doc FROMLIST: mm: multi-gen LRU: admin guide FROMLIST: mm: multi-gen LRU: debugfs interface FROMLIST: mm: multi-gen LRU: thrashing prevention FROMLIST: mm: multi-gen LRU: kill switch FROMLIST: mm: multi-gen LRU: optimize multiple memcgs FROMLIST: mm: multi-gen LRU: support page table walks FROMLIST: mm: multi-gen LRU: exploit locality in rmap FROMLIST: mm: multi-gen LRU: minimal implementation FROMLIST: mm: multi-gen LRU: groundwork FROMLIST: Revert "include/linux/mm_inline.h: fold __update_lru_size() into its sole caller" FROMLIST: mm/vmscan.c: refactor shrink_node() FROMLIST: mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG FROMLIST: mm: x86, arm64: add arch_has_hw_pte_young() ... Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I1a76e8a01e786fad9f730637de92a88f2f5afb95 |
||
|
|
457758d86c |
ANDROID: update is_cpu_allowed hook prototype
Vendor module needs access to the task structure through is_cpu_allowed tracehook to better assess whether the chosen cpu can be allowed. Update the existing is_cpu_allowed tracehook to pass the task struct of the task currently being handled. Bug: 228392842 Change-Id: I882b593ccb77da0755e076c7e636db304ee74b42 Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> |
||
|
|
76f7f07cbf |
FROMLIST: mm: multi-gen LRU: kill switch
Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that
can be disabled include:
0x0001: the multi-gen LRU core
0x0002: walking page table, when arch_has_hw_pte_young() returns
true
0x0004: clearing the accessed bit in non-leaf PMD entries, when
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y
[yYnN]: apply to all the components above
E.g.,
echo y >/sys/kernel/mm/lru_gen/enabled
cat /sys/kernel/mm/lru_gen/enabled
0x0007
echo 5 >/sys/kernel/mm/lru_gen/enabled
cat /sys/kernel/mm/lru_gen/enabled
0x0005
NB: the page table walks happen on the scale of seconds under heavy
memory pressure, in which case the mmap_lock contention is a lesser
concern, compared with the LRU lock contention and the I/O congestion.
So far the only well-known case of the mmap_lock contention happens on
Android, due to Scudo [1] which allocates several thousand VMAs for
merely a few hundred MBs. The SPF and the Maple Tree also have
provided their own assessments [2][3]. However, if walking page tables
does worsen the mmap_lock contention, the kill switch can be used to
disable it. In this case the multi-gen LRU will suffer a minor
performance degradation, as shown previously.
Clearing the accessed bit in non-leaf PMD entries can also be
disabled, since this behavior was not tested on x86 varieties other
than Intel and AMD.
[1] https://source.android.com/devices/tech/debug/scudo
[2] https://lore.kernel.org/lkml/20220128131006.67712-1-michel@lespinasse.org/
[3] https://lore.kernel.org/lkml/20220202024137.2516438-1-Liam.Howlett@oracle.com/
Link: https://lore.kernel.org/lkml/20220309021230.721028-11-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I71801d9470a2588cad8bfd14fbcfafc7b010aa03
|
||
|
|
5280d76d38 |
FROMLIST: mm: multi-gen LRU: support page table walks
To further exploit spatial locality, the aging prefers to walk page
tables to search for young PTEs and promote hot pages. A kill switch
will be added in the next patch to disable this behavior. When
disabled, the aging relies on the rmap only.
NB: this behavior has nothing similar with the page table scanning in
the 2.4 kernel [1], which searches page tables for old PTEs, adds cold
pages to swapcache and unmaps them.
To avoid confusion, the term "iteration" specifically means the
traversal of an entire mm_struct list; the term "walk" will be applied
to page tables and the rmap, as usual.
An mm_struct list is maintained for each memcg, and an mm_struct
follows its owner task to the new memcg when this task is migrated.
Given an lruvec, the aging iterates lruvec_memcg()->mm_list and calls
walk_page_range() with each mm_struct on this list to promote hot
pages before it increments max_seq.
When multiple page table walkers iterate the same list, each of them
gets a unique mm_struct; therefore they can run concurrently. Page
table walkers ignore any misplaced pages, e.g., if an mm_struct was
migrated, pages it left in the previous memcg will not be promoted
when its current memcg is under reclaim. Similarly, page table walkers
will not promote pages from nodes other than the one under reclaim.
This patch uses the following optimizations when walking page tables:
1. It tracks the usage of mm_struct's between context switches so that
page table walkers can skip processes that have been sleeping since
the last iteration.
2. It uses generational Bloom filters to record populated branches so
that page table walkers can reduce their search space based on the
query results, e.g., to skip page tables containing mostly holes or
misplaced pages.
3. It takes advantage of the accessed bit in non-leaf PMD entries when
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y.
4. It does not zigzag between a PGD table and the same PMD table
spanning multiple VMAs. IOW, it finishes all the VMAs within the
range of the same PMD table before it returns to a PGD table. This
improves the cache performance for workloads that have large
numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5.
Server benchmark results:
Single workload:
fio (buffered I/O): no change
Single workload:
memcached (anon): +[5.5, 7.5]%
Ops/sec KB/sec
patch1-7: 1014393.57 39455.42
patch1-8: 1078507.59 41949.15
Configurations:
no change
Client benchmark results:
kswapd profiles:
patch1-7
45.54% lzo1x_1_do_compress (real work)
9.56% page_vma_mapped_walk
6.70% _raw_spin_unlock_irq
2.78% ptep_clear_flush
2.47% do_raw_spin_lock
2.22% __zram_bvec_write
1.87% lru_gen_look_around
1.78% memmove
1.77% obj_malloc
1.44% free_unref_page_list
patch1-8
47.02% lzo1x_1_do_compress (real work)
6.73% page_vma_mapped_walk
6.14% _raw_spin_unlock_irq
3.39% walk_pte_range
2.63% ptep_clear_flush
2.29% __zram_bvec_write
2.10% do_raw_spin_lock
1.81% memmove
1.73% obj_malloc
1.53% free_unref_page_list
Configurations:
no change
[1] https://lwn.net/Articles/23732/
[2] https://source.android.com/devices/tech/debug/scudo
Link: https://lore.kernel.org/lkml/20220309021230.721028-9-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I5a3c97cf8ebf8d65d5f9528cd979a637c190053e
|
||
|
|
a1537a68c5 |
FROMLIST: mm: multi-gen LRU: minimal implementation
To avoid confusion, the terms "promotion" and "demotion" will be
applied to the multi-gen LRU, as a new convention; the terms
"activation" and "deactivation" will be applied to the active/inactive
LRU, as usual.
The aging produces young generations. Given an lruvec, it increments
max_seq when max_seq-min_seq+1 approaches MIN_NR_GENS. The aging
promotes hot pages to the youngest generation when it finds them
accessed through page tables; the demotion of cold pages happens
consequently when it increments max_seq. The aging has the complexity
O(nr_hot_pages), since it is only interested in hot pages. Promotion
in the aging path does not require any LRU list operations, only the
updates of the gen counter and lrugen->nr_pages[]; demotion, unless as
the result of the increment of max_seq, requires LRU list operations,
e.g., lru_deactivate_fn().
The eviction consumes old generations. Given an lruvec, it increments
min_seq when the lists indexed by min_seq%MAX_NR_GENS become empty. A
feedback loop modeled after the PID controller monitors refaults over
anon and file types and decides which type to evict when both types
are available from the same generation.
Each generation is divided into multiple tiers. Tiers represent
different ranges of numbers of accesses through file descriptors. A
page accessed N times through file descriptors is in tier
order_base_2(N). Tiers do not have dedicated lrugen->lists[], only
bits in page->flags. In contrast to moving across generations, which
requires the LRU lock, moving across tiers only involves operations on
page->flags. The feedback loop also monitors refaults over all tiers
and decides when to protect pages in which tiers (N>1), using the
first tier (N=0,1) as a baseline. The first tier contains single-use
unmapped clean pages, which are most likely the best choices. The
eviction moves a page to the next generation, i.e., min_seq+1, if the
feedback loop decides so. This approach has the following advantages:
1. It removes the cost of activation in the buffered access path by
inferring whether pages accessed multiple times through file
descriptors are statistically hot and thus worth protecting in the
eviction path.
2. It takes pages accessed through page tables into account and avoids
overprotecting pages accessed multiple times through file
descriptors. (Pages accessed through page tables are in the first
tier, since N=0.)
3. More tiers provide better protection for pages accessed more than
twice through file descriptors, when under heavy buffered I/O
workloads.
Server benchmark results:
Single workload:
fio (buffered I/O): +[38, 40]%
IOPS BW
5.18-ed4643521e6a: 2547k 9989MiB/s
patch1-6: 3540k 13.5GiB/s
Single workload:
memcached (anon): +[103, 107]%
Ops/sec KB/sec
5.18-ed4643521e6a: 469048.66 18243.91
patch1-6: 964656.80 37520.88
Configurations:
CPU: two Xeon 6154
Mem: total 256G
Node 1 was only used as a ram disk to reduce the variance in the
results.
patch drivers/block/brd.c <<EOF
99,100c99,100
< gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM;
< page = alloc_page(gfp_flags);
---
> gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM | __GFP_THISNODE;
> page = alloc_pages_node(1, gfp_flags, 0);
EOF
cat >>/etc/systemd/system.conf <<EOF
CPUAffinity=numa
NUMAPolicy=bind
NUMAMask=0
EOF
cat >>/etc/memcached.conf <<EOF
-m 184320
-s /var/run/memcached/memcached.sock
-a 0766
-t 36
-B binary
EOF
cat fio.sh
modprobe brd rd_nr=1 rd_size=113246208
swapoff -a
mkfs.ext4 /dev/ram0
mount -t ext4 /dev/ram0 /mnt
mkdir /sys/fs/cgroup/user.slice/test
echo 38654705664 >/sys/fs/cgroup/user.slice/test/memory.max
echo $$ >/sys/fs/cgroup/user.slice/test/cgroup.procs
fio -name=mglru --numjobs=72 --directory=/mnt --size=1408m \
--buffered=1 --ioengine=io_uring --iodepth=128 \
--iodepth_batch_submit=32 --iodepth_batch_complete=32 \
--rw=randread --random_distribution=random --norandommap \
--time_based --ramp_time=10m --runtime=5m --group_reporting
cat memcached.sh
modprobe brd rd_nr=1 rd_size=113246208
swapoff -a
mkswap /dev/ram0
swapon /dev/ram0
memtier_benchmark -S /var/run/memcached/memcached.sock \
-P memcache_binary -n allkeys --key-minimum=1 \
--key-maximum=65000000 --key-pattern=P:P -c 1 -t 36 \
--ratio 1:0 --pipeline 8 -d 2000
memtier_benchmark -S /var/run/memcached/memcached.sock \
-P memcache_binary -n allkeys --key-minimum=1 \
--key-maximum=65000000 --key-pattern=R:R -c 1 -t 36 \
--ratio 0:1 --pipeline 8 --randomize --distinct-client-seed
Client benchmark results:
kswapd profiles:
5.18-ed4643521e6a
39.56% page_vma_mapped_walk
19.32% lzo1x_1_do_compress (real work)
7.18% do_raw_spin_lock
4.23% _raw_spin_unlock_irq
2.26% vma_interval_tree_subtree_search
2.12% vma_interval_tree_iter_next
2.11% folio_referenced_one
1.90% anon_vma_interval_tree_iter_first
1.47% ptep_clear_flush
0.97% __anon_vma_interval_tree_subtree_search
patch1-6
36.13% lzo1x_1_do_compress (real work)
19.16% page_vma_mapped_walk
6.55% _raw_spin_unlock_irq
4.02% do_raw_spin_lock
2.32% anon_vma_interval_tree_iter_first
2.11% ptep_clear_flush
1.76% __zram_bvec_write
1.64% folio_referenced_one
1.40% memmove
1.35% obj_malloc
Configurations:
CPU: single Snapdragon 7c
Mem: total 4G
Chrome OS MemoryPressure [1]
[1] https://chromium.googlesource.com/chromiumos/platform/tast-tests/
Link: https://lore.kernel.org/lkml/20220309021230.721028-7-yuzhao@google.com/
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Bug: 227651406
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Change-Id: I3fe4850006d7984cd9f4fd46134b826609dc2f86
|
||
|
|
f88ed5a3d3 |
FROMLIST: mm: multi-gen LRU: groundwork
Evictable pages are divided into multiple generations for each lruvec. The youngest generation number is stored in lrugen->max_seq for both anon and file types as they are aged on an equal footing. The oldest generation numbers are stored in lrugen->min_seq[] separately for anon and file types as clean file pages can be evicted regardless of swap constraints. These three variables are monotonically increasing. Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits in order to fit into the gen counter in page->flags. Each truncated generation number is an index to lrugen->lists[]. The sliding window technique is used to track at least MIN_NR_GENS and at most MAX_NR_GENS generations. The gen counter stores a value within [1, MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it stores 0. There are two conceptually independent procedures: "the aging", which produces young generations, and "the eviction", which consumes old generations. They form a closed-loop system, i.e., "the page reclaim". Both procedures can be invoked from userspace for the purposes of working set estimation and proactive reclaim. These features are required to optimize job scheduling (bin packing) in data centers. The variable size of the sliding window is designed for such use cases [1][2]. To avoid confusion, the terms "hot" and "cold" will be applied to the multi-gen LRU, as a new convention; the terms "active" and "inactive" will be applied to the active/inactive LRU, as usual. The protection of hot pages and the selection of cold pages are based on page access channels and patterns. There are two access channels: one through page tables and the other through file descriptors. The protection of the former channel is by design stronger because: 1. The uncertainty in determining the access patterns of the former channel is higher due to the approximation of the accessed bit. 2. The cost of evicting the former channel is higher due to the TLB flushes required and the likelihood of encountering the dirty bit. 3. The penalty of underprotecting the former channel is higher because applications usually do not prepare themselves for major page faults like they do for blocked I/O. E.g., GUI applications commonly use dedicated I/O threads to avoid blocking the rendering threads. There are also two access patterns: one with temporal locality and the other without. For the reasons listed above, the former channel is assumed to follow the former pattern unless VM_SEQ_READ or VM_RAND_READ is present; the latter channel is assumed to follow the latter pattern unless outlying refaults have been observed [3][4]. The next patch will address the "outlying refaults". Three macros, i.e., LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are added in this patch to make the entire patchset less diffy. A page is added to the youngest generation on faulting. The aging needs to check the accessed bit at least twice before handing this page over to the eviction. The first check takes care of the accessed bit set on the initial fault; the second check makes sure this page has not been used since then. This protocol, AKA second chance, requires a minimum of two generations, hence MIN_NR_GENS. [1] https://dl.acm.org/doi/10.1145/3297858.3304053 [2] https://dl.acm.org/doi/10.1145/3503222.3507731 [3] https://lwn.net/Articles/495543/ [4] https://lwn.net/Articles/815342/ Link: https://lore.kernel.org/lkml/20220309021230.721028-6-yuzhao@google.com/ Signed-off-by: Yu Zhao <yuzhao@google.com> Acked-by: Brian Geffon <bgeffon@google.com> Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name> Acked-by: Steven Barrett <steven@liquorix.net> Acked-by: Suleiman Souhlal <suleiman@google.com> Tested-by: Daniel Byrne <djbyrne@mtu.edu> Tested-by: Donald Carr <d@chaos-reins.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Tested-by: Shuang Zhai <szhai2@cs.rochester.edu> Tested-by: Sofia Trinh <sofia.trinh@edi.works> Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com> Bug: 227651406 Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Change-Id: I333ec6a1d2abfa60d93d6adc190ed3eefe441512 |
||
|
|
d712aea3cd |
cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state
commit b7ba6d8dc3569e49800ef0136799f26f43e237e8 upstream.
Currently the setting of the 'cpu' member of struct cpuhp_cpu_state in
cpuhp_create() is too late as it is used earlier in _cpu_up().
If kzalloc_node() in __smpboot_create_thread() fails then the rollback will
be done with st->cpu==0 causing CPU0 to be erroneously set to be dying,
causing the scheduler to get mightily confused and throw its toys out of
the pram.
However the cpu number is actually available directly, so simply remove
the 'cpu' member and avoid the problem in the first place.
Fixes:
|
||
|
|
4ef9951d02 |
dma-direct: avoid redundant memory sync for swiotlb
commit 9e02977bfad006af328add9434c8bffa40e053bb upstream.
When we looked into FIO performance with swiotlb enabled in VM, we found
swiotlb_bounce() is always called one more time than expected for each DMA
read request.
It turns out that the bounce buffer is copied to original DMA buffer twice
after the completion of a DMA request (one is done by in
dma_direct_sync_single_for_cpu(), the other by swiotlb_tbl_unmap_single()).
But the content in bounce buffer actually doesn't change between the two
rounds of copy. So, one round of copy is redundant.
Pass DMA_ATTR_SKIP_CPU_SYNC flag to swiotlb_tbl_unmap_single() to
skip the memory copy in it.
This fix increases FIO 64KB sequential read throughput in a guest with
swiotlb=force by 5.6%.
Fixes:
|
||
|
|
111becd63e |
timers: Fix warning condition in __run_timers()
commit c54bc0fc84214b203f7a0ebfd1bd308ce2abe920 upstream.
When the timer base is empty, base::next_expiry is set to base::clk +
NEXT_TIMER_MAX_DELTA and base::next_expiry_recalc is false. When no timer
is queued until jiffies reaches base::next_expiry value, the warning for
not finding any expired timer and base::next_expiry_recalc is false in
__run_timers() triggers.
To prevent triggering the warning in this valid scenario
base::timers_pending needs to be added to the warning condition.
Fixes:
|
||
|
|
44981e4cde |
smp: Fix offline cpu check in flush_smp_call_function_queue()
commit 9e949a3886356fe9112c6f6f34a6e23d1d35407f upstream.
The check in flush_smp_call_function_queue() for callbacks that are sent
to offline CPUs currently checks whether the queue is empty.
However, flush_smp_call_function_queue() has just deleted all the
callbacks from the queue and moved all the entries into a local list.
This checks would only be positive if some callbacks were added in the
short time after llist_del_all() was called. This does not seem to be
the intention of this check.
Change the check to look at the local list to which the entries were
moved instead of the queue from which all the callbacks were just
removed.
Fixes:
|
||
|
|
68a38b07f1 |
tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
commit 40e97e42961f8c6cc7bd5fe67cc18417e02d78f1 upstream.
While running some testing on code that happened to allow the variable
tick_nohz_full_running to get set but with no "possible" NOHZ cores to
back up that setting, this warning triggered:
if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
WARN_ON(tick_nohz_full_running);
The console was overwhemled with an endless stream of one WARN per tick
per core and there was no way to even see what was going on w/o using a
serial console to capture it and then trace it back to this.
Change it to WARN_ON_ONCE().
Fixes:
|