Evolution-X-Devices/kernel_google_redbull

Author	SHA1	Message	Date
Rick Yiu	7a1b8ea81e	sched/fair: use actual cpu capacity to calculate boosted util Currently, calculating boosted util for a cpu uses a fixed value of 1024. So when top-app tasks are moved to LC, which has much lower capacity than BC, the calculated freq will be high even when the cpu util is low. This results in higher power consumption, especially on architectures that have more little cores than big cores. Replacing the fixed value of 1024 with the actual cpu capacity reduces the freq calculated on LC. Bug: 152925197 Test: boosted util reduced on little cores Signed-off-by: Rick Yiu <rickyiu@google.com> Change-Id: I80cdd08a2c7fa5e674c43bfc132584d85c14622b	2020-06-16 13:20:18 +00:00
Rick Yiu	9ff77c2408	sched: separate capacity margin for boosted tasks With the introduction of placement hint patch, boosted tasks will not scheduled from big cores. We tune capacity margin to let important boosted tasks get scheduled on big cores. However, the capacity margin affects all group of tasks, so that non-boosted tasks get more chances to be scheduled on big cores, too. This could be solved by separating capacity margin for boosted tasks. Bug: 152925197 Test: margin set correctly Signed-off-by: Rick Yiu <rickyiu@google.com> Change-Id: I0e059c56efa9bc8513f0ef4b0f6ab8f5d04a592a	2020-06-16 13:20:00 +00:00
Wei Wang	ac1e356b2a	sched: separate boost signal from placement hint Test: build and boot Bug: 144451857 Bug: 147785606 Bug: 152925197 Change-Id: Ib2d86a72cad12971a99c7105813473211a7fbd76 Signed-off-by: Wei Wang <wvw@google.com>	2020-06-16 13:18:33 +00:00
lucaswei	20f57bd70a	Revert "vmscan: Support multiple kswapd threads per node" This reverts commit `7e78bc0ad2`. Reason for revert: revert vendor customization patch Bug: 157880566 Bug: 157858241 Change-Id: Id3c8f6c950ac01c3e85bea2b8ec0f9d6dce7af42 Signed-off-by: lucaswei <lucaswei@google.com>	2020-06-15 15:39:53 +08:00
lucaswei	56acc710a6	Merge LA.UM.9.12.R2.10.00.00.685.011 via branch 'qcom-msm-4.19-7250' into android-msm-pixel-4.19 Conflicts: Documentation/ABI/testing/sysfs-fs-f2fs Documentation/filesystems/f2fs.txt Documentation/filesystems/fscrypt.rst Documentation/sysctl/vm.txt Makefile arch/arm64/boot/Makefile arch/arm64/configs/vendor/kona_defconfig arch/arm64/configs/vendor/lito_defconfig arch/arm64/kernel/vdso.c arch/arm64/mm/init.c arch/arm64/mm/mmu.c arch/ia64/mm/init.c arch/powerpc/mm/mem.c arch/s390/mm/init.c arch/sh/mm/init.c arch/x86/mm/init_32.c arch/x86/mm/init_64.c block/bio.c block/blk-crypto-fallback.c block/blk-crypto-internal.h block/blk-crypto.c block/blk-merge.c block/keyslot-manager.c build.config.common drivers/base/core.c drivers/base/power/wakeup.c drivers/char/adsprpc.c drivers/char/diag/diagchar_core.c drivers/crypto/Makefile drivers/crypto/msm/qcedev.c drivers/crypto/msm/qcrypto.c drivers/dma-buf/dma-buf.c drivers/input/input.c drivers/input/keycombo.c drivers/input/misc/gpio_input.c drivers/input/misc/gpio_matrix.c drivers/input/touchscreen/st/fts.c drivers/md/Kconfig drivers/md/dm-default-key.c drivers/md/dm.c drivers/mmc/host/Makefile drivers/mmc/host/sdhci-msm-ice.h drivers/net/phy/phy_device.c drivers/of/property.c drivers/pci/controller/pci-msm.c drivers/platform/msm/gsi/Makefile drivers/platform/msm/ipa/ipa_rm_inactivity_timer.c drivers/platform/msm/ipa/ipa_v3/ipa.c drivers/platform/msm/ipa/ipa_v3/ipa_pm.c drivers/platform/msm/ipa/ipa_v3/rmnet_ipa.c drivers/platform/msm/sps/spsi.h drivers/power/supply/qcom/qpnp-smb5.c drivers/power/supply/qcom/smb5-lib.h drivers/power/supply/qcom/step-chg-jeita.c drivers/scsi/ufs/Kconfig drivers/scsi/ufs/Makefile drivers/scsi/ufs/ufs-qcom-ice.c drivers/scsi/ufs/ufs-qcom.c drivers/scsi/ufs/ufs-qcom.h drivers/scsi/ufs/ufshcd-crypto.c drivers/scsi/ufs/ufshcd-crypto.h drivers/scsi/ufs/ufshcd.c drivers/scsi/ufs/ufshcd.h drivers/soc/qcom/Makefile drivers/soc/qcom/msm_bus/msm_bus_dbg.c 
drivers/soc/qcom/msm_bus/msm_bus_dbg_rpmh.c drivers/soc/qcom/msm_minidump.c drivers/soc/qcom/peripheral-loader.c drivers/soc/qcom/smp2p.c drivers/soc/qcom/smp2p_sleepstate.c drivers/soc/qcom/subsystem_restart.c drivers/spi/spi-geni-qcom.c drivers/thermal/tsens.h drivers/tty/serial/Kconfig drivers/tty/serial/msm_geni_serial.c drivers/usb/typec/tcpm/fusb302.c drivers/usb/typec/tcpm/tcpm.c fs/crypto/bio.c fs/crypto/crypto.c fs/crypto/fname.c fs/crypto/fscrypt_private.h fs/crypto/inline_crypt.c fs/crypto/keyring.c fs/crypto/keysetup.c fs/crypto/keysetup_v1.c fs/crypto/policy.c fs/eventpoll.c fs/ext4/inode.c fs/ext4/ioctl.c fs/ext4/page-io.c fs/ext4/super.c fs/f2fs/Kconfig fs/f2fs/compress.c fs/f2fs/data.c fs/f2fs/dir.c fs/f2fs/f2fs.h fs/f2fs/file.c fs/f2fs/gc.c fs/f2fs/hash.c fs/f2fs/inline.c fs/f2fs/inode.c fs/f2fs/namei.c fs/f2fs/super.c fs/f2fs/sysfs.c fs/ubifs/ioctl.c include/linux/bio-crypt-ctx.h include/linux/bio.h include/linux/blk-crypto.h include/linux/blk_types.h include/linux/fscrypt.h include/linux/gfp.h include/linux/keyslot-manager.h include/linux/memory_hotplug.h include/linux/usb/tcpm.h include/linux/usb/typec.h include/soc/qcom/socinfo.h include/trace/events/f2fs.h include/uapi/linux/fscrypt.h include/uapi/linux/sched/types.h kernel/memremap.c kernel/sched/core.c kernel/sched/cpufreq_schedutil.c kernel/sched/fair.c kernel/sched/psi.c kernel/sched/sched.h kernel/sysctl.c lib/Makefile lib/test_stackinit.c mm/filemap.c mm/hmm.c mm/memory_hotplug.c mm/page_alloc.c scripts/gen_autoksyms.sh Bug: 157994070 Bug: 157858241 Bug: 157879992 Signed-off-by: lucaswei <lucaswei@google.com> Change-Id: Ib43efc6464e484b85107587c2f770246b48ddee6	2020-06-15 15:36:42 +08:00
Rick Yiu	7c722a0cb3	sched/fair: consider boost margin for type FREQUENCY_UTIL When computing energy while selecting a task's runqueue, the boosted cpu util is not used, so the result does not reflect the real freq when a cpu has boosted tasks on it. Address this by adding the boost margin if the type is FREQUENCY_UTIL in schedutil_freq_util(). Bug: 158637636 Test: boot to home Change-Id: I13f4283f03c0962dfc82ca7da01319c98e7aa7bf Signed-off-by: Rick Yiu <rickyiu@google.com>	2020-06-11 07:33:02 +00:00
Martin Liu	3b44513213	sched/psi: add psi trigger event trace Add psi trigger events to help observe psi state. Example output: <...>-577 [000] .... 213.883816: psi_update_trigger_growth: mem_some growth=208414044 threshold=15000000 elapsed=206674740 win=1000000000 last_event_diff=1033535524 <...>-577 [000] .... 213.883821: psi_update_trigger_wake_up: mem_some growth=208414044 threshold=15000000 win=1000000000 last_event_time=212850243878 Bug: 157840940 Test: check trace output Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: I0a89eec867c11de518ead9e4844cfb7374c4e25b	2020-06-10 02:48:54 +00:00
Vincenzo Frascino	748ec8be6a	UPSTREAM: timekeeping: Provide a generic update_vsyscall() implementation The new generic VDSO library allows to unify the update_vsyscall[_tz]() implementations. Provide a generic implementation based on the x86 code and the bindings which need to be implemented in architecture specific code. [ tglx: Moved it into kernel/time where it belongs. Removed the pointless line breaks in the stub functions. Massaged changelog ] Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Shijith Thotton <sthotton@marvell.com> Tested-by: Andre Przywara <andre.przywara@arm.com> Cc: linux-arch@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Huw Davies <huw@codeweavers.com> Link: https://lkml.kernel.org/r/20190621095252.32307-4-vincenzo.frascino@arm.com (cherry picked from commit 44f57d788e7deecb504843534081d3449c2eede9) Signed-off-by: Mark Salyzyn <salyzyn@google.com> Bug: 154668398 Change-Id: I2a85e391be80f58f6516eb7d8e6448f522fc3013 Signed-off-by: Chiawei Wang <chiaweiwang@google.com>	2020-06-09 17:51:52 +08:00
Martin Liu	52d2c35c9a	Revert "mm: oom_kill: reap memory of a task that receives SIGKILL" This reverts commit `97bf2fb571`. Reason to revert: The changes introduced in this commit are causing an undesirable SELinux denial as a side-effect and we do not enable the functionality that this commit adds. Reverting the commit fixes the SELinux denial bug. Bug: 152624411 Test: boot Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: I149b66e6fa0e90e691436e1a83261ff1de233669	2020-06-08 17:26:41 +08:00
Jimmy Shiu	81827827c1	Use find_best_target to select cpu for a zero-util task Always choosing the prev_cpu for a zero-utilization task might lead to tasks competing for the same cpu and increase the overall task execution time. Instead, select a cpu with find_best_target to spread the load across other cpus. Bug: 143857473 Test: https://paste.googleplex.com/5570415529295872 Signed-off-by: Jimmy Shiu <jimmyshiu@google.com> Change-Id: Ibeb766957d2dea5fee85c798d8a9f7b62c2c1a09	2020-05-30 02:29:34 +08:00
Woody Lin	b450d8c9b1	kdebuginfo: Interface to set buildinfo Bug: 155246473 Change-Id: I2d6efccab9c8b0ee8a8f6c0d069205403d890296 Signed-off-by: Woody Lin <woodylin@google.com>	2020-05-30 02:29:09 +08:00
lucaswei	5b02fceb61	sched/fair: Fix compilation issues for !CONFIG_SCHED_WALT For compilation issues for !CONFIG_SCHED_WALT of the following two commits: commit `a80cf2007d` ("sched: Add support to spread tasks") Bug: 154086870 Bug: 153823050 Signed-off-by: lucaswei <lucaswei@google.com> Change-Id: I89e224e18f6700ea2abcd162a5b9f3f938a7ad92	2020-05-30 02:28:22 +08:00
lucaswei	95ddbb8a09	Merge LA.UM.9.12.R1.10.00.00.597.042 via branch 'qcom-msm-4.19-7250' into android-msm-pixel-4.19 Conflicts: Documentation/ABI/testing/sysfs-fs-f2fs Documentation/filesystems/f2fs.txt Documentation/filesystems/fscrypt.rst Documentation/filesystems/fsverity.rst Makefile arch/arm64/configs/vendor/kona_defconfig arch/arm64/configs/vendor/lito_defconfig block/blk-core.c build.config.common drivers/base/core.c drivers/base/power/main.c drivers/clk/clk.c drivers/clk/qcom/clk-alpha-pll.c drivers/dma-buf/dma-buf.c drivers/gpu/msm/kgsl_pool.c drivers/input/misc/qpnp-power-on.c drivers/iommu/dma-mapping-fast.c drivers/iommu/io-pgtable-fast.c drivers/iommu/io-pgtable-msm-secure.c drivers/iommu/io-pgtable.c drivers/of/property.c drivers/platform/msm/ipa/ipa_clients/ipa_gsb.c drivers/platform/msm/ipa/ipa_clients/ipa_mhi_client.c drivers/platform/msm/ipa/ipa_v3/ipa_mpm.c drivers/power/supply/power_supply_sysfs.c drivers/power/supply/qcom/qg-core.h drivers/power/supply/qcom/qpnp-qg.c drivers/soc/qcom/scm.c drivers/staging/android/ion/ion_page_pool.c drivers/tty/serial/msm_geni_serial.c fs/crypto/Kconfig fs/crypto/bio.c fs/crypto/crypto.c fs/crypto/fname.c fs/crypto/fscrypt_private.h fs/crypto/hooks.c fs/crypto/keyinfo.c fs/crypto/policy.c fs/ext4/ext4.h fs/ext4/hash.c fs/ext4/inode.c fs/ext4/namei.c fs/ext4/page-io.c fs/ext4/readpage.c fs/ext4/super.c fs/ext4/verity.c fs/f2fs/Makefile fs/f2fs/data.c fs/f2fs/dir.c fs/f2fs/f2fs.h fs/f2fs/file.c fs/f2fs/gc.c fs/f2fs/hash.c fs/f2fs/inline.c fs/f2fs/namei.c fs/f2fs/segment.c fs/f2fs/super.c fs/f2fs/sysfs.c fs/f2fs/verity.c fs/inode.c fs/ubifs/dir.c fs/unicode/utf8-core.c fs/verity/enable.c fs/verity/fsverity_private.h fs/verity/hash_algs.c fs/verity/open.c fs/verity/verify.c include/linux/coresight.h include/linux/device.h include/linux/dma-buf.h include/linux/f2fs_fs.h include/linux/fscrypt.h include/linux/fsverity.h include/linux/fwnode.h include/linux/leds-qpnp-flash.h include/linux/perf_event.h 
include/linux/power_supply.h include/linux/unicode.h include/soc/qcom/scm.h include/uapi/linux/nl80211.h kernel/events/core.c kernel/sched/core.c kernel/sched/fair.c lib/Kconfig.debug lib/Makefile lib/test_meminit.c mm/slub.c mm/swapfile.c mm/vmalloc.c net/wireless/nl80211.c security/selinux/include/security.h Bug: 153823050 Bug: 153825378 Signed-off-by: lucaswei <lucaswei@google.com> Change-Id: Ia2bfb56f0d48504ba600b52bdde958a76d5bff72	2020-05-30 02:28:19 +08:00
Eric W. Biederman	4a03fb3835	ANDROID: signal: Extend exec_id to 64bits commit d1e7fd6462ca9fc76650fbe6ca800e35b24267da upstream. Replace the 32bit exec_id with a 64bit exec_id to make it impossible to wrap the exec_id counter. With care an attacker can cause exec_id wrap and send arbitrary signals to a newly exec'd parent. This bypasses the signal sending checks if the parent changes their credentials during exec. The severity of this problem can be seen in that, in my limited testing of a 32bit exec_id, it can take as little as 19s to exec 65536 times. Which means that it can take as little as 14 days to wrap a 32bit exec_id. Adam Zabrocki has succeeded in wrapping the self_exec_id in 7 days. Even my slower timing is in the uptime of a typical server. Which means self_exec_id is simply a speed bump today, and if exec gets noticeably faster self_exec_id won't even be a speed bump. Extending self_exec_id to 64bits introduces a problem on 32bit architectures where reading self_exec_id is no longer atomic and can take two read instructions. Which means that it is possible to hit a window where the read value of exec_id does not match the written value. So with very lucky timing after this change this still remains exploitable. I have updated the update of exec_id on exec to use WRITE_ONCE and the read of exec_id in do_notify_parent to use READ_ONCE to make it clear that there is no locking between these two locations. Bug: 154513111 Test: boot bramble, verify list of probed devices Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl Fixes: 2.3.23pre2 Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `a2a1be2de7`) Signed-off-by: Will McVicker <willmcvicker@google.com> Change-Id: I55f74f593ea58a97c8bdea0769ebc93083a8f30d	2020-05-30 02:22:13 +08:00
Mark Salyzyn	b2ff09fbe0	GKI: devfreq: move trace definitions to the driver move bw_hwmon and memlat governor traces from global kernel definitions, to local module. Removes the need to maintain ABI in other kernels. Test: insmod governor_bw_hwmon.ko insmod governor_memlat.ko mkdir /tmp/t mount -t tracefs tracefs /tmp/t find /tmp/t/events \| grep 'power/bw_hwmon' find /tmp/t/events \| grep 'power/memlat' Signed-off-by: Mark Salyzyn <salyzyn@google.com> Bug: 142948174 Bug: 142905293 Change-Id: I98bba1d43cdaede74d9a631416288e2b8d6da9b3	2020-05-30 02:22:13 +08:00
Minchan Kim	6dd6325deb	mm: introduce per-process mm event tracking feature Linux supports /proc/meminfo and /proc/vmstat stats as memory health metrics. Android uses them too. If users see something go wrong (e.g., sluggishness, jank) on their system, they can capture and report the system state to developers for debugging. It shows the memory stats at the moment the bug is captured. However, it’s not enough to investigate an application's jank problem caused by memory shortage, because 1. It just shows an event count, which doesn’t quantify the latency of the application well. Jank can happen for various reasons, and one simple scenario is frame drops for a second. An app should draw a frame every 16ms. Just the number of stats (e.g., allocstall or pgmajfault) can't represent how much time the app spends handling the event. 2. At bugreport time, a dump with vmstat and meminfo is never helpful because it's too late to capture the moment when the problem happens. When the user notices the problem and tries to capture the system state, the problem has already gone. 3. Even if we could capture MM stats at the moment the bug happens, it wouldn't be helpful because MM stats usually fluctuate a lot, so we need historical data rather than a one-time snapshot to see the MM trend. To solve the above problems, this patch introduces a per-process, light-weight mm event stat. Basically, it tracks minor/major faults, reclaim and compaction latency of each process as well as event count, and records the data into a global buffer. To limit memory overhead, it doesn't record every MM event of the process to the buffer but just drains accumulated stats to the buffer every 0.5sec. If there isn't any event, it just skips the recording. For latency data, it keeps the average/max latency of each event in that period. With that, we can keep useful information in a small buffer so that we no longer miss precious information even though the capture time is rather late. 
This patch introduces the basic facility of the MM event stat. After all patches in this patchset are applied, the output format is as follows; dumpstate can use it for VM debugging in future. <...>-1665 [001] d... 217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58 <...>-1665 [001] d... 217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994 <...>-1665 [001] d... 217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0 <...>-626 [000] d... 217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0 <...>-6547 [000] .... 217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20 <...>-6547 [000] .... 217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0 This feature uses event trace for the output buffer so that we can use all the general benefits of event trace (e.g., buffer size management, filtering and so on). To prevent overflow of the ring buffer by other random event traces, it is highly recommended to create a separate tracing instance under /sys/kernel/debug/tracing/instances/ I had a concern about adding overhead. Actually, major\|compaction/reclaim are already heavy so they should not be a concern. Rather, minor fault and kern alloc overhead would matter more, so I ran a micro benchmark to measure minor page fault overhead. The test scenario creates 40 threads, each of which does minor page faults over a 25M range (ranges do not overlap). I didn't see any noticeable regression. 
Base: fault/wsec avg: 758489.8288 minor faults=13123118, major faults=0 ctx switch=139234 User System Wall fault/wsec 39.55s 41.73s 17.49s 749995.768 minor faults=13123135, major faults=0 ctx switch=139627 User System Wall fault/wsec 34.59s 41.61s 16.95s 773906.976 minor faults=13123061, major faults=0 ctx switch=139254 User System Wall fault/wsec 39.03s 41.55s 16.97s 772966.334 minor faults=13123131, major faults=0 ctx switch=139970 User System Wall fault/wsec 36.71s 42.12s 17.04s 769941.019 minor faults=13123027, major faults=0 ctx switch=138524 User System Wall fault/wsec 42.08s 42.24s 18.08s 725639.047 Base + MM event + event trace enable: fault/wsec avg: 759626.1488 minor faults=13123488, major faults=0 ctx switch=140303 User System Wall fault/wsec 37.66s 42.21s 17.48s 750414.257 minor faults=13123066, major faults=0 ctx switch=138119 User System Wall fault/wsec 36.77s 42.14s 17.49s 750010.107 minor faults=13123505, major faults=0 ctx switch=140021 User System Wall fault/wsec 38.51s 42.50s 17.54s 748022.219 minor faults=13123431, major faults=0 ctx switch=138517 User System Wall fault/wsec 36.74s 41.49s 17.03s 770255.610 minor faults=13122955, major faults=0 ctx switch=137174 User System Wall fault/wsec 40.68s 40.97s 16.83s 779428.551 Bug: 80168800 Bug: 116825053 Bug: 153442668 Test: boot Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8 Signed-off-by: Minchan Kim <minchan@google.com> (cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d) Signed-off-by: Martin Liu <liumartin@google.com>	2020-05-30 02:21:52 +08:00
Greg Kroah-Hartman	05951af8e6	UPSTREAM: bpf: Explicitly memset some bpf info structures declared on the stack Trying to initialize a structure with "= {};" will not always clean out all padding locations in a structure. So be explicit and call memset to initialize everything for a number of bpf information structures that are then copied from userspace, sometimes from smaller memory locations than the size of the structure. Bug: 153418162 Test: run vts VtsKernelNetTest pass (b/153418162#comment5) Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200320162258.GA794295@kroah.com (cherry picked from commit 269efb7fc478563a7e7b22590d8076823f4ac82a) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I52a2cab20aa310085ec104bd811ac4f2b83657b6 Signed-off-by: Mars Lin <marslin@google.com>	2020-05-30 02:21:36 +08:00
Greg Kroah-Hartman	cb05993e6a	UPSTREAM: bpf: Explicitly memset the bpf_attr structure For the bpf syscall, we are relying on the compiler to properly zero out the bpf_attr union that we copy userspace data into. Unfortunately that doesn't always work properly, padding and other oddities might not be correctly zeroed, and in some tests odd things have been found when the stack is pre-initialized to other values. Fix this by explicitly memsetting the structure to 0 before using it. Bug: 153418162 Test: run vts VtsKernelNetTest pass (b/153418162#comment5) Reported-by: Maciej Żenczykowski <maze@google.com> Reported-by: John Stultz <john.stultz@linaro.org> Reported-by: Alexander Potapenko <glider@google.com> Reported-by: Alistair Delva <adelva@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://android-review.googlesource.com/c/kernel/common/+/1235490 Link: https://lore.kernel.org/bpf/20200320094813.GA421650@kroah.com (cherry picked from commit 8096f229421f7b22433775e928d506f0342e5907) Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I2dc28cd45024da5cc6861ff4a9b25fae389cc6d8 Signed-off-by: Mars Lin <marslin@google.com>	2020-05-30 02:21:36 +08:00
Saravana Kannan	ae46a7d4e7	GKI: sched: Add back the root_domain.overutilized field This field is necessary to maintain ABI compatibility with ACK. Add it back, but leave it unused. Bug: 153905799 Change-Id: Ic9ef5640fa77c3aada023843658e7e4de3bada82 Signed-off-by: Saravana Kannan <saravanak@google.com>	2020-05-30 02:21:26 +08:00
Saravana Kannan	e8a84bbd89	GKI: sched: Compile out push_task field in struct rq The push_task field is a WALT related field that shouldn't be needed since we run PELT. So conditionally compile in the field only when WALT is enabled. Also add #ifdefs around all the uses of this field. Bug: 153905799 Change-Id: I12edd3f2180ebab14719ba2548e83519beffacc2 Signed-off-by: Saravana Kannan <saravanak@google.com>	2020-05-30 02:21:26 +08:00
Martin Liu	2f52110115	Revert "mm: reclaim small amounts of memory when an external fragmentation event occurs" This reverts commit `68809fdd57`. also fix BB from 4165090057 Reason for revert: roll back to stable kernel Bug: 140544941 Test: boot Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: I61b51972eab01f328ce375111a3bd04670de670b	2020-05-30 02:21:22 +08:00
Martin Liu	afb52653fc	Revert "mm: oom-kill: Add lmk_kill possible for ULMK" This reverts commit `aa9e75a9ff`. Reason for revert: remove customized code Bug: 140544941 Test: boot Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: I364b45e09f22c59f82fdd768c0a5ec86d69fee9c	2020-05-30 02:20:58 +08:00
Martin Liu	ff1201cd90	Revert "mm: introduce INIT_VMA()" This reverts commit `ead04c98fd`. Reason for revert: remove SPF non upstream code Bug: 140544941 Test: boot Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: I18ebe5d38d1ffb7a5a599b5be93eab71e0a5804f	2020-05-30 02:20:38 +08:00
Martin Liu	3e5f49e2cc	Revert "mm: protect mm_rb tree with a rwlock" This reverts commit `3f31f748a8`. Reason for revert: remove SPF non upstream code Bug: 140544941 Test: boot Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: Ic6efff6d069d20badad9af11beec0dbe36c659f5	2020-05-30 02:20:37 +08:00
Martin Liu	18a8850202	Revert "mm: protect against PTE changes done by dup_mmap()" This reverts commit `0c8a35f8dd`. Reason for revert: remove SPF non upstream code Bug: 140544941 Test: boot Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: I912a8891ac6cf3e72c7b7aa27df2922554b31491	2020-05-30 02:20:30 +08:00
Martin Liu	ef36c60fe7	mm: Revert previous mm revert list This commit reverts e799c1b10c54...cfb042c6c5d1 Reason for revert: unblock GKI Bug: 140544941 Test: boot Change-Id: I4ebe6c01918788cdc2468ceabf101ef7c3e3c452 Signed-off-by: Martin Liu <liumartin@google.com>	2020-05-30 02:20:27 +08:00
Minchan Kim	355b8cff31	Revert "mm: introduce INIT_VMA()" This reverts commit `ead04c98fd`. Reason for revert: revert customized code Bug: 140544941 Test: boot Signed-off-by: Minchan Kim <minchan@google.com> Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: I05aace3dfeb65fdb47f650e5b93dccc72f2edee3	2020-05-30 02:20:20 +08:00
Martin Liu	029f472f91	Revert "mm: protect mm_rb tree with a rwlock" This reverts commit `3f31f748a8`. Reason for revert: revert customized code Bug: 140544941 Test: boot Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: I1b72d4595a89fc512ff22e49e61e3b8dfa47ede8	2020-05-30 02:20:19 +08:00
Minchan Kim	96f9319be9	Revert "mm: protect against PTE changes done by dup_mmap()" This reverts commit `0c8a35f8dd`. Reason for revert: revert customized code Bug: 140544941 Test: boot Signed-off-by: Minchan Kim <minchan@google.com> Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: I3cbd23ebf4fb0bd92009d05f772f48d8f46e48f0	2020-05-30 02:20:13 +08:00
Minchan Kim	c48c177651	Revert "mm: reclaim small amounts of memory when an external fragmentation event occurs" This reverts commit `68809fdd57`. also fix BB from porting 416509005 Reason for revert: revert customized code Bug: 140544941 Test: boot Signed-off-by: Minchan Kim <minchan@google.com> Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: I65735f27f6a44a112957bcec07e2f63f2d8ccff6	2020-05-30 02:20:04 +08:00
Minchan Kim	4a7f0b329a	Revert "mm: oom-kill: Add lmk_kill possible for ULMK" This reverts commit `aa9e75a9ff`. Reason for revert: revert customized code Bug: 140544941 Test: boot Signed-off-by: Minchan Kim <minchan@google.com> Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: I1475e099c72dcdd33fc2497dd30aa51d07bfa73d	2020-05-30 02:19:43 +08:00
Eric Biggers	f6ab3f8004	FROMLIST: kmod: make request_module() return an error when autoloading is disabled It's long been possible to disable kernel module autoloading completely (while still allowing manual module insertion) by setting /proc/sys/kernel/modprobe to the empty string. This can be preferable to setting it to a nonexistent file since it avoids the overhead of an attempted execve(), avoids potential deadlocks, and avoids the call to security_kernel_module_request() and thus on SELinux-based systems eliminates the need to write SELinux rules to dontaudit module_request. However, when module autoloading is disabled in this way, request_module() returns 0. This is broken because callers expect 0 to mean that the module was successfully loaded. Apparently this was never noticed because this method of disabling module autoloading isn't used much, and also most callers don't use the return value of request_module() since it's always necessary to check whether the module registered its functionality or not anyway. But improperly returning 0 can indeed confuse a few callers, for example get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit: if (!fs && (request_module("fs-%.s", len, name) == 0)) { fs = __get_fs_type(name, len); WARN_ONCE(!fs, "request_module fs-%.s succeeded, but still no fs?\n", len, name); } This is easily reproduced with: echo > /proc/sys/kernel/modprobe mount -t NONEXISTENT none / It causes: request_module fs-NONEXISTENT succeeded, but still no fs? WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0 [...] This should actually use pr_warn_once() rather than WARN_ONCE(), since it's also user-reachable if userspace immediately unloads the module. Regardless, request_module() should correctly return an error when it fails. So let's make it return -ENOENT, which matches the error when the modprobe binary doesn't exist. I've also sent patches to document and test this case. 
Acked-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Jessica Yu <jeyu@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: NeilBrown <neilb@suse.com> Link: https://lore.kernel.org/r/20200318230515.171692-2-ebiggers@kernel.org Bug: 151690015 Change-Id: I5e04f85e12a4f85da23e53bc11da1ade565abcd6 Signed-off-by: Eric Biggers <ebiggers@google.com>	2020-05-30 02:19:27 +08:00
Suren Baghdasaryan	5b3cf9b841	UPSTREAM: sched/psi: Fix OOB write when writing 0 bytes to PSI files Issuing write() with count parameter set to 0 on any file under /proc/pressure/ will cause an OOB write because of the access to buf[buf_size-1] when NUL-termination is performed. Fix this by checking for buf_size to be non-zero. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lkml.kernel.org/r/20200203212216.7076-1-surenb@google.com Bug: 152499875 Test: lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I9ec7acfc6e1083c677a95b0ea1c559ab50152873 (cherry picked from commit `67e4408599`) Signed-off-by: Martin Liu <liumartin@google.com>	2020-05-30 02:19:24 +08:00
Johannes Weiner	48711f4d27	UPSTREAM: psi: Fix a division error in psi poll() The psi window size is a u64 and can be up to 10 seconds right now, which exceeds the lower 32 bits of the variable. We currently use div_u64 for it, which is meant only for 32-bit divisors. The result is garbage pressure sampling values and even potential div0 crashes. Use div64_u64. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Jingfeng Xie <xiejingfeng@linux.alibaba.com> Link: https://lkml.kernel.org/r/20191203183524.41378-3-hannes@cmpxchg.org Signed-off-by: Sasha Levin <sashal@kernel.org> Bug: 152499875 Test: lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I49fdfd55751d1a2cde19666624c9c5d76dc78dad (cherry picked from commit `cf46cf40bc`) Signed-off-by: Martin Liu <liumartin@google.com>	2020-05-30 02:19:23 +08:00
Johannes Weiner	7064fd39b3	UPSTREAM: sched/psi: Fix sampling error and rare div0 crashes with cgroups and high uptime Jingfeng reports rare div0 crashes in psi on systems with some uptime: [58914.066423] divide error: 0000 [#1] SMP [58914.070416] Modules linked in: ipmi_poweroff ipmi_watchdog toa overlay fuse tcp_diag inet_diag binfmt_misc aisqos(O) aisqos_hotfixes(O) [58914.083158] CPU: 94 PID: 140364 Comm: kworker/94:2 Tainted: G W OE K 4.9.151-015.ali3000.alios7.x86_64 #1 [58914.093722] Hardware name: Alibaba Alibaba Cloud ECS/Alibaba Cloud ECS, BIOS 3.23.34 02/14/2019 [58914.102728] Workqueue: events psi_update_work [58914.107258] task: ffff8879da83c280 task.stack: ffffc90059dcc000 [58914.113336] RIP: 0010:[] [] psi_update_stats+0x1c1/0x330 [58914.122183] RSP: 0018:ffffc90059dcfd60 EFLAGS: 00010246 [58914.127650] RAX: 0000000000000000 RBX: ffff8858fe98be50 RCX: 000000007744d640 [58914.134947] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00003594f700648e [58914.142243] RBP: ffffc90059dcfdf8 R08: 0000359500000000 R09: 0000000000000000 [58914.149538] R10: 0000000000000000 R11: 0000000000000000 R12: 0000359500000000 [58914.156837] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8858fe98bd78 [58914.164136] FS: 0000000000000000(0000) GS:ffff887f7f380000(0000) knlGS:0000000000000000 [58914.172529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [58914.178467] CR2: 00007f2240452090 CR3: 0000005d5d258000 CR4: 00000000007606f0 [58914.185765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [58914.193061] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [58914.200360] PKRU: 55555554 [58914.203221] Stack: [58914.205383] ffff8858fe98bd48 00000000000002f0 0000002e81036d09 ffffc90059dcfde8 [58914.213168] ffff8858fe98bec8 0000000000000000 0000000000000000 0000000000000000 [58914.220951] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [58914.228734] Call Trace: [58914.231337] [] psi_update_work+0x22/0x60 
[58914.237067] [] process_one_work+0x189/0x420 [58914.243063] [] worker_thread+0x4e/0x4b0 [58914.248701] [] ? process_one_work+0x420/0x420 [58914.254869] [] kthread+0xe6/0x100 [58914.259994] [] ? kthread_park+0x60/0x60 [58914.265640] [] ret_from_fork+0x39/0x50 [58914.271193] Code: 41 29 c3 4d 39 dc 4d 0f 42 dc <49> f7 f1 48 8b 13 48 89 c7 48 c1 [58914.279691] RIP [] psi_update_stats+0x1c1/0x330 The crashing instruction is trying to divide the observed stall time by the sampling period. The period, stored in R8, is not 0, but we are dividing by the lower 32 bits only, which are all 0 in this instance. We could switch to a 64-bit division, but the period shouldn't be that big in the first place. It's the time between the last update and the next scheduled one, and so should always be around 2s and comfortably fit into 32 bits. The bug is in the initialization of new cgroups: we schedule the first sampling event in a cgroup as an offset of sched_clock(), but fail to initialize the last_update timestamp, and it defaults to 0. That results in a bogusly large sampling period the first time we run the sampling code, and consequently we underreport pressure for the first 2s of a cgroup's life. But worse, if sched_clock() is sufficiently advanced on the system, and the user gets unlucky, the period's lower 32 bits can all be 0 and the sampling division will crash. Fix this by initializing the last update timestamp to the creation time of the cgroup, thus correctly marking the start of the first pressure sampling period in a new cgroup. 
Reported-by: Jingfeng Xie <xiejingfeng@linux.alibaba.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Link: https://lkml.kernel.org/r/20191203183524.41378-2-hannes@cmpxchg.org Signed-off-by: Sasha Levin <sashal@kernel.org> Bug: 152499875 Test: lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Iaada5c2f1a03cf38cbb053adde478f762ce40843 (cherry picked from commit `55013802e8`) Signed-off-by: Martin Liu <liumartin@google.com>	2020-05-30 02:19:23 +08:00
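The arithmetic behind this crash can be sketched in user space. The model below is an illustration under stated assumptions, not the kernel implementation: `struct group_sketch`, `group_init_buggy`, `group_init_fixed`, and `sampling_period` are hypothetical names, and only the `last_update` field and the period value from R8 in the trace (`0x0000359500000000`) come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the per-group sampling state. */
struct group_sketch {
    uint64_t last_update; /* timestamp of the previous sample, in ns */
};

/* Buggy pattern: last_update is left at its default of 0. */
static void group_init_buggy(struct group_sketch *g)
{
    g->last_update = 0;
}

/* Fixed pattern: last_update starts at the creation time of the group. */
static void group_init_fixed(struct group_sketch *g, uint64_t created_ns)
{
    g->last_update = created_ns;
}

/* Sampling period: time elapsed since the previous update. */
static uint64_t sampling_period(const struct group_sketch *g, uint64_t now_ns)
{
    assert(now_ns >= g->last_update);
    return now_ns - g->last_update;
}

/* True when a 32-bit division would see a divisor of 0. */
static int low32_is_zero(uint64_t v)
{
    return (uint32_t)v == 0;
}
```

When the group is created at a clock value whose first sample lands on `0x0000359500000000`, the buggy initialization yields a period equal to the raw clock value, whose lower 32 bits are all zero. With the fixed initialization, the first period is roughly the 2 s sampling interval and fits comfortably in 32 bits.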
Miles Chen	3b96c1807d	UPSTREAM: sched/psi: Correct overly pessimistic size calculation When passing a string of 32 bytes or more to psi_write(), psi_write() copies 31 bytes to its buf and overwrites buf[30] with '\0', which makes the input string 1 byte shorter than it should be. Fix it by copying sizeof(buf) bytes when nbytes >= sizeof(buf). This does not cause problems in normal use cases like "some 500000 10000000" or "full 500000 10000000" because they are less than 32 bytes in length. /* assuming nbytes == 35 */ char buf[32]; buf_size = min(nbytes, (sizeof(buf) - 1)); /* buf_size = 31 */ if (copy_from_user(buf, user_buf, buf_size)) return -EFAULT; buf[buf_size - 1] = '\0'; /* buf[30] = '\0' */ Before: %cd /proc/pressure/ %echo "123456789|123456789|123456789|1234" > memory [ 22.473497] nbytes=35,buf_size=31 [ 22.473775] 123456789|123456789|123456789| (print 30 chars) %sh: write error: Invalid argument %echo "123456789|123456789|123456789|1" > memory [ 64.916162] nbytes=32,buf_size=31 [ 64.916331] 123456789|123456789|123456789| (print 30 chars) %sh: write error: Invalid argument After: %cd /proc/pressure/ %echo "123456789|123456789|123456789|1234" > memory [ 254.837863] nbytes=35,buf_size=32 [ 254.838541] 123456789|123456789|123456789|1 (print 31 chars) %sh: write error: Invalid argument %echo "123456789|123456789|123456789|1" > memory [ 9965.714935] nbytes=32,buf_size=32 [ 9965.715096] 123456789|123456789|123456789|1 (print 31 chars) %sh: write error: Invalid argument Also remove the superfluous parentheses. 
Signed-off-by: Miles Chen <miles.chen@mediatek.com> Cc: <linux-mediatek@lists.infradead.org> Cc: <wsd_upstream@mediatek.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190912103452.13281-1-miles.chen@mediatek.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Bug: 152499875 Test: lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I9371b4d5e465bb8b84ff7adf5f40f30696c6ff70 (cherry picked from commit `88a47f1659`) Signed-off-by: Martin Liu <liumartin@google.com>	2020-05-30 02:19:22 +08:00
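The size calculation discussed in this commit can be compared side by side in user space. This is a minimal sketch, assuming that `memcpy()` stands in for `copy_from_user()` and that the corrected code clamps to `sizeof(buf)` as the commit message describes. The names `copy_before`, `copy_after`, `min_size`, and `BUF_LEN` are hypothetical, and the zero-length guard is an addition for the sketch only.

```c
#include <stddef.h>
#include <string.h>

#define BUF_LEN 32 /* matches "char buf[32]" in the commit message */

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Old calculation: clamp to BUF_LEN - 1, then overwrite the last copied byte. */
static size_t copy_before(char buf[BUF_LEN], const char *src, size_t nbytes)
{
    size_t buf_size;

    if (nbytes == 0)
        return 0; /* guard added for this sketch only */
    buf_size = min_size(nbytes, BUF_LEN - 1);
    memcpy(buf, src, buf_size); /* stands in for copy_from_user() */
    buf[buf_size - 1] = '\0';
    return strlen(buf); /* number of characters that survive */
}

/* New calculation: clamp to BUF_LEN, so one more input byte is kept. */
static size_t copy_after(char buf[BUF_LEN], const char *src, size_t nbytes)
{
    size_t buf_size;

    if (nbytes == 0)
        return 0; /* guard added for this sketch only */
    buf_size = min_size(nbytes, BUF_LEN);
    memcpy(buf, src, buf_size); /* stands in for copy_from_user() */
    buf[buf_size - 1] = '\0';
    return strlen(buf);
}
```

For the 35-byte input used in the commit message (34 characters plus a newline), the old calculation keeps 30 characters and the new one keeps 31, which agrees with the "print 30 chars" and "print 31 chars" lines in the log. Inputs shorter than 32 bytes, such as "some 500000 10000000", behave identically under both versions.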
Jason Xing	4694dbe19e	UPSTREAM: psi: get poll_work to run when calling poll syscall next time Only when calling the poll syscall the first time can user receive POLLPRI correctly. After that, user always fails to acquire the event signal. Reproduce case: 1. Get the monitor code in Documentation/accounting/psi.txt 2. Run it, and wait for the event triggered. 3. Kill and restart the process. The question is why we can end up with poll_scheduled = 1 but the work not running (which would reset it to 0). And the answer is because the scheduling side sees group->poll_kworker under RCU protection and then schedules it, but here we cancel the work and destroy the worker. The cancel needs to pair with resetting the poll_scheduled flag. Link: http://lkml.kernel.org/r/1566357985-97781-1-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Jason Xing <kerneljasonxing@linux.alibaba.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 152499875 Test: lmkd_unit_test Change-Id: Ieaa8284ef632ef06318a92d792b239d344bb29d1 Signed-off-by: Suren Baghdasaryan <surenb@google.com> (cherry picked from commit `e71f9c35ee`) Signed-off-by: Martin Liu <liumartin@google.com>	2020-05-30 02:19:22 +08:00
Peter Zijlstra	7cec2c6125	UPSTREAM: sched/psi: Reduce psimon FIFO priority PSI defaults to a FIFO-99 thread, reduce this to FIFO-1. FIFO-99 is the very highest priority available to SCHED_FIFO and is not a suitable default; it would indicate the psi work is the most important work on the machine. Since Real-Time tasks will have pre-allocated memory and locked it in place, Real-Time tasks do not care about PSI. All it needs is to be above OTHER. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Bug: 152499875 Test: lmkd_unit_test Change-Id: I52964915467577bfc3543700aec9b463f6f0ffe1 (cherry picked from commit `2a220bc9f2`) Signed-off-by: Martin Liu <liumartin@google.com>	2020-05-30 02:19:22 +08:00
Aneesh Kumar K.V	f98bddaf3c	BACKPORT: GKI: mm/memunmap: don't access uninitialized memmap in memunmap_pages() Patch series "mm/memory_hotplug: Shrink zones before removing memory", v6. This series fixes the access of uninitialized memmaps when shrinking zones/nodes and when removing memory. Also, it contains all fixes for crashes that can be triggered when removing certain namespace using memunmap_pages() - ZONE_DEVICE, reported by Aneesh. We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be more involved (we don't have SECTION_IS_ONLINE as an indicator), and shrinking is only of limited use (set_zone_contiguous() cannot detect the ZONE_DEVICE as contiguous). We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount of code to a minimum. Shrinking is especially necessary to keep zone->contiguous set where possible, especially, on memory unplug of DIMMs at zone boundaries. -------------------------------------------------------------------------- Zones are now properly shrunk when offlining memory blocks or when onlining failed. This allows to properly shrink zones on memory unplug even if the separate memory blocks of a DIMM were onlined to different zones or re-onlined to a different zone after offlining. 
Example: :/# cat /proc/zoneinfo Node 1, zone Movable spanned 0 present 0 managed 0 :/# echo "online_movable" > /sys/devices/system/memory/memory41/state :/# echo "online_movable" > /sys/devices/system/memory/memory43/state :/# cat /proc/zoneinfo Node 1, zone Movable spanned 98304 present 65536 managed 65536 :/# echo 0 > /sys/devices/system/memory/memory43/online :/# cat /proc/zoneinfo Node 1, zone Movable spanned 32768 present 32768 managed 32768 :/# echo 0 > /sys/devices/system/memory/memory41/online :/# cat /proc/zoneinfo Node 1, zone Movable spanned 0 present 0 managed 0 This patch (of 10): With an altmap, the memmap falling into the reserved altmap space are not initialized and, therefore, contain a garbage NID and a garbage zone. Make sure to read the NID/zone from a memmap that was initialized. This fixes a kernel crash that is observed when destroying a namespace: kernel BUG at include/linux/mm.h:1107! cpu 0x1: Vector: 700 (Program Check) at [c000000274087890] pc: c0000000004b9728: memunmap_pages+0x238/0x340 lr: c0000000004b9724: memunmap_pages+0x234/0x340 ... pid = 3669, comm = ndctl kernel BUG at include/linux/mm.h:1107! devm_action_release+0x30/0x50 release_nodes+0x268/0x2d0 device_release_driver_internal+0x174/0x240 unbind_store+0x13c/0x190 drv_attr_store+0x44/0x60 sysfs_kf_write+0x70/0xa0 kernfs_fop_write+0x1ac/0x290 __vfs_write+0x3c/0x70 vfs_write+0xe4/0x200 ksys_write+0x7c/0x140 system_call+0x5c/0x68 The "page_zone(pfn_to_page(pfn)" was introduced by 69324b8f4833 ("mm, devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support"), however, I think we will never have driver reserved memory with MEMORY_DEVICE_PRIVATE (no altmap AFAIKS). 
[david@redhat.com: minimze code changes, rephrase description] Link: http://lkml.kernel.org/r/20191006085646.5768-2-david@redhat.com Fixes: 2c2a5af6fed2 ("mm, memory_hotplug: add nid parameter to arch_remove_memory") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Damian Tometzki <damian.tometzki@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. 
Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pankaj Gupta <pagupta@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qian Cai <cai@lca.pw> Cc: Rich Felker <dalias@libc.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 77e080e7680e1e615587352f70c87b9e98126d03) Signed-off-by: Mark Salyzyn <salyzyn@google.com> Bug: 150378964 Change-Id: Ib3b09ddcbc8a42df0d596e6549fd3a40e6b998b1	2020-05-30 02:19:10 +08:00
Oscar Salvador	f88adb8eab	BACKPORT: GKI: mm, memory_hotplug: add nid parameter to arch_remove_memory -- snip -- Missing unification of mm/hmm.c and kernel/memremap.c -- snip -- Patch series "Do not touch pages in hot-remove path", v2. This patchset aims for two things: 1) A better definition about offline and hot-remove stage 2) Solving bugs where we can access non-initialized pages during hot-remove operations [2] [3]. This is achieved by moving all page/zone handling to the offline stage, so we do not need to access pages when hot-removing memory. [1] https://patchwork.kernel.org/cover/10691415/ [2] https://patchwork.kernel.org/patch/10547445/ [3] https://www.spinics.net/lists/linux-mm/msg161316.html This patch (of 5): This is a preparation for the following-up patches. The idea of passing the nid is that it will allow us to get rid of the zone parameter afterwards. Link: http://lkml.kernel.org/r/20181127162005.15833-2-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 2c2a5af6fed20cf74401c9d64319c76c5ff81309) Signed-off-by: Mark Salyzyn <salyzyn@google.com> Bug: 150378964 Change-Id: Ie66e53db21682a60d6eb8b269b6e0980a736c573	2020-05-30 02:19:09 +08:00
Wei Wang	9923768ca5	sched: restrict iowait boost to tasks with prefer_idle Currently iowait boost doesn't distinguish between background and foreground tasks, and we have seen cases where a device runs at high frequency unnecessarily when running some background I/O. This patch limits iowait boost to tasks with prefer_idle only. Specifically, on Pixel, those are foreground and top app tasks. Bug: 130308826 Bug: 144961757 Test: Boot and trace Change-Id: I2d892beeb4b12b7e8f0fb2848c23982148648a10 Signed-off-by: Wei Wang <wvw@google.com>	2020-05-30 02:18:46 +08:00
Peter Zijlstra	6b46ef99ae	futex: Fix inode life-time issue commit 8019ad13ef7f64be44d4f892af9c840179009254 upstream. As reported by Jann, ihold() does not in fact guarantee inode persistence. And instead of making it so, replace the usage of inode pointers with a per boot, machine wide, unique inode identifier. This sequence number is global, but shared (file backed) futexes are rare enough that this should not become a performance issue. Bug: 152809067 Test: compile, boot bramble, verify list of probed devices Reported-by: Jann Horn <jannh@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit `e6d506cd22`) Signed-off-by: Will McVicker <willmcvicker@google.com> Change-Id: I237f35a875c07e5bda19df4fdf3318fae16e79d4	2020-05-30 02:18:43 +08:00
Robin Peng	58a95695aa	Merge LA.UM.9.12.R1.10.00.00.597.032 via branch 'qcom-msm-4.19-7250' into android-msm-pixel-4.19 Conflicts: arch/arm64/configs/vendor/kona_defconfig arch/arm64/configs/vendor/lito_defconfig arch/arm64/include/asm/traps.h drivers/power/supply/qcom/qpnp-smb5.c kernel/sched/sched.h Bug: 151568484 Change-Id: I6ed9ae8bc29d93e42b8527ae25074db334c640da Signed-off-by: Robin Peng <robinpeng@google.com>	2020-05-30 02:18:38 +08:00
Will McVicker	66a2602997	GKI: trace: ipc_logging: modularize IPC logging This change: * adds exports and converts ifdef -> if IS_ENABLED(...) * sets CONFIG_IPC_LOGGING as tristate * properly handles ipc_logging_debug.c for debugfs * stubs out ipc_log_* calls for built-in drivers Signed-off-by: Will McVicker <willmcvicker@google.com> Bug: 150231337 Test: compile and boot bramble Change-Id: I0379cadbaf1dee5d358144b1b4f2dc374635021d	2020-05-30 02:16:53 +08:00
Tri Vo	cc81490830	BACKPORT: PM / wakeup: Show wakeup sources stats in sysfs Add an ID and a device pointer to 'struct wakeup_source'. Use them to expose wakeup sources statistics in sysfs under /sys/class/wakeup/wakeup<ID>/*. Bug: 129087298 Bug: 151789966 Co-developed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Co-developed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Tri Vo <trong@android.com> Tested-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit c8377adfa78103be5380200eb9dab764d7ca890e) Signed-off-by: Tri Vo <trong@google.com> (cherry picked from commit `2c9f5fa9c3`) [sspatil: fix conflict in fs/eventpoll.c] [sspatil: fix all in-tree usage of wakeup_source_register] Signed-off-by: Sandeep Patil <sspatil@google.com> Change-Id: Ie12200c8d439b08410961415d5899a390b82f5b0	2020-05-30 02:16:43 +08:00
Tri Vo	02ba99da90	UPSTREAM: PM / wakeup: Use wakeup_source_register() in wakelock.c kernel/power/wakelock.c duplicates wakeup source creation and registration code from drivers/base/power/wakeup.c. Change struct wakelock's wakeup source to a pointer and use wakeup_source_register() function to create and register said wakeup source. Use wakeup_source_unregister() on cleanup path. Signed-off-by: Tri Vo <trong@android.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 2434aea58e652a9fe114181ac90aa60e2f2e1b25) Bug: 129087298 Signed-off-by: Tri Vo <trong@google.com> Change-Id: I4e6b3c613c561fb382f17c3c31b6584aebabfb5d (cherry picked from commit `5bc2bdfb22`) Signed-off-by: Sandeep Patil <sspatil@google.com>	2020-05-30 02:16:39 +08:00
Suren Baghdasaryan	26ca615b1a	ANDROID: replace NR_INDIRECTLY_RECLAIMABLE_BYTES with NR_KERNEL_MISC_RECLAIMABLE Use NR_KERNEL_MISC_RECLAIMABLE instead of NR_INDIRECTLY_RECLAIMABLE_BYTES for kgsl allocations and in sysstats. Bug: 150808082 Test: build Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ice5167bd9b380bb4c4b4d810aa685d211bcf2f80	2020-05-30 02:16:20 +08:00
Minchan Kim	81a5020848	attribute page lock and waitqueue functions as sched trace_sched_blocked_trace in CFS is really useful for debugging via trace because it tell where the process was stuck on callstack. For example, <...>-6143 ( 6136) [005] d..2 50.278987: sched_blocked_reason: pid=6136 iowait=0 caller=SyS_mprotect+0x88/0x208 <...>-6136 ( 6136) [005] d..2 50.278990: sched_blocked_reason: pid=6142 iowait=0 caller=do_page_fault+0x1f4/0x3b0 <...>-6142 ( 6136) [006] d..2 50.278996: sched_blocked_reason: pid=6144 iowait=0 caller=SyS_prctl+0x52c/0xb58 <...>-6144 ( 6136) [006] d..2 50.279007: sched_blocked_reason: pid=6136 iowait=0 caller=vm_mmap_pgoff+0x74/0x104 However, sometime it gives pointless information like this. RenderThread-2322 ( 1805) [006] d.s3 50.319046: sched_blocked_reason: pid=6136 iowait=1 caller=__lock_page_killable+0x17c/0x220 logd.writer-594 ( 587) [002] d.s3 50.334011: sched_blocked_reason: pid=6126 iowait=1 caller=wait_on_page_bit+0x194/0x208 kworker/u16:13-333 ( 333) [007] d.s4 50.343161: sched_blocked_reason: pid=6136 iowait=1 caller=__lock_page_killable+0x17c/0x220 Such wait_on_page_bit, __lock_page_killable are pointless because it doesn't carry on higher information to identify the callstack. The reason is page_lock and waitqueue are special synchronization method unlike other normal locks(mutex, spinlock). Let's mark them as "__sched" so get_wchan which used in trace_sched_blocked_trace could detect it and skip them. It will produce more meaningful callstack function like this. 
<...>-2867 ( 1068) [002] d.h4 124.209701: sched_blocked_reason: pid=329 iowait=0 caller=worker_thread+0x378/0x470 <...>-2867 ( 1068) [002] d.s3 124.209763: sched_blocked_reason: pid=8454 iowait=1 caller=__filemap_fdatawait_range+0xa0/0x104 <...>-2867 ( 1068) [002] d.s4 124.209803: sched_blocked_reason: pid=869 iowait=0 caller=worker_thread+0x378/0x470 ScreenDecoratio-2364 ( 1867) [002] d.s3 124.209973: sched_blocked_reason: pid=8454 iowait=1 caller=f2fs_wait_on_page_writeback+0x84/0xcc ScreenDecoratio-2364 ( 1867) [002] d.s4 124.209986: sched_blocked_reason: pid=869 iowait=0 caller=worker_thread+0x378/0x470 <...>-329 ( 329) [000] d..3 124.210435: sched_blocked_reason: pid=538 iowait=0 caller=worker_thread+0x378/0x470 kworker/u16:13-538 ( 538) [007] d..3 124.210450: sched_blocked_reason: pid=6 iowait=0 caller=worker_thread+0x378/0x470 Bug: 144961676 Bug: 144713689 Change-Id: I30397400c5d056946bdfbc86c9ef5f4d7e6c98fe Signed-off-by: Minchan Kim <minchan@google.com> Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>	2020-05-30 02:16:09 +08:00
Wei Wang	fbabe265d9	trace: sched: add capacity change tracing Add a new tracepoint sched_capacity_update when capacity value updated. Bug: 144177658 Bug: 144961676 Test: Boot and grab trace to check Change-Id: I30ee55bfcc2fb5a92dd448ad364768ee428f3cc4 Signed-off-by: Wei Wang <wvw@google.com> Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>	2020-05-30 02:16:09 +08:00
Wei Wang	6a370c94e7	kernel: sched: fix cpu cpu_capacity_orig being capped incorrectly update_cpu_capacity will update cpu_capacity_orig capped with thermal_cap, in non-WALT case, thermal_cap is previous cpu_capacity_orig. This caused cpu_capacity_orig being capped incorrectly. Test: Build Bug: 144143594 Bug: 144961676 Change-Id: I1ff9d9c87554c2d2395d46b215276b7ab50585c0 Signed-off-by: Wei Wang <wvw@google.com> Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>	2020-05-30 02:16:09 +08:00

30007 Commits