704995 Commits

Author SHA1 Message Date
Christian Hoffmann
bd1b170f1d arch: arm64: vdso32: Drop -no-integrated-as
Signed-off-by: Onelots <onelots@onelots.fr>
2024-12-24 00:25:57 +01:00
LuK1337
0a39bd9334 ARM64: vdso32: Hardcode toolchain target
Fixes the following error when building with clang r530567:
error: version 'kernel' in target triple 'arm-unknown-linux-androidkernel' is invalid

Change-Id: I5a2d27bf0e8a22b2fe752c64efc0cc91c790b5f0
2024-12-23 23:32:25 +01:00
Chung-Hsien Hsu
895eaac708 nl80211: add WPA3 definition for SAE authentication
Add definition of WPA version 3 for SAE authentication.

Change-Id: I19ca34b8965168f011cc1352eba420f2d54b0258
Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com>
Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-12-23 23:32:25 +01:00
Jonglin Lee
43e39d2a76 cpuidle: lpm_levels: Don't print parent clocks during suspend
Calling clock_debug_print_enabled with print_parent = true
during suspend may cause a scheduling while atomic violation.
Call with print_parent = false instead to prevent the violation.

Bug: 132511008
Change-Id: I80f646d77d0cc98b4004084022ce1dce0e80cc93
Signed-off-by: Jonglin Lee <jonglin@google.com>
Signed-off-by: GeoPD <geoemmanuelpd2001@gmail.com>
2024-08-15 08:22:52 +05:30
Wei Wang
6400cd3b94 sched: restrict iowait boost to tasks with prefer_idle
Currently iowait boost doesn't distinguish background from foreground tasks, and
we have seen cases where a device ramps to high frequency unnecessarily when
running some background I/O. This patch limits iowait boost to tasks with
prefer_idle set. Specifically, on Pixel, those are foreground and top
app tasks.

Bug: 130308826
Test: Boot and trace
Change-Id: I2d892beeb4b12b7e8f0fb2848c23982148648a10
Signed-off-by: Wei Wang <wvw@google.com>
Signed-off-by: Lau <laststandrighthere@gmail.com>
2024-08-15 08:22:43 +05:30
Maria Yu
b5c22baa21 sched: core: Clear walt rq request in cpu starting
Clear walt rq request in cpu starting.

Change-Id: Id3004337f3924984b8b812151a6ba01c6f1c013e
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
(cherry picked from commit 32df8f93e147dd54331161e9180d7ea488b750f9)
2024-08-15 08:22:18 +05:30
Pavankumar Kondeti
74a8607aa7 sched/walt: Fix the memory leak of idle task load pointers
The memory for task load pointers is allocated twice for each
idle thread except for the boot CPU. This happens during boot
from idle_threads_init()->idle_init() in the following 2 paths.

1. idle_init()->fork_idle()->copy_process()->
		sched_fork()->init_new_task_load()

2. idle_init()->fork_idle()-> init_idle()->init_new_task_load()

The memory allocation for all tasks happens through the 1st path,
so use the same for idle tasks and kill the 2nd path. Since
the idle thread of boot CPU does not go through fork_idle(),
allocate the memory for it separately.

Change-Id: I4696a414ffe07d4114b56d326463026019e278f1
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
(cherry picked from commit eb58f47212c9621be82108de57bcf3e94ce1035a)
2024-08-15 07:11:04 +05:30
DhineshCool
c0dd3261ad Revert "sched: Do not reduce perceived CPU capacity while idle"
This reverts commit 20dfb57cb1.
2024-08-15 06:33:57 +05:30
DhineshCool
f99e24746b Revert "cpufreq: schedutil: Enforce realtime priority"
This reverts commit 970b81bf75.
2024-08-15 06:17:11 +05:30
DhineshCool
158bbf5f52 Revert "binder: Reserve caches for small, high-frequency memory allocations"
This reverts commit fab295b5c9.
2024-08-14 19:17:42 +05:30
DhineshCool
2ca4eb547b defconfig: cybertron-v10
2024-08-13 23:44:43 +05:30
ExactExampl
0d2f0ead2b defconfig: b1c1: enable zram deduplication feature
* Unset CONFIG_ZRAM_WRITEBACK while at it as writeback isn't being used
2024-08-13 23:41:21 +05:30
Juhyung Park
277f74bc1c zram: switch to 64-bit hash for dedup
The original dedup code does not handle collisions, based on the observation
that they practically do not happen.

For additional peace of mind, use a bigger hash size to reduce the
possibility of collision even further.

Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Marco Zanin <mrczn.bb@gmail.com>
Signed-off-by: snnbyyds <snnbyyds@gmail.com>
2024-08-13 23:40:44 +05:30
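The wider hash above can be sketched with a 64-bit FNV-1a function; this is purely illustrative of the digest width, and the function name and hash choice here are assumptions, not the driver's actual hash:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* 64-bit FNV-1a as a stand-in for the dedup page hash; the driver's
 * real hash function may differ, this only illustrates the width. */
static uint64_t page_hash64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}
```

By the birthday bound, a 64-bit digest makes a collision among millions of stored pages astronomically unlikely, whereas with 32 bits it would be close to certain at that scale.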
Charan Teja Reddy
9864f4c40c zram: fix race condition while returning zram_entry refcount
With deduplication enabled, duplicated zram objects are tracked
using a zram_entry backed by a refcount. The race condition when
decrementing the refcount through zram_dedup_put() is as follows:
Say task A and task B share the same object, and thus
zram_entry->refcount = 2.
Task A				Task B

zram_dedup_put  		zram_dedup_put
				spin_lock(&hash->lock);
				entry->refcount--; (Now it is 1)
				spin_unlock(&hash->lock);
spin_lock(&hash->lock);
entry->refcount--; (Now it is 0)
spin_unlock(&hash->lock);

return entry->refcount		return entry->refcount

Both tasks return 0 in the steps above, leading to a double free of the
handle, which is a slab object.

Change-Id: I8dd9bad27140a6e3a295905bf4411050d8eac931
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Signed-off-by: Marco Zanin <mrczn.bb@gmail.com>
Signed-off-by: snnbyyds <snnbyyds@gmail.com>
2024-08-13 23:40:44 +05:30
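The fix in miniature: the decrement and the value the caller acts on must come from one indivisible step. A sketch using a C11 atomic in place of the driver's hash->lock critical section (names are illustrative, not the driver's):

```c
#include <stdatomic.h>
#include <assert.h>

struct entry_sketch {
    atomic_int refcount;
};

/* Decrement and sample in one atomic step, so of two racing put()s
 * exactly one observes 0 and frees the handle. The driver gets the
 * same effect by reading the count inside the hash->lock section
 * rather than after dropping it. */
static int entry_put(struct entry_sketch *e)
{
    return atomic_fetch_sub(&e->refcount, 1) - 1;
}
```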
Joonsoo Kim
222effc77a zram: compare all the entries with same checksum for deduplication
Until now, we compare just one entry with same checksum when
checking duplication since it is the simplest way to implement.
However, for the completeness, checking all the entries is better
so this patch implement to compare all the entries with same checksum.
Since this event would be rare so there would be no performance loss.

Change-Id: Ie7d61c14d127a28f5a06d85b0ca66b9fada20cbb
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: https://lore.kernel.org/patchwork/patch/787163/
Patch-mainline: linux-kernel@ Thu, 11 May 2017 22:30:29
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Signed-off-by: Marco Zanin <mrczn.bb@gmail.com>
Signed-off-by: snnbyyds <snnbyyds@gmail.com>
2024-08-13 23:40:43 +05:30
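The change above can be sketched as a bucket walk: every chained entry whose checksum matches is memcmp'd against the incoming page, instead of only the first one. Structure and names are illustrative, not the driver's:

```c
#include <string.h>
#include <stddef.h>
#include <assert.h>

#define PG 8   /* tiny stand-in for PAGE_SIZE */

struct dent {
    unsigned long checksum;
    unsigned char data[PG];
    struct dent *next;        /* hash-bucket chain */
};

/* A checksum hit alone is no longer enough: compare the payload of
 * every entry in the chain whose checksum matches. */
static struct dent *find_dup(struct dent *bucket, unsigned long csum,
                             const unsigned char *page)
{
    for (struct dent *e = bucket; e; e = e->next)
        if (e->checksum == csum && memcmp(e->data, page, PG) == 0)
            return e;
    return NULL;
}
```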
Joonsoo Kim
0dfcc58d8d zram: make deduplication feature optional
The benefit of deduplication depends on the workload, so it's not
preferable to always enable it. Therefore, make it optional in Kconfig
and via a device param. The default is 'off'. This option will be beneficial
for users who use zram as a blockdev and store build output on it.

Change-Id: If282bb8aa15c5749859a87cf36db7eb9edb3b1ed
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: https://lore.kernel.org/patchwork/patch/787164/
Patch-mainline: linux-kernel@ Thu, 11 May 2017 22:30:52
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Signed-off-by: Marco Zanin <mrczn.bb@gmail.com>
Signed-off-by: snnbyyds <snnbyyds@gmail.com>
2024-08-13 23:40:43 +05:30
Joonsoo Kim
3343551e25 zram: implement deduplication in zram
This patch implements a deduplication feature in zram. The purpose
of this work is naturally to reduce the amount of memory used by zram.

Android is one of the biggest users of zram as swap, and reducing its
memory usage there is really important. There is a paper
reporting that the duplication ratio of Android's memory content is
rather high [1]. And there is similar work on zswap that
reports that experiments have shown that around 10-15% of pages
stored in zswap are duplicates, and deduplicating them provides some
benefit [2].

Also, there is a different kind of workload that uses zram as a blockdev
and stores build outputs on it to reduce the wear-out problem of a real
blockdev. In this workload, the deduplication hit rate is very high due to
temporary files and intermediate object files. Detailed analysis is
at the bottom of this description.

Anyway, if we can detect duplicated content and avoid storing it
at different memory locations, we can save memory. This patch
tries to do that.

The implementation is simple and intuitive, but I should note
one thing about an implementation detail.

To check duplication, this patch uses a checksum of the page, and
collisions of this checksum are possible. There are
many ways to handle this situation, but this patch chooses
to allow entries with a duplicated checksum to be added to the hash,
and not to compare all entries with a duplicated checksum
when checking duplication. I guess that checksum collision is a quite
rare event and we don't need to pay much attention to such a case.
Therefore, I decided on the simplest way to implement the feature.
If there is a different opinion, I can accept it and go that way.

Following is the result of this patch.

Test result #1 (Swap):
Android Marshmallow, emulator, x86_64, Backporting to kernel v3.18

orig_data_size: 145297408
compr_data_size: 32408125
mem_used_total: 32276480
dup_data_size: 3188134
meta_data_size: 1444272

The last two metrics added to mm_stat are related to this work.
The first one, dup_data_size, is the amount of memory saved by avoiding
the storage of duplicated pages. The second one, meta_data_size, is the
amount of data structure overhead needed to support deduplication.
If dup > meta, we can judge that the patch improves memory usage.

In Android, we can save 5% of memory usage with this work.

Test result #2 (Blockdev):
build the kernel and store output to ext4 FS on zram

<no-dedup>
Elapsed time: 249 s
mm_stat: 430845952 191014886 196898816 0 196898816 28320 0 0 0

<dedup>
Elapsed time: 250 s
mm_stat: 430505984 190971334 148365312 0 148365312 28404 0 47287038  3945792

There is no performance degradation, and it saves 23% memory.

Test result #3 (Blockdev):
copy android build output dir(out/host) to ext4 FS on zram

<no-dedup>
Elapsed time: out/host: 88 s
mm_stat: 8834420736 3658184579 3834208256 0 3834208256 32889 0 0 0

<dedup>
Elapsed time: out/host: 100 s
mm_stat: 8832929792 3657329322 2832015360 0 2832015360 32609 0 952568877 80880336

It shows roughly 13% performance degradation and saves 24% memory. Maybe
this is due to the overhead of calculating checksums and comparing pages.

Test result #4 (Blockdev):
copy android build output dir(out/target/common) to ext4 FS on zram

<no-dedup>
Elapsed time: out/host: 203 s
mm_stat: 4041678848 2310355010 2346577920 0 2346582016 500 4 0 0

<dedup>
Elapsed time: out/host: 201 s
mm_stat: 4041666560 2310488276 1338150912 0 1338150912 476 0 989088794 24564336

Memory is saved by 42% and performance is the same. Even though there is
overhead from calculating checksums and comparisons, the large hit ratio
compensates for it, since a hit leads to fewer compression attempts.

I checked the detailed reasons for the savings on the kernel build workload,
and there are several cases where deduplication happens.

1) *.cmd
Build commands are usually similar within one directory, so the contents of
these files are very similar. On my system, more than 789 lines in
fs/ext4/.namei.o.cmd and fs/ext4/.inode.o.cmd are the same, out of 944 and
938 lines in those files, respectively.

2) intermediate object files
built-in.o and temporary object files have similar contents. More than
50% of fs/ext4/ext4.o is the same as fs/ext4/built-in.o.

3) vmlinux
.tmp_vmlinux1, .tmp_vmlinux2 and arch/x86/boot/compressed/vmlinux.bin
have similar contents.

The Android test has a similar case in which some object files (.class
and .so) are similar to one another
(./host/linux-x86/lib/libartd.so and
./host/linux-x86/lib/libartd-compiler.so).

Anyway, the benefit seems to be largely dependent on the workload, so a
following patch will make this feature optional. However, this feature
can help some use cases, so it deserves to be merged.

[1]: MemScope: Analyzing Memory Duplication on Android Systems,
dl.acm.org/citation.cfm?id=2797023
[2]: zswap: Optimize compressed pool memory utilization,
lkml.kernel.org/r/1341407574.7551.1471584870761.JavaMail.weblogic@epwas3p2

Change-Id: I8fe80c956c33f88a6af337d50d9e210e5c35ce37
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: https://lore.kernel.org/patchwork/patch/787162/
Patch-mainline: linux-kernel@ Thu, 11 May 2017 22:30:26
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Signed-off-by: Marco Zanin <mrczn.bb@gmail.com>
Signed-off-by: snnbyyds <snnbyyds@gmail.com>
2024-08-13 23:40:43 +05:30
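The dup > meta judgment described above can be checked directly against the Test result #1 numbers (3188134 bytes saved vs 1444272 bytes of metadata):

```c
#include <assert.h>

/* Net benefit rule from the message: dedup pays off when the bytes it
 * saves (dup_data_size) exceed its bookkeeping cost (meta_data_size). */
static long dedup_net_saving(long dup_data_size, long meta_data_size)
{
    return dup_data_size - meta_data_size;
}
```

For Test result #1 this gives 3188134 - 1444272 = 1743862 bytes net, roughly 5% of the compr_data_size of 32408125, consistent with the 5% figure quoted for Android.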
Joonsoo Kim
28405fd009 zram: introduce zram_entry to prepare dedup functionality
A following patch will implement deduplication functionality
in zram, and it requires an indirection layer to manage
the life cycle of the zsmalloc handle. To prepare for that, this patch
introduces zram_entry, which can be used to manage the life cycle
of a zsmalloc handle. Many lines are changed due to the rename, but the
core change is just the simple introduction of the new data structure.

Change-Id: Ibf9912397c8c7dbcf1465550bc83a71f904e41c7
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Link: https://lore.kernel.org/patchwork/patch/787161/
Patch-mainline: linux-kernel@ Thu, 11 May 2017 22:30:21
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
Signed-off-by: Marco Zanin <mrczn.bb@gmail.com>
Signed-off-by: snnbyyds <snnbyyds@gmail.com>
2024-08-13 23:40:43 +05:30
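The indirection can be pictured as follows; the field names and helper are a sketch of the idea (a refcounted owner of the zsmalloc handle), not the driver's actual layout:

```c
#include <assert.h>

/* Table slots point at a refcounted entry that owns the zsmalloc
 * handle, so the later dedup work can let several slots share one
 * compressed object. */
struct zentry {
    unsigned long handle;     /* zsmalloc handle */
    unsigned int refcount;
};

static struct zentry *zentry_share(struct zentry *e)
{
    e->refcount++;            /* a second slot now references the entry */
    return e;
}
```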
Tengfei Fan
ed6abdb80f ANDROID: cpufreq: times: Have two spinlock in different cache line
task_time_in_state_lock and uid_lock are currently
very possibly in the same cache line, which can cause a
livelock if 2 cores are in contention.

Change-Id: I644687c4d610af5e84a43f422a711d386d6d5181
Signed-off-by: Tengfei Fan <tengfeif@codeaurora.org>
(cherry picked from commit bfea4ae591301043498a214cf6ef4e6250106316)
2024-08-13 23:40:43 +05:30
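The fix amounts to forcing the two locks into different cache lines. A sketch with a userspace alignment attribute (the kernel spells this ____cacheline_aligned; the 64-byte line size is an assumption, the real value is per-SoC):

```c
#include <stddef.h>
#include <assert.h>

#define CACHELINE 64   /* assumed line size */

/* Each lock gets its own cache line, so contention on one no longer
 * bounces the line holding the other between cores. */
struct uid_time_locks {
    int task_time_in_state_lock __attribute__((aligned(CACHELINE)));
    int uid_lock                __attribute__((aligned(CACHELINE)));
};
```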
Maria Yu
d6631fffef sched/fair: Consider others if target cpu overutilized
If the target cpu is overutilized, it's better to consider
cpus in other groups. This avoids unnecessarily waiting on an
overutilized cpu until load balance finally lets the task run.

Change-Id: I6f8bccb611d2f11471254cf2795fb5bf3f122292
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
(cherry picked from commit b9f8fdc34eeb61fcc7c770b6277a83fd30fc7d8e)
2024-08-13 23:40:43 +05:30
Chris Redpath
9314b62205 FROMLIST: sched/fair: Don't move tasks to lower capacity cpus unless necessary
When lower capacity CPUs are load balancing and considering pulling
something from a higher capacity group, we should not pull tasks from a
cpu with only one task running as this is guaranteed to impede progress
for that task. If there is more than one task running, load balance in
the higher capacity group would have already made any possible moves to
resolve imbalance and we should make better use of system compute
capacity by moving a task if we still have more than one running.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Change-Id: Ib86570abdd453a51be885b086c8d80be2773a6f2
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
[from https://lore.kernel.org/lkml/1530699470-29808-11-git-send-email-morten.rasmussen@arm.com/]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Git-commit: 07e7ce6c8459defc34e63ae0f0334e811d223990
Git-repo: https://android.googlesource.com/kernel/common/
[clingutla@codeaurora.org: Resolved merge conflicts.]
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
(cherry picked from commit 779459e3fffda001181cfd6b1be2ffd3da25002c)
2024-08-13 23:40:43 +05:30
Joonwoo Park
ef01112e02 sched: ceil idle index to prevent from out of bound accessing
It's possible that the size of the given idle cost table is smaller than a
CPU's possible idle index.  Ceil (clamp) the CPU's idle index to prevent
out-of-bound accesses.

Change-Id: Idecb4f68758dd0183886ea74d0e9da3d236b0062
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
(cherry picked from commit ecedc7afd841c8d7ef0145924620304608d269ef)
2024-08-13 23:40:42 +05:30
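The guard in miniature: clamp the CPU's idle index to the last valid entry of the idle-cost table before using it as a subscript. This is a sketch of the bound, not the scheduler's actual code:

```c
#include <assert.h>

/* "Ceil" the index as the message describes: never index past the
 * end of the idle-cost table. */
static int clamp_idle_idx(int idx, int table_len)
{
    return idx >= table_len ? table_len - 1 : idx;
}
```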
Joonwoo Park
12312cb361 sched: prevent out of bound access in sched_group_energy()
group_idle_state() can return INT_MAX + 1, which triggers undefined behaviour,
when there are no CPUs in the sched_group.  Prevent this by handling the
error case correctly.

Change-Id: If9796c829c091e461231569dc38c5e5456f58037
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
[clingutla@codeaurora.org: Fixed trivial merge conflicts and squashed
  msm-4.14 change]
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
(cherry picked from commit bb5b0e61527011e4ebfc4058713a9068da9e7492)
2024-08-13 23:40:42 +05:30
Maria Yu
57d6066272 cpufreq: schedutil: Queue sugov irq work on policy online cpu
If the sugov irq work is scheduled on an offlined cpu it stays
pending forever, so the frequency is never updated.
Queue the sugov irq work on an online cpu of the policy if the
current cpu is offline.

Change-Id: I33fc691917b5866488b6aeb11ed902a2753130b2
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
(cherry picked from commit 1d2db9ab99a9abd0d9dcb320e6e0d266e21884f9)
2024-08-13 23:40:42 +05:30
Maria Yu
aa4a0a2807 sched/walt: Avoid walt irq work in offlined cpu
Avoid walt irq work on an offlined cpu.

Change-Id: Ia4410562f66bfa57daa15d8c0a785a2c7a95f2a0
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
(cherry picked from commit 702cec976c863388c784eff37a71fa3ee8bb84d7)
2024-08-13 23:40:42 +05:30
Pavankumar Kondeti
37a5c34f00 Revert "sched: Remove sched_ktime_clock()"
This reverts 'commit 24c18127e9 ("sched: Remove sched_ktime_clock()")'

WALT accounting uses ktime_get() as time source to keep windows in
align with the tick. ktime_get() API should not be called while the
timekeeping subsystem is suspended during the system suspend. The
code before the reverted patch has a wrapper around ktime_get() to
avoid calling ktime_get() when timekeeping subsystem is suspended.

The reverted patch removed this wrapper with the assumption that there
will not be any scheduler activity while timekeeping subsystem is
suspended. The timekeeping subsystem is resumed very early even before
non-boot CPUs are brought online. However it is possible that tasks
can wake up from the idle notifiers which gets called before timekeeping
subsystem is resumed.

When this happens, the time read from ktime_get() will not be consistent.
We see a jump from the values that would be returned later when timekeeping
subsystem is resumed. The rq->window_start update happens with incorrect
time. This rq->window_start becomes inconsistent with the other
CPUs' rq->window_start and with the wallclock time after the timekeeping
subsystem is resumed. This results in WALT accounting bugs.

Change-Id: I9c3b2fb9ffbf1103d1bd78778882450560dac09f
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
(cherry picked from commit faa04442e7a31357724dbb8e49ba64372ef37862)
2024-08-13 23:40:42 +05:30
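The restored wrapper behaves roughly like this sketch: while timekeeping is suspended it hands back the last cached timestamp instead of reading the clock, so window_start stays consistent. The names and the injected `ktime_now` parameter are illustrative, not the kernel's:

```c
#include <assert.h>

static unsigned long long last_ns;
static int timekeeping_suspended_sketch;

/* Stand-in for the sched_ktime_clock() wrapper: never consult
 * ktime_get() while the timekeeping subsystem is suspended. */
static unsigned long long sched_ktime_sketch(unsigned long long ktime_now)
{
    if (timekeeping_suspended_sketch)
        return last_ns;            /* frozen value, no clock read */
    last_ns = ktime_now;
    return last_ns;
}
```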
Pavankumar Kondeti
e8e661152f sched/fair: Fix redundant load balancer reattempt due to LBF_ALL_PINNED
The LBF_ALL_PINNED flag should be cleared in can_migrate_task() if the task
can run on the destination CPU during load balance. In the current code,
can_migrate_task() returns without clearing this flag
when the task can't be migrated to the destination CPU due to
cumulative window demand constraints. Since the LBF_ALL_PINNED flag
is not cleared, the load balancer thinks that none of the tasks running
on the busiest group can be migrated to the destination CPU due
to affinity settings, and tries to find another busiest group. Prevent
this incorrect reattempt of load balance by clearing the LBF_ALL_PINNED
flag right after the task affinity check in can_migrate_task().

Change-Id: Iad1cf42b1aaf70106ee5ecfbd9499ccb6eb7497e
[clingutla@codeaurora.org: Resolved merge conflicts]
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
(cherry picked from commit 5ee367fc9386d4e36af644942d9d10f97827bab1)
2024-08-13 23:40:41 +05:30
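The check order after the fix can be sketched like this (a sketch of the control flow, not the fair.c code; the two boolean parameters stand in for the real checks):

```c
#include <assert.h>

#define LBF_ALL_PINNED 0x01

/* The flag is cleared as soon as affinity passes, before any later
 * bail-out such as the window-demand constraint, so a non-affinity
 * failure no longer masquerades as "everything is pinned". */
static int can_migrate_sketch(int allowed_on_dst, int fits_demand,
                              unsigned int *lb_flags)
{
    if (!allowed_on_dst)
        return 0;                    /* flag stays set: truly pinned  */
    *lb_flags &= ~LBF_ALL_PINNED;    /* affinity ok: clear right here */
    if (!fits_demand)
        return 0;                    /* migration fails, but not "pinned" */
    return 1;
}
```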
Maria Yu
a01e3aaaff sched/fair: Avoid unnecessary active load balance
When finding the busiest group, load balance is avoided if
only 1 task is running on the src cpu. Consider the race when
different cpus do newly idle load balance at the same time, and
check the src cpu's nr_running to avoid an unnecessary active load
balance.
See the race condition example here:
  1) cpu2 has 2 tasks, so cpu2 rq->nr_running == 2 and cfs.h_nr_running
      == 2.
  2) cpu4 and cpu5 do newly idle load balance at the same time.
  3) cpu4 and cpu5 both see cpu2's sched_load_balance_sg_stats sum_nr_run == 2,
     so they both see cpu2 as the busiest rq.
  4) cpu5 successfully migrates a task from cpu2, so cpu2 has only 1 task
     left: cpu2 rq->nr_running == 1 and cfs.h_nr_running == 1.
  5) cpu4 surely goes to no_move because cpu4 currently has only 1 task,
     which is currently running.
  6) cpu4 then goes here to check whether cpu2 needs active load balance.

Change-Id: Ia9539a43e9769c4936f06ecfcc11864984c50c29
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
(cherry picked from commit fc61703628de002e2a5bf88e09933dbc3552d156)
2024-08-13 23:40:41 +05:30
Pavankumar Kondeti
9efe3c5438 sched/walt: Fix stale max_capacity issue during CPU hotplug
Scheduler keeps track of the maximum capacity among all online CPUs
in max_capacity. This is useful in checking if a given cluster/CPU
is a max capacity CPU or not. The capacity of a CPU gets updated
when its max frequency is limited by cpufreq and/or thermal. The
CPUfreq limits notifications are received via CPUfreq policy
notifier. However CPUfreq keeps the policy intact even when all
of the CPUs governed by the policy are hotplugged out. So the
CPUFREQ_REMOVE_POLICY notification never arrives and scheduler's
notion of max_capacity becomes stale. The max_capacity may get
corrected at some point later when CPUFREQ_NOTIFY notification
comes for other online CPUs. But when the hotplugged CPUs come
back online, max_capacity is not updated, since CPUFREQ_ADD_POLICY
is not sent by cpufreq.

For example consider a system with 4 BIG and 4 little CPUs. Their
original capacities are 2048 and 1024 respectively. The max_capacity
points to 2048 when all CPUs are online. Now,

1. All 4 BIG CPUs are hotplugged out. Since there is no notification,
the max_capacity still points to 2048, which is incorrect.
2. User clips the little CPUs' max_freq by 50%. CPUFREQ_NOTIFY arrives
and max_capacity is updated by iterating all the online CPUs. At this
point max_capacity becomes 512 which is correct.
3. User removes the above limits of little CPUs. The max_capacity
becomes 1024 which is correct.
4. Now, BIG CPUs are hotplugged in. Since there is no notification,
the max_capacity still points to 1024, which is incorrect.

Fix this issue by wiring the max_capacity updates in WALT to the scheduler
hotplug callbacks. Ideally we want cpufreq-domain hotplug callbacks,
but such notifiers are not present. So the max_capacity update is
forced even when it is not necessary, but that should not be a concern,
because CPU hotplug is supposed to be a rare event.

The scheduler hotplug callbacks happen even before the hotplug CPU is
removed from cpu_online_mask, so use cpu_active() check while evaluating
the max_capacity. Since cpu_active_mask is a subset of cpu_online_mask,
this is sufficient.

Change-Id: I97b1974e2de1a9730285715858f1ada416d92a7a
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
(cherry picked from commit 3cd81b52aedf6802aaf7b41f3550b1850c7a09a4)
2024-08-13 23:40:41 +05:30
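The hotplug-callback fix boils down to recomputing max_capacity over the active CPUs every time, rather than trusting cpufreq notifications. A sketch with arrays standing in for per-CPU data, replaying the BIG/little example from the message:

```c
#include <assert.h>

/* Recompute the maximum capacity over cpu_active() CPUs; forced on
 * every hotplug callback, which is cheap because hotplug is rare. */
static unsigned long recompute_max_capacity(const unsigned long *capacity,
                                            const int *active, int nr_cpus)
{
    unsigned long max = 0;

    for (int cpu = 0; cpu < nr_cpus; cpu++)
        if (active[cpu] && capacity[cpu] > max)
            max = capacity[cpu];
    return max;
}
```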
tip-bot for Jacob Shin
2bc84a0ac1 sched/fair: Force balancing on NOHZ balance if local group has capacity
The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical.

The original commit that adds it:

  fab476228b ("sched: Force balancing on newidle balance if local group has capacity")

explains:

    Under certain situations, such as a niced down task (i.e. nice =
    -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
    on a sched group and kicks away other tasks because of its large
    weight. This leads to sub-optimal utilization of the
    machine. Even though the sched group has capacity, it does not
    pull tasks because sds.this_load >> sds.max_load, and f_b_g()
    returns NULL.

A similar but inverted issue also affects ARM big.LITTLE (asymmetrical CPU
capacity) systems - consider 8 always-running, same-priority tasks on a
system with 4 "big" and 4 "little" CPUs. Suppose that 5 of them end up on
the "big" CPUs (which will be represented by one sched_group in the DIE
sched_domain) and 3 on the "little" (the other sched_group in DIE), leaving
one CPU unused. Because the "big" group has a higher group_capacity its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".

The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed the
next nohz balance kick will help us out.

Change-Id: I6b0db178c0707603c8fd764fd3e44524c5345241
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170807163900.25180-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: 583ffd99d7657755736d831bbc182612d1d2697d
Git-repo: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
(cherry picked from commit 3d9aec71e139bce6d592b56afaa30f02c344e80e)
2024-08-13 23:40:41 +05:30
Lingutla Chandrasekhar
70e5add1e9 sched: energy: rebuild sched_domains with actual capacities
During sched initialization, sched_domains might have been built
with default capacity values, and the max_{min_}cap_org_cpus
were updated based on them. After the energy probe is called,
these capacities change, but the max_{min_}cap_org_cpus
still hold the old values, and using these stale cpus could give the
wrong start_cpu when finding an energy-efficient cpu.

So rebuild the sched_domains, which updates all cpu group capacities
with the actual capacities and then builds the domains again, and
update the max_{min_}cap_org_cpus as well.

Change-Id: I07d58bc849de363c5ed8fb743ab98d3fba727130
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
(cherry picked from commit 5b2c99599d1dcf79ef7dec93c7935d6fc48869db)
2024-08-13 23:40:41 +05:30
Sultan Alsawaf
dd9658622e msm: kgsl: Avoid dynamically allocating small command buffers
Most command buffers here are rather small (fewer than 256 words); it's
a waste of time to dynamically allocate memory for such a small buffer
when it could easily fit on the stack.

Conditionally using an on-stack command buffer when the size is small
enough eliminates the need for using a dynamically-allocated buffer most
of the time, reducing GPU command submission latency.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2024-08-13 23:40:30 +05:30
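The shape of the optimization, under the assumption (from the message) that most command lists fit in 256 words: build small lists in an on-stack array and fall back to the allocator only for large ones. malloc() stands in for the kernel allocation the driver avoids, and the function is a sketch, not kgsl's code:

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define SMALL_CMDS 256   /* threshold from the message: "fewer than 256 words" */

/* Returns 1 if the heap path was taken, 0 for the stack path, -1 on
 * allocation failure; returned only so the sketch is observable. */
static int submit_cmds(const unsigned int *cmds, size_t ndwords)
{
    unsigned int stackbuf[SMALL_CMDS];
    unsigned int *buf = stackbuf;
    int heap = 0;

    if (ndwords > SMALL_CMDS) {           /* rare: too big for the stack */
        buf = malloc(ndwords * sizeof(*buf));
        if (!buf)
            return -1;
        heap = 1;
    }
    memcpy(buf, cmds, ndwords * sizeof(*buf));
    /* ... hand buf to the ringbuffer here ... */
    if (heap)
        free(buf);
    return heap;
}
```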
Sultan Alsawaf
388342b609 msm: kgsl: Don't allocate memory dynamically for temp command buffers
The temporary command buffer in _set_pagetable_gpu is only the size of a
single page, and _set_pagetable_gpu is never executed concurrently. It
is therefore easy to replace the dynamic command buffer allocation with
a static one to improve performance by avoiding the latency of dynamic
memory allocation.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2024-08-13 23:40:30 +05:30
Prakash Kamliya
a892b85bfb msm: kgsl: Relax adreno spin idle tight loop
The tight loop in adreno_spin_idle() causes RT throttling.
Relax the tight loop by giving other threads a chance to run.

Change-Id: Ic23d4551c0cc0b5f2fa7844ca73444d1412d480c
Signed-off-by: Prakash Kamliya <pkamliya@codeaurora.org>
Signed-off-by: Raphiel Rollerscaperers <rapherion@raphielgang.org>
2024-08-13 23:40:30 +05:30
Martin KaFai Lau
a6710190e0 bpf: Refactor codes handling percpu map
Refactor the code that populates the value
of a htab_elem in a BPF_MAP_TYPE_PERCPU_HASH
typed bpf_map.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
2024-08-13 23:40:22 +05:30
Martin KaFai Lau
84360a36df bpf: Add percpu LRU list
Instead of having a common LRU list, this patch allows a
percpu LRU list which can be selected by specifying a map
attribute.  The map attribute will be added in the later
patch.

While the common use case for LRU is #reads >> #updates,
percpu LRU list allows bpf prog to absorb unusual #updates
under pathological case (e.g. external traffic facing machine which
could be under attack).

Each percpu LRU is isolated from each other.  The LRU nodes (including
free nodes) cannot be moved across different LRU Lists.

Here are the update performance comparison between
common LRU list and percpu LRU list (the test code is
at the last patch):

[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
 1 cpus: 2934082 updates
 4 cpus: 7391434 updates
 8 cpus: 6500576 updates

[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
  1 cpus: 2896553 updates
  4 cpus: 9766395 updates
  8 cpus: 17460553 updates

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
2024-08-13 23:40:22 +05:30
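The selection between the common list and a per-CPU list described above reduces to a sketch like this (the attribute flag, structure names, and fixed CPU count of 8 are illustrative assumptions):

```c
#include <assert.h>

#define NR_CPUS_SKETCH 8

struct lru_list_sketch { int nr_nodes; };

struct lru_sketch {
    int percpu;                                   /* the map attribute */
    struct lru_list_sketch common;
    struct lru_list_sketch per_cpu[NR_CPUS_SKETCH];
};

/* With the attribute set, each CPU updates only its own list and
 * nodes never migrate between lists; otherwise all CPUs contend on
 * one common list. */
static struct lru_list_sketch *lru_for_cpu(struct lru_sketch *l, int cpu)
{
    return l->percpu ? &l->per_cpu[cpu] : &l->common;
}
```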
Martin KaFai Lau
d15c5c69e6 bpf: LRU List
Introduce bpf_lru_list which will provide LRU capability to
the bpf_htab in the later patch.

* General Thoughts:
1. Target use case.  Reads happen more often than updates
   (i.e. bpf_lookup_elem() is called more often than bpf_update_elem()).
   If a bpf_prog does a bpf_lookup_elem() first and then an in-place
   update, it still counts as a read operation as far as the LRU list
   is concerned.
2. It may be useful to think of it as an LRU cache.
3. Optimize the read case
   3.1 No lock in read case
   3.2 The LRU maintenance is only done during bpf_update_elem()
4. If there is a percpu LRU list, it loses the system-wide LRU
   property.  A completely isolated percpu LRU list has the best
   performance but the memory utilization is not ideal considering
   the workload may be imbalanced.
5. Hence, this patch starts the LRU implementation with a global LRU
   list, with batched operations before accessing the global LRU list.
   As an LRU cache where #read >> #update/#insert operations, it will
   work well.
6. There is a local list (for each cpu) which is named
   'struct bpf_lru_locallist'.  This local list is not used to sort
   the LRU property.  Instead, the local list is to batch enough
   operations before acquiring the lock of the global LRU list.  More
   details on this later.
7. In the later patch, it allows a percpu LRU list by specifying a
   map-attribute for scalability reason and for use cases that need to
   prepare for the worst (and pathological) case like DoS attack.
   The percpu LRU list is completely isolated from each other and the
   LRU nodes (including free nodes) cannot be moved across the list.  The
   following description is for the global LRU list but mostly applicable
   to the percpu LRU list also.

* Global LRU List:
1. It has three sub-lists: active-list, inactive-list and free-list.
2. The two-list idea, active and inactive, is borrowed from the
   page cache.
3. All nodes are pre-allocated and all sit on the free-list (of the
   global LRU list) at the beginning.  The pre-allocation reasoning
   is similar to the existing BPF_MAP_TYPE_HASH.  However,
   opting out of preallocation (BPF_F_NO_PREALLOC) is not supported in
   the LRU map.
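
The node and sub-list layout described above might be sketched in plain C as follows (all names here are illustrative stand-ins, not the kernel's actual definitions):

```c
#include <assert.h>
#include <stddef.h>

/* One pre-allocated LRU node; the ref-bit is set on lookup and
 * consumed later by rotation (illustrative sketch only). */
struct lru_node {
    struct lru_node *next, *prev;
    unsigned int ref;
};

/* The global LRU with its three sub-lists, each a circular list
 * anchored by a sentinel node. */
struct bpf_lru_sketch {
    struct lru_node active;    /* hot working set */
    struct lru_node inactive;  /* eviction candidates */
    struct lru_node freelist;  /* pre-allocated, unused nodes */
    size_t nr_active, nr_inactive;
};

/* A lookup only sets the ref-bit -- no list manipulation, no lock. */
static void lru_node_ref(struct lru_node *n)
{
    n->ref = 1;
}
```

This is why the read path can stay lock-free: marking the ref-bit is the only LRU work a lookup does.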

* Active/Inactive List (of the global LRU list):
1. The active list, as its name says, maintains the active set of
   the nodes.  We can think of it as the working set or more frequently
   accessed nodes.  The access frequency is approximated by a ref-bit.
   The ref-bit is set during the bpf_lookup_elem().
2. The inactive list, as its name also says, maintains a less
   active set of nodes.  They are the candidates to be removed
   from the bpf_htab when we are running out of free nodes.
3. The ordering of these two lists acts as a rough clock.
   The tail of the inactive list holds the older nodes, which
   should be released first when the bpf_htab needs a free element.

* Rotating the Active/Inactive List (of the global LRU list):
1. It is the basic operation to maintain the LRU property of
   the global list.
2. The active list is only rotated when the inactive list is running
   low.  This idea is similar to the current page cache.
   Inactive running low is currently defined as
   "# of inactive < # of active".
3. The active list rotation always starts from the tail.  It moves
   nodes without the ref-bit set to the head of the inactive list.
   It moves nodes with the ref-bit set back to the head of the active
   list and then clears their ref-bit.
4. The inactive rotation is pretty simple.
   It walks the inactive list and moves nodes back to the head of the
   active list if their ref-bit is set.  The ref-bit is cleared after
   moving to the active list.
   If a node does not have the ref-bit set, it is left as is,
   because it is already in the inactive list.
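
The active-list rotation above might be sketched in self-contained C like this (the list helpers and names are simplified stand-ins for the kernel's list_head API, not the actual implementation):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, standing in for list_head. */
struct node { struct node *next, *prev; int ref; };

static void list_init(struct node *h) { h->next = h->prev = h; }
static void list_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}
static void list_add_head(struct node *n, struct node *h)
{
    n->next = h->next; n->prev = h;
    h->next->prev = n; h->next = n;
}
static int list_len(const struct node *h)
{
    int len = 0;
    for (const struct node *p = h->next; p != h; p = p->next)
        len++;
    return len;
}

/* Active rotation (point 3 above): scan up to nscan nodes from the
 * tail; demote nodes without the ref-bit to the inactive head,
 * re-promote nodes with it and clear the bit. */
static void rotate_active(struct node *active, struct node *inactive,
                          int nscan)
{
    while (nscan-- > 0 && active->prev != active) {
        struct node *n = active->prev;          /* oldest node */
        list_del(n);
        if (n->ref) {
            n->ref = 0;
            list_add_head(n, active);           /* keep it hot */
        } else {
            list_add_head(n, inactive);         /* demote */
        }
    }
}
```

Note the nscan bound: scanning a limited number of tail nodes per rotation keeps the cost of each pass bounded.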

* Shrinking the Inactive List (of the global LRU list):
1. Shrinking is the operation to get free nodes when the bpf_htab is
   full.
2. It usually only shrinks the inactive list to get free nodes.
3. During shrinking, it walks the inactive list from the tail and
   deletes the nodes without the ref-bit set from the bpf_htab.
4. If no free node is found after step (3), it forcefully takes
   one node from the tail of the inactive or active list.  "Forcefully"
   means that it ignores the ref-bit.
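
Steps (3) and (4) above can be sketched as a single eviction helper (again with simplified stand-ins for the kernel's list primitives, not the actual code):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, standing in for list_head. */
struct node { struct node *next, *prev; int ref; };

static void list_init(struct node *h) { h->next = h->prev = h; }
static void list_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}
static void list_add_head(struct node *n, struct node *h)
{
    n->next = h->next; n->prev = h;
    h->next->prev = n; h->next = n;
}

/* Shrink: walk the inactive list from the tail looking for a node
 * without the ref-bit; if none is found, forcefully take the tail
 * node, ignoring its ref-bit. */
static struct node *shrink_one(struct node *inactive)
{
    for (struct node *n = inactive->prev; n != inactive; n = n->prev) {
        if (!n->ref) {
            list_del(n);
            return n;
        }
    }
    if (inactive->prev != inactive) {           /* force-evict */
        struct node *n = inactive->prev;
        list_del(n);
        return n;
    }
    return NULL;                                /* list empty */
}
```

The forced fallback guarantees forward progress: even a fully referenced inactive list still yields a node when the htab is out of free elements.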

* Local List:
1. Each CPU has a 'struct bpf_lru_locallist'.  The purpose is to
   batch enough operations before acquiring the lock of the
   global LRU.
2. A local list has two sub-lists, free-list and pending-list.
3. During bpf_update_elem(), it first tries to get a node from the
   free-list of the current CPU's local list.
4. If the local free-list is empty, it acquires from the
   global LRU list.  The global LRU list can satisfy it either
   from its global free-list or by shrinking the global inactive
   list.  Since the global LRU list lock has been acquired anyway,
   it grabs at most LOCAL_FREE_TARGET elements at once
   for the local free-list.
5. When a new element is added to the bpf_htab, it first sits
   on the pending-list (of the local list).
   The pending-list will be flushed to the global LRU list
   when it needs to acquire free nodes from the global list
   next time.
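
The batching effect of the local free-list can be illustrated with a simplified singly-linked sketch (the TARGET constant, the counter standing in for the lru_lock, and all names here are illustrative; the kernel's LOCAL_FREE_TARGET and flush logic differ in detail):

```c
#include <assert.h>
#include <stddef.h>

#define TARGET 4                 /* illustrative batch size */

struct snode { struct snode *next; };

struct global_lru {
    struct snode *free;          /* global free-list */
    int lock_takes;              /* how often "lru_lock" was taken */
};

struct local_list {
    struct snode *free;          /* per-cpu free-list */
};

static struct snode *pop(struct snode **head)
{
    struct snode *n = *head;
    if (n)
        *head = n->next;
    return n;
}

static void push(struct snode **head, struct snode *n)
{
    n->next = *head;
    *head = n;
}

/* Get a free node, refilling the local list in batches of TARGET so
 * the global lock is taken once per TARGET allocations, not once per
 * allocation. */
static struct snode *lru_get_free(struct local_list *l, struct global_lru *g)
{
    struct snode *n = pop(&l->free);
    if (n)
        return n;                /* fast path: no global lock */
    g->lock_takes++;             /* slow path: take the lock once... */
    for (int i = 0; i < TARGET; i++) {
        n = pop(&g->free);       /* ...and grab up to TARGET nodes */
        if (!n)
            break;
        push(&l->free, n);
    }
    return pop(&l->free);
}
```

With eight allocations and TARGET = 4, the global lock is taken only twice: the batching amortizes the lock cost across TARGET allocations.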

* Lock Consideration:
The LRU list has a lock (lru_lock).  Each bucket of htab has a
lock (buck_lock).  If both locks need to be acquired together,
the lock order is always lru_lock -> buck_lock and this only
happens in the bpf_lru_list.c logic.

In hashtab.c, both locks are not acquired together (i.e. one
lock is always released first before acquiring another lock).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
2024-08-13 23:40:21 +05:30
Michal Hocko
d9af72efb8 bpf: do not use KMALLOC_SHIFT_MAX
Commit 01b3f52157 ("bpf: fix allocation warnings in bpf maps and
integer overflow") has added checks for the maximum allocatable size.
It (ab)used KMALLOC_SHIFT_MAX for that purpose.

While this is not incorrect, it is not very clean, because we already
have KMALLOC_MAX_SIZE for this very reason, so let's change both checks
to use KMALLOC_MAX_SIZE instead.

The original motivation for using KMALLOC_SHIFT_MAX was to work around
an incorrect KMALLOC_MAX_SIZE which could lead to allocation warnings
but it is no longer needed since "slab: make sure that KMALLOC_MAX_SIZE
will fit into MAX_ORDER".

Link: http://lkml.kernel.org/r/20161220130659.16461-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
2024-08-13 23:40:21 +05:30
Yaroslav Furman
9ca25208f4 Revert "ANDROID: dm-crypt: run in a WQ_HIGHPRI workqueue"
This reverts commit e97c4ed917.

We don't need dm-crypt (FDE, which shouldn't ever be used on b4s4
anyway) to compete with touch, boosting, and similar important things.

Signed-off-by: kdrag0n <dragon@khronodragon.com>
2024-08-13 23:40:19 +05:30
Tyler Nijmeh
20dfb57cb1 sched: Do not reduce perceived CPU capacity while idle
CPUs that are idle are excellent candidates for latency sensitive or
high-performance tasks. Decrementing their capacity while they are idle
will result in these CPUs being chosen less, and they will prefer to
schedule smaller tasks instead of large ones. Disable this.

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
2024-08-13 23:36:15 +05:30
Tyler Nijmeh
f5daa9d7ec sched: Enable NEXT_BUDDY for better cache locality
By scheduling the last woken task first, we can increase cache locality
since that task is likely to touch the same data as before.

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
2024-08-13 23:36:15 +05:30
Tyler Nijmeh
970b81bf75 cpufreq: schedutil: Enforce realtime priority
Even the interactive governor utilizes a realtime priority. It is
beneficial for schedutil to process its workload at a priority greater
than or equal to that of mundane tasks (KGSL/AUDIO/ETC).

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
Signed-off-by: clarencelol <clarencekuiek@icloud.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-08-13 23:36:14 +05:30
Sultan Alsawaf
26a793cb28 Revert "mutex: Add a delay into the SPIN_ON_OWNER wait loop."
This reverts commit c8de3f45ee.

This doesn't make sense for a few reasons. Firstly, upstream uses this
mutex code and it works fine on all arches; why should arm be any
different?

Secondly, once the mutex owner starts to spin on `wait_lock`,
preemption is disabled and the owner will be in an actively-running
state. The optimistic mutex spinning occurs when the lock owner is
actively running on a CPU, and while the optimistic spinning takes
place, no attempt to acquire `wait_lock` is made by the new waiter.
Therefore, it is guaranteed that new mutex waiters which optimistically
spin will not contend the `wait_lock` spin lock that the owner needs to
acquire in order to make forward progress.

Another potential source of `wait_lock` contention can come from tasks
that call mutex_trylock(), but this isn't actually problematic (and if
it were, it would affect the MUTEX_SPIN_ON_OWNER=n use-case too). This
won't introduce significant contention on `wait_lock` because the
trylock code exits before attempting to lock `wait_lock`, specifically
when the atomic mutex counter indicates that the mutex is already
locked. So in reality, the amount of `wait_lock` contention that can
come from mutex_trylock() amounts to only one task. And once it
finishes, `wait_lock` will no longer be contended and the previous
mutex owner can proceed with clean up.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Albert I <kras@raphielgang.org>
2024-08-13 23:36:11 +05:30
Yaroslav Furman
b463d71ac6 drm/sde: sde_fence: don't copy fence names
This is in the screen rendering path. Calling snprintf there is unwise.
This also has the advantage of reducing the size of struct sde_fence from 152b to 128b.

Change-Id: I26f54537fc13a69a1f726d018a93bde5ef3477ac
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
2024-08-13 23:36:08 +05:30
Sultan Alsawaf
58ceb150b3 msm: kgsl: Use lock-less list for page pools
Page pool additions and removals are very hot during GPU workloads, so
they should be optimized accordingly. We can use a lock-less list for
storing the free pages in order to speed things up. The lock-less list
allows for one llist_del_first() user and unlimited llist_add() users to
run concurrently, so only a spin lock around the llist_del_first() is
needed; everything else is lock-free. The per-pool page count is now an
atomic to make it lock-free as well.
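
The pattern described above might be sketched in userspace C11 as follows (a simplified analogue of the kernel's llist_add()/llist_del_first(), not the kernel code itself; the single-consumer restriction on the delete side is exactly why the spin lock is still needed there):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct lnode { struct lnode *next; };

struct lhead { _Atomic(struct lnode *) first; };

/* Lock-free push, analogous to llist_add(): any number of producers
 * may call this concurrently. */
static void llist_push(struct lhead *h, struct lnode *n)
{
    struct lnode *old = atomic_load(&h->first);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/* Pop, analogous to llist_del_first(): only ONE consumer may run at
 * a time (ABA hazard otherwise), which is why the delete side keeps
 * a spin lock while push stays lock-free. */
static struct lnode *llist_pop(struct lhead *h)
{
    struct lnode *old = atomic_load(&h->first);
    while (old && !atomic_compare_exchange_weak(&h->first, &old, old->next))
        ;
    return old;
}
```

In the pool, a freed page goes in with the push from any context, the allocation path locks only around the pop, and the per-pool page count becomes a plain atomic counter.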

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
[jjpprrrr: adapted _kgsl_pool_get_page() because k4.9 does not update
vmstat counter for memory held in pools]
Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
2024-08-13 23:36:07 +05:30
Sultan Alsawaf
bb2b4a801f msm: kgsl: Don't try to wait for fences that have been signaled
Trying to wait for fences that have already been signaled incurs a high
setup cost, since dynamic memory allocation must be used. Avoiding this
overhead when it isn't needed improves performance.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: palaych <palaychm@yandex.ru>
Change-Id: Iea6f84553c4c3d053858021948b18f2421a4d26e
2024-08-13 23:36:07 +05:30
Sultan Alsawaf
ecafe799d0 msm: kgsl: Dispatch commands using a master kthread
Instead of coordinating with a worker when dispatching commands and
abusing a mutex lock for synchronization, it's faster to keep a single
kthread dispatching commands whenever needed. This reduces GPU
processing latency.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
[0ctobot: Adapted for msm-4.9, this reverts commit:
2eb74d7 ("msm: kgsl: Defer issue commands to worker thread")]
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>

Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: Raphiel Rollerscaperers <raphielscape@outlook.com>
2024-08-13 23:36:07 +05:30
Sultan Alsawaf
1119fd06fe defconfig: b1c1: Remove MSM event-timer
The event timer driver is accessed directly from CPU idle and is not
RT-friendly. Since the event timer is only used by the old MDSS driver,
just remove it since it's unused on sdm670.
2024-08-13 23:35:12 +05:30
Sultan Alsawaf
90e9c6d036 defconfig: bonito: Remove MSM event-timer
The event timer driver is accessed directly from CPU idle and is not
RT-friendly. Since the event timer is only used by the old MDSS driver,
just remove it since it's unused on sdm670.
2024-08-13 23:32:56 +05:30
Sultan Alsawaf
0e39b53ee6 PM / freezer: Reduce freeze timeout to 1 second for Android
Freezing processes on Android usually takes less than 100 ms, and if it
takes longer than that to the point where the 20 second freeze timeout is
reached, it's because the remaining processes to be frozen are deadlocked
waiting for something from a process which is already frozen. There's no
point in burning power trying to freeze for that long, so reduce the freeze
timeout to a very generous 1 second for Android and don't let anything mess
with it.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-08-13 23:32:53 +05:30