Most command buffers here are rather small (fewer than 256 words); it's
a waste of time to dynamically allocate memory for such a small buffer
when it could easily fit on the stack.
Conditionally using an on-stack command buffer when the size is small
enough eliminates the need for using a dynamically-allocated buffer most
of the time, reducing GPU command submission latency.
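
As a rough user-space illustration of the idea (not the driver code itself), the sketch below uses a stack array when the request is small and falls back to the heap otherwise. The helper name build_and_submit(), the SMALL_CMD_WORDS threshold and the checksum stand-in for submission are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical threshold, loosely based on the 256-word figure above. */
#define SMALL_CMD_WORDS 256

/*
 * Hypothetical helper: fills `nwords` command words and sums them as a
 * stand-in for submitting them. Returns 0 on success, -1 if the heap
 * fallback fails. `used_heap` reports which path was taken.
 */
int build_and_submit(size_t nwords, uint64_t *checksum, int *used_heap)
{
	uint32_t stack_buf[SMALL_CMD_WORDS];
	uint32_t *cmds = stack_buf;
	size_t i;

	*used_heap = 0;
	if (nwords > SMALL_CMD_WORDS) {
		/* Only large buffers pay for a dynamic allocation. */
		cmds = malloc(nwords * sizeof(*cmds));
		if (!cmds)
			return -1;
		*used_heap = 1;
	}

	*checksum = 0;
	for (i = 0; i < nwords; i++) {
		cmds[i] = (uint32_t)i;
		*checksum += cmds[i];
	}

	/* Free only what was actually allocated. */
	if (cmds != stack_buf)
		free(cmds);
	return 0;
}
```

In kernel code the stack budget is much tighter than in user space, so the threshold has to be chosen with the available stack depth in mind.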
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
The temporary command buffer in _set_pagetable_gpu is only the size of a
single page, and _set_pagetable_gpu is never executed concurrently. It
is therefore easy to replace the dynamic command buffer allocation with
a static one to improve performance by avoiding the latency of dynamic
memory allocation.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
This is in the screen rendering path, where calling snprintf is unwise.
This also reduces the size of struct sde_fence from 152 bytes to 128 bytes.
Change-Id: I26f54537fc13a69a1f726d018a93bde5ef3477ac
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
Page pool additions and removals are very hot during GPU workloads, so
they should be optimized accordingly. We can use a lock-less list for
storing the free pages in order to speed things up. The lock-less list
allows for one llist_del_first() user and unlimited llist_add() users to
run concurrently, so only a spin lock around the llist_del_first() is
needed; everything else is lock-free. The per-pool page count is now an
atomic to make it lock-free as well.
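
The scheme can be approximated in user space with C11 atomics. This is only a sketch of the technique, not the kernel's llist implementation: struct pool, pool_add() and pool_get() are hypothetical names, and an atomic_flag stands in for the spin lock that serializes removals.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
};

struct pool {
	_Atomic(struct node *) first; /* head of the lock-less free list */
	atomic_int count;             /* per-pool count, updated atomically */
	atomic_flag del_lock;         /* stand-in for the removal spin lock */
};

void pool_init(struct pool *p)
{
	atomic_init(&p->first, NULL);
	atomic_init(&p->count, 0);
	atomic_flag_clear(&p->del_lock);
}

/* Any number of callers may add concurrently; no lock is taken. */
void pool_add(struct pool *p, struct node *n)
{
	struct node *old = atomic_load(&p->first);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&p->first, &old, n));
	atomic_fetch_add(&p->count, 1);
}

/* Removals are serialized so only one remover runs at a time. */
struct node *pool_get(struct pool *p)
{
	struct node *n, *next;

	while (atomic_flag_test_and_set(&p->del_lock))
		; /* spin until the previous remover is done */

	n = atomic_load(&p->first);
	while (n) {
		next = n->next;
		/* On failure, n is reloaded with the current head. */
		if (atomic_compare_exchange_weak(&p->first, &n, next))
			break;
	}
	atomic_flag_clear(&p->del_lock);

	if (n)
		atomic_fetch_sub(&p->count, 1);
	return n;
}
```

Because the count is updated separately from the list, it can briefly lag behind the list contents; the sketch treats it as a statistic rather than as a guard.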
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
[jjpprrrr: adapted _kgsl_pool_get_page() because k4.9 does not update
vmstat counter for memory held in pools]
Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Trying to wait for fences that have already been signaled incurs a high
setup cost, since dynamic memory allocation must be used. Avoiding this
overhead when it isn't needed improves performance.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: palaych <palaychm@yandex.ru>
Change-Id: Iea6f84553c4c3d053858021948b18f2421a4d26e
Instead of coordinating with a worker when dispatching commands and
abusing a mutex lock for synchronization, it's faster to keep a single
kthread dispatching commands whenever needed. This reduces GPU
processing latency.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
[0ctobot: Adapted for msm-4.9, this reverts commit:
2eb74d7 ("msm: kgsl: Defer issue commands to worker thread")]
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: Raphiel Rollerscaperers <raphielscape@outlook.com>
POPP constantly attempts to lower the GPU's frequency behind the governor's
back in order to save power; however, the GPU governor in use
(msm-adreno-tz) is very good at determining the GPU's load and selecting
an appropriate frequency to run the GPU at.
POPP was created long ago (perhaps when msm-adreno-tz didn't exist or
didn't work so well), so it is clearly deprecated. Remove it.
Signed-off-by: Sultan Alsawaf <sultanxda@gmail.com>
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
(cherry picked from commit b5447fccd96348b0ee0db1d4a4476aeb9b9c0896)
(cherry picked from commit 9c7823e4243eea117c0f8f22e99558a67c8899b0)
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Signed-off-by: Lau <laststandrighthere@gmail.com>
Signed-off-by: Joshua Primero <jprimero155@gmail.com>
Signed-off-by: Jprimero15 <jprimero155@gmail.com>
Currently, the kgsl worker thread is erroneously ranked right below
Android's audio threads in terms of priority.
The kgsl worker thread is in the critical path for rendering frames to
the display, so increase its priority to match the priority of MDSS'
kthread (mdss_fb0).
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
After scheduling the idle work for transitioning to either NAP or SLUMBER,
it's possible that the requested state changes to NONE if a new workload
arrives before kgsl_idle_check is executed or before it gets the device
mutex. If this happens, the current code calls pwrctrl_change_state, which
fails because this is not a legal transition. To prevent this, skip calling
pwrctrl_change_state if the requested state is NONE.
Change-Id: Iae535c6e2a3230f9e7e210712990eeb405822f4f
Signed-off-by: Deepak Kumar <dkumar@codeaurora.org>
Updating the QoS remap requires reading registers to update values, which
adds CPU processing when in reality the update is only needed once.
Bug: 142504774
Change-Id: Iec8d4dfd858b0602db7d2275b6b716dbcffe0d2f
Signed-off-by: Adrian Salido <salidoa@google.com>
The overhead of queuing the power-on worker every time there's an ioctl
received is significant due to the frequency of GPU ioctls. To mitigate
the high overhead, only fire up the power-on worker when the GPU isn't
active.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Waking the GPU upon touch wastes power when the screen is being touched
in a way that does not induce animation or any actual need for GPU usage.
Instead of preemptively waking the GPU on touch input, wake it up upon
receiving an IOCTL_KGSL_GPU_COMMAND ioctl since it is a sign that the GPU
will soon be needed.
Signed-off-by: Sultan Alsawaf <sultanxda@gmail.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
(cherry picked from commit 8e1e3f18f2459f83c9f249b90b6d28eead2bfe0c)
(cherry picked from commit 2b164b3f3d02b423538d71fee1a679a70cabccc6)
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Updating the QoS remap requires reading registers to update values, which
adds CPU processing when in reality the update is only needed once.
Bug: 142504774
Change-Id: Iec8d4dfd858b0602db7d2275b6b716dbcffe0d2f
Signed-off-by: Adrian Salido <salidoa@google.com>
Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
drivers/gpu/drm/drm_mipi_dsi.c:1125:28: error: converting the result of '<<' to a boolean; did you mean '(payload[0] << 8) != 0'? [-Werror,-Wint-in-bool-context]
*brightness = payload[0] << 8 || payload[1];
^
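
The warning fires because the logical `||` collapses both operands to 0 or 1. The message does not show the fix, so the following is an assumption about the intent: combine the two payload bytes with a bitwise OR, high byte first. The helper name brightness_from_payload() is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: builds a 16-bit value from two payload bytes,
 * treating payload[0] as the high byte, as in the flagged expression.
 */
uint16_t brightness_from_payload(const uint8_t payload[2])
{
	/* Bitwise OR keeps every bit; logical OR would only yield 0 or 1. */
	return (uint16_t)((payload[0] << 8) | payload[1]);
}
```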
Signed-off-by: DhineshCool <dhineshcool585@gmail.com>
Relaxing the CPU latency requirement by about 500 us won't significantly
hurt graphics performance. On the flip side, most SoCs have many idle
levels just below 1000 us in latency, with deeper idle levels having
latencies in excess of 2000 us. Changing the latency requirement to
1000 us allows most SoCs to use their deepest sub-1000-us idle state
while the GPU is active.
Additionally, since the lpm driver has been updated to allow power
levels with latencies equal to target latencies, change the wakeup
latency from 101 to 100 for clarity.
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: sajidshahriar72543 <sazidshahriar39@gmail.com>
Remote register I/O amounts to a measurably significant portion of CPU
time due to how frequently this function is used. Cache the value of
each register on-demand and use this value in future invocations to
mitigate the expensive I/O.
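
A minimal sketch of on-demand caching, assuming the cached registers do not change behind the cache's back. Everything here is hypothetical scaffolding for illustration: fake_hw and slow_reg_read() stand in for the remote register I/O, and NUM_REGS is an arbitrary size.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 8 /* hypothetical register count */

uint32_t fake_hw[NUM_REGS]; /* stand-in for the remote registers */
unsigned int slow_reads;    /* counts how often the slow path runs */

/* Stand-in for the expensive remote register read. */
static uint32_t slow_reg_read(unsigned int reg)
{
	slow_reads++;
	return fake_hw[reg];
}

struct reg_cache {
	uint32_t val[NUM_REGS];
	bool valid[NUM_REGS];
};

/* Reads go to the hardware once per register, then hit the cache. */
uint32_t cached_reg_read(struct reg_cache *c, unsigned int reg)
{
	if (!c->valid[reg]) {
		c->val[reg] = slow_reg_read(reg);
		c->valid[reg] = true;
	}
	return c->val[reg];
}
```

If a register can be written after it has been cached, the write path would also need to update or invalidate the cached entry; that part is left out of the sketch.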
Co-authored-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>
Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Writing to registers is frequent enough that there is a measurably
significant portion of CPU time spent on checking the debug mask for
whether to log. Remove the check and logging call altogether to
eliminate the overhead.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: engstk <eng.stk@sapo.pt>
Signed-off-by: mcdachpappe <noreference@web.de>
We're not going to debug the SDE driver in production. Don't compile the
code at all to reduce the measurably significant overhead in frame commit
hotpaths.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: engstk <eng.stk@sapo.pt>
Signed-off-by: mcdachpappe <noreference@web.de>
The SDE event log is not necessary and hogs a substantial amount of CPU
time on logging events in the frame commit path. Omit the code
altogether along with some other debugfs-related code to fix the waste
of CPU.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: engstk <eng.stk@sapo.pt>
Signed-off-by: mcdachpappe <noreference@web.de>
We won't be using this with an Adreno 3xx/4xx/5xx GPU.
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Change-Id: I4e3ad6042b8017f8f44832d01c0eb3bb5ea143a2
Same reasoning as 20461e4aef
There's no point in using integers where a short is big enough.
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
(cherry picked from commit 9940e8d83222e9366b9570f2fe909062f4ff5ab0)
(cherry picked from commit adeb04a337d54dac60d2f0b35e20f440b63493d3)