26318 Commits

Author SHA1 Message Date
Wei Wang
6400cd3b94 sched: restrict iowait boost to tasks with prefer_idle
Currently iowait boost doesn't distinguish background/foreground tasks and
we have seen cases where a device runs at high frequency unnecessarily
when running some background I/O. This patch limits iowait boost to tasks with
prefer_idle only. Specifically, on Pixel, those are foreground and top
app tasks.

Bug: 130308826
Test: Boot and trace
Change-Id: I2d892beeb4b12b7e8f0fb2848c23982148648a10
Signed-off-by: Wei Wang <wvw@google.com>
Signed-off-by: Lau <laststandrighthere@gmail.com>
2024-08-15 08:22:43 +05:30
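A minimal sketch of the gating described above; `struct task` and `should_iowait_boost()` are hypothetical stand-ins for the kernel's structures, not the actual patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the gating above: iowait boost is requested
 * only when the waking task also has prefer_idle set (foreground and
 * top-app tasks on Pixel). Names are hypothetical, not the real
 * kernel structures. */
struct task {
    bool in_iowait;   /* task is waking up from I/O wait */
    bool prefer_idle; /* set for foreground/top-app tasks */
};

static bool should_iowait_boost(const struct task *p)
{
    /* background I/O (prefer_idle == false) no longer drives the CPU
     * to a high frequency on I/O completion */
    return p->in_iowait && p->prefer_idle;
}
```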
Maria Yu
b5c22baa21 sched: core: Clear walt rq request in cpu starting
Clear walt rq request in cpu starting.

Change-Id: Id3004337f3924984b8b812151a6ba01c6f1c013e
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
(cherry picked from commit 32df8f93e147dd54331161e9180d7ea488b750f9)
2024-08-15 08:22:18 +05:30
Pavankumar Kondeti
74a8607aa7 sched/walt: Fix the memory leak of idle task load pointers
The memory for task load pointers is allocated twice for each
idle thread except for the boot CPU. This happens during boot
from idle_threads_init()->idle_init() in the following 2 paths.

1. idle_init()->fork_idle()->copy_process()->
		sched_fork()->init_new_task_load()

2. idle_init()->fork_idle()-> init_idle()->init_new_task_load()

The memory allocation for all tasks happens through the 1st path,
so use the same for idle tasks and kill the 2nd path. Since
the idle thread of boot CPU does not go through fork_idle(),
allocate the memory for it separately.

Change-Id: I4696a414ffe07d4114b56d326463026019e278f1
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
(cherry picked from commit eb58f47212c9621be82108de57bcf3e94ce1035a)
2024-08-15 07:11:04 +05:30
DhineshCool
c0dd3261ad Revert "sched: Do not reduce perceived CPU capacity while idle"
This reverts commit 20dfb57cb1.
2024-08-15 06:33:57 +05:30
DhineshCool
f99e24746b Revert "cpufreq: schedutil: Enforce realtime priority"
This reverts commit 970b81bf75.
2024-08-15 06:17:11 +05:30
Maria Yu
d6631fffef sched/fair: Consider others if target cpu overutilized
If the target cpu is overutilized, it's better to consider
cpus in other groups. This avoids a task unnecessarily waiting
on an overutilized cpu until load balance finally moves it.

Change-Id: I6f8bccb611d2f11471254cf2795fb5bf3f122292
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
(cherry picked from commit b9f8fdc34eeb61fcc7c770b6277a83fd30fc7d8e)
2024-08-13 23:40:43 +05:30
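A rough illustration of the "overutilized" test implied above; the ~20% capacity margin is an assumption borrowed from common EAS code, not necessarily this tree's value:

```c
#include <assert.h>

/* Hypothetical overutilization check: a cpu is overutilized when its
 * utilization exceeds ~80% of its capacity (margin value assumed). A
 * task placed on such a cpu would likely wait for a later load
 * balance, so cpus in other groups should be considered instead. */
static int cpu_overutilized(unsigned long util, unsigned long capacity)
{
    return util * 1280 > capacity * 1024; /* util > capacity * 0.8 */
}
```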
Chris Redpath
9314b62205 FROMLIST: sched/fair: Don't move tasks to lower capacity cpus unless necessary
When lower capacity CPUs are load balancing and considering pulling
something from a higher capacity group, we should not pull tasks from a
cpu with only one task running as this is guaranteed to impede progress
for that task. If there is more than one task running, load balance in
the higher capacity group would have already made any possible moves to
resolve imbalance and we should make better use of system compute
capacity by moving a task if we still have more than one running.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Change-Id: Ib86570abdd453a51be885b086c8d80be2773a6f2
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
[from https://lore.kernel.org/lkml/1530699470-29808-11-git-send-email-morten.rasmussen@arm.com/]
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Git-commit: 07e7ce6c8459defc34e63ae0f0334e811d223990
Git-repo: https://android.googlesource.com/kernel/common/
[clingutla@codeaurora.org: Resolved merge conflicts.]
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
(cherry picked from commit 779459e3fffda001181cfd6b1be2ffd3da25002c)
2024-08-13 23:40:43 +05:30
Joonwoo Park
ef01112e02 sched: ceil idle index to prevent from out of bound accessing
It's possible that the size of the given idle cost index is smaller
than the CPU's possible idle index size.  Ceil the CPU's idle index to
prevent out-of-bounds access.

Change-Id: Idecb4f68758dd0183886ea74d0e9da3d236b0062
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
(cherry picked from commit ecedc7afd841c8d7ef0145924620304608d269ef)
2024-08-13 23:40:42 +05:30
Joonwoo Park
12312cb361 sched: prevent out of bound access in sched_group_energy()
group_idle_state() can return INT_MAX + 1, which is undefined behaviour,
when there are no CPUs in the sched_group.  Prevent this by handling the
error case correctly.

Change-Id: If9796c829c091e461231569dc38c5e5456f58037
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
[clingutla@codeaurora.org: Fixed trivial merge conflicts and squashed
  msm-4.14 change]
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
(cherry picked from commit bb5b0e61527011e4ebfc4058713a9068da9e7492)
2024-08-13 23:40:42 +05:30
Maria Yu
57d6066272 cpufreq: schedutil: Queue sugov irq work on policy online cpu
The frequency is never updated if the sugov irq work is scheduled on
an offlined cpu: the work stays pending forever. Queue the sugov irq
work on an online cpu of the policy if the current
cpu is offline.

Change-Id: I33fc691917b5866488b6aeb11ed902a2753130b2
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
(cherry picked from commit 1d2db9ab99a9abd0d9dcb320e6e0d266e21884f9)
2024-08-13 23:40:42 +05:30
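The fallback above can be sketched like this, with a hypothetical 4-cpu policy modelled as a plain array (not the kernel's cpumask/irq_work API):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the fallback above: if the current cpu is offline, queue
 * the sugov irq work on any online cpu of the policy so it cannot
 * stay pending forever. POLICY_CPUS and the online[] array are
 * illustrative stand-ins for the policy's cpumask. */
#define POLICY_CPUS 4

static int pick_irq_work_cpu(int this_cpu, const bool online[POLICY_CPUS])
{
    if (online[this_cpu])
        return this_cpu;        /* common case: queue on the local cpu */
    for (int cpu = 0; cpu < POLICY_CPUS; cpu++)
        if (online[cpu])
            return cpu;         /* fall back to any online policy cpu */
    return -1;                  /* whole policy offline: nothing to do */
}
```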
Maria Yu
aa4a0a2807 sched/walt: Avoid walt irq work in offlined cpu
Avoid walt irq work in offlined cpu.

Change-Id: Ia4410562f66bfa57daa15d8c0a785a2c7a95f2a0
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
(cherry picked from commit 702cec976c863388c784eff37a71fa3ee8bb84d7)
2024-08-13 23:40:42 +05:30
Pavankumar Kondeti
37a5c34f00 Revert "sched: Remove sched_ktime_clock()"
This reverts 'commit 24c18127e9 ("sched: Remove sched_ktime_clock()")'

WALT accounting uses ktime_get() as time source to keep windows in
align with the tick. ktime_get() API should not be called while the
timekeeping subsystem is suspended during the system suspend. The
code before the reverted patch has a wrapper around ktime_get() to
avoid calling ktime_get() when timekeeping subsystem is suspended.

The reverted patch removed this wrapper with the assumption that there
will not be any scheduler activity while timekeeping subsystem is
suspended. The timekeeping subsystem is resumed very early even before
non-boot CPUs are brought online. However, it is possible that tasks
can wake up from the idle notifiers, which get called before the
timekeeping subsystem is resumed.

When this happens, the time read from ktime_get() will not be consistent.
We see a jump from the values that would be returned later when timekeeping
subsystem is resumed. The rq->window_start update happens with incorrect
time. This rq->window_start becomes inconsistent with the rest of the
CPUs' rq->window_start and wallclock time after the timekeeping subsystem is
resumed. This results in WALT accounting bugs.

Change-Id: I9c3b2fb9ffbf1103d1bd78778882450560dac09f
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
(cherry picked from commit faa04442e7a31357724dbb8e49ba64372ef37862)
2024-08-13 23:40:42 +05:30
Pavankumar Kondeti
e8e661152f sched/fair: Fix redundant load balancer reattempt due to LBF_ALL_PINNED
The LBF_ALL_PINNED flag should be cleared in can_migrate_task() if the
task can run on the destination CPU during load balance. In the current
code, can_migrate_task() returns without clearing this flag
when the task can't be migrated to the destination CPU due to
cumulative window demand constraints. Since the LBF_ALL_PINNED flag
is left set, the load balancer thinks that none of the tasks running
on the busiest group can be migrated to the destination CPU due
to affinity settings and tries to find another busiest group. Prevent
this incorrect reattempt of load balance by clearing the LBF_ALL_PINNED
flag right after the task affinity check in can_migrate_task().

Change-Id: Iad1cf42b1aaf70106ee5ecfbd9499ccb6eb7497e
[clingutla@codeaurora.org: Resolved merge conflicts]
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
(cherry picked from commit 5ee367fc9386d4e36af644942d9d10f97827bab1)
2024-08-13 23:40:41 +05:30
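The ordering fix above can be sketched as follows; the flag value and the `demand_fits` parameter (standing in for the window-demand constraint) are illustrative, not the actual fair.c code:

```c
#include <assert.h>

/* Sketch of the ordering fix above: clear LBF_ALL_PINNED right after
 * the affinity check succeeds, so a later rejection (here a simplified
 * stand-in for the cumulative window demand check) does not make the
 * balancer believe every task is pinned. Flag value is illustrative. */
#define LBF_ALL_PINNED 0x01

static int can_migrate_task(int allowed_on_dst, int demand_fits,
                            unsigned int *lb_flags)
{
    if (!allowed_on_dst)
        return 0;                 /* truly pinned: leave the flag set */

    *lb_flags &= ~LBF_ALL_PINNED; /* at least one task is movable */

    if (!demand_fits)
        return 0;                 /* rejected, but not "all pinned" */
    return 1;
}
```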
Maria Yu
a01e3aaaff sched/fair: Avoid unnecessary active load balance
When finding the busiest group, load balance is avoided if only one
task is running on the src cpu. However, when different cpus do a
newly idle load balance at the same time, they can race; check the
src cpu's nr_running again to avoid an unnecessary active load
balance.
See the race condition example here:
  1) cpu2 has 2 tasks, so cpu2's rq->nr_running == 2 and
     cfs.h_nr_running == 2.
  2) cpu4 and cpu5 do a newly idle load balance at the same time.
  3) cpu4 and cpu5 both see cpu2's group stats with sum_nr_running == 2,
     so they both pick cpu2 as the busiest rq.
  4) cpu5 successfully migrates a task from cpu2, so cpu2 has only 1
     task left: rq->nr_running == 1 and cfs.h_nr_running == 1.
  5) cpu4 takes the no_move path because cpu2 now only has 1 task,
     which is currently running.
  6) cpu4 then goes on to check whether cpu2 needs active load balance.

Change-Id: Ia9539a43e9769c4936f06ecfcc11864984c50c29
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
(cherry picked from commit fc61703628de002e2a5bf88e09933dbc3552d156)
2024-08-13 23:40:41 +05:30
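The recheck at step 6 reduces to a simple condition; a toy sketch under the assumption that the busiest rq's nr_running is re-read at that point:

```c
#include <assert.h>

/* Sketch of the recheck above: before kicking active load balance,
 * re-read the busiest rq's nr_running, since a concurrent newly idle
 * balance on another cpu may already have migrated the task away. */
static int still_needs_active_balance(int busiest_nr_running)
{
    /* only one (currently running) task left: nothing can be pulled */
    return busiest_nr_running > 1;
}
```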
Pavankumar Kondeti
9efe3c5438 sched/walt: Fix stale max_capacity issue during CPU hotplug
Scheduler keeps track of the maximum capacity among all online CPUs
in max_capacity. This is useful in checking if a given cluster/CPU
is a max capacity CPU or not. The capacity of a CPU gets updated
when its max frequency is limited by cpufreq and/or thermal. The
CPUfreq limits notifications are received via CPUfreq policy
notifier. However CPUfreq keeps the policy intact even when all
of the CPUs governed by the policy are hotplugged out. So the
CPUFREQ_REMOVE_POLICY notification never arrives and scheduler's
notion of max_capacity becomes stale. The max_capacity may get
corrected at some point later when CPUFREQ_NOTIFY notification
comes for other online CPUs. But when the hotplugged CPUs come back
online, max_capacity is not updated, since CPUFREQ_ADD_POLICY is not
sent by cpufreq.

For example consider a system with 4 BIG and 4 little CPUs. Their
original capacities are 2048 and 1024 respectively. The max_capacity
points to 2048 when all CPUs are online. Now,

1. All 4 BIG CPUs are hotplugged out. Since there is no notification,
the max_capacity still points to 2048, which is incorrect.
2. User clips the little CPUs' max_freq by 50%. CPUFREQ_NOTIFY arrives
and max_capacity is updated by iterating all the online CPUs. At this
point max_capacity becomes 512 which is correct.
3. User removes the above limits of little CPUs. The max_capacity
becomes 1024 which is correct.
4. Now, BIG CPUs are hotplugged in. Since there is no notification,
the max_capacity still points to 1024, which is incorrect.

Fix this issue by wiring the max_capacity updates in WALT to scheduler
hotplug callbacks. Ideally we want cpufreq domain hotplug callbacks
but such notifiers are not present. So the max_capacity update is
forced even when it is not necessary, but that should not be a concern,
because CPU hotplug is supposed to be a rare event.

The scheduler hotplug callbacks happen even before the hotplug CPU is
removed from cpu_online_mask, so use cpu_active() check while evaluating
the max_capacity. Since cpu_active_mask is a subset of cpu_online_mask,
this is sufficient.

Change-Id: I97b1974e2de1a9730285715858f1ada416d92a7a
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
(cherry picked from commit 3cd81b52aedf6802aaf7b41f3550b1850c7a09a4)
2024-08-13 23:40:41 +05:30
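The recomputation in the hotplug callback can be sketched against the BIG/little example above; the `active[]` array is an illustrative stand-in for cpu_active():

```c
#include <assert.h>

/* Illustrative recomputation from the commit above: walk the active
 * cpus and take the maximum capacity. cpu_active() is modelled by the
 * active[] array, since the dying cpu is still in cpu_online_mask
 * when the scheduler hotplug callback runs. */
#define NR_CPUS 8

static unsigned long compute_max_capacity(const int active[NR_CPUS],
                                          const unsigned long cap[NR_CPUS])
{
    unsigned long max_cap = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (active[cpu] && cap[cpu] > max_cap)
            max_cap = cap[cpu];
    return max_cap;
}
```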
tip-bot for Jacob Shin
2bc84a0ac1 sched/fair: Force balancing on NOHZ balance if local group has capacity
The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical.

The original commit that adds it:

  fab476228b ("sched: Force balancing on newidle balance if local group has capacity")

explains:

    Under certain situations, such as a niced down task (i.e. nice =
    -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
    on a sched group and kicks away other tasks because of its large
    weight. This leads to sub-optimal utilization of the
    machine. Even though the sched group has capacity, it does not
    pull tasks because sds.this_load >> sds.max_load, and f_b_g()
    returns NULL.

A similar but inverted issue also affects ARM big.LITTLE (asymmetrical CPU
capacity) systems - consider 8 always-running, same-priority tasks on a
system with 4 "big" and 4 "little" CPUs. Suppose that 5 of them end up on
the "big" CPUs (which will be represented by one sched_group in the DIE
sched_domain) and 3 on the "little" (the other sched_group in DIE), leaving
one CPU unused. Because the "big" group has a higher group_capacity its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".

The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed the
next nohz balance kick will help us out.

Change-Id: I6b0db178c0707603c8fd764fd3e44524c5345241
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170807163900.25180-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: 583ffd99d7657755736d831bbc182612d1d2697d
Git-repo: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
(cherry picked from commit 3d9aec71e139bce6d592b56afaa30f02c344e80e)
2024-08-13 23:40:41 +05:30
Lingutla Chandrasekhar
70e5add1e9 sched: energy: rebuild sched_domains with actual capacities
During scheduler initialization, sched_domains might have been built
with default capacity values, and the max_{min_}cap_org_cpu's
were updated based on them. After the energy probe is called,
these capacities change, but the max_{min_}cap_org_cpu's
still hold the old values. Using these stale cpus could give the
wrong start_cpu when finding an energy efficient cpu.

So rebuild the sched_domains, which updates all cpus' group capacities
with the actual capacities and builds the domains again, and update
the max_{min_}cap_org_cpus as well.

Change-Id: I07d58bc849de363c5ed8fb743ab98d3fba727130
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
(cherry picked from commit 5b2c99599d1dcf79ef7dec93c7935d6fc48869db)
2024-08-13 23:40:41 +05:30
Martin KaFai Lau
a6710190e0 bpf: Refactor codes handling percpu map
Refactor the code that populates the value
of a htab_elem in a BPF_MAP_TYPE_PERCPU_HASH
typed bpf_map.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
2024-08-13 23:40:22 +05:30
Martin KaFai Lau
84360a36df bpf: Add percpu LRU list
Instead of having a common LRU list, this patch allows a
percpu LRU list which can be selected by specifying a map
attribute.  The map attribute will be added in the later
patch.

While the common use case for LRU is #reads >> #updates,
percpu LRU list allows bpf prog to absorb unusual #updates
under pathological case (e.g. external traffic facing machine which
could be under attack).

Each percpu LRU is isolated from each other.  The LRU nodes (including
free nodes) cannot be moved across different LRU Lists.

Here are the update performance comparison between
common LRU list and percpu LRU list (the test code is
at the last patch):

[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
 1 cpus: 2934082 updates
 4 cpus: 7391434 updates
 8 cpus: 6500576 updates

[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
  1 cpus: 2896553 updates
  4 cpus: 9766395 updates
  8 cpus: 17460553 updates

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
2024-08-13 23:40:22 +05:30
Martin KaFai Lau
d15c5c69e6 bpf: LRU List
Introduce bpf_lru_list which will provide LRU capability to
the bpf_htab in the later patch.

* General Thoughts:
1. Target use case.  Read is more often than update.
   (i.e. bpf_lookup_elem() is more often than bpf_update_elem()).
   If bpf_prog does a bpf_lookup_elem() first and then an in-place
   update, it still counts as a read operation as far as the LRU list
   is concerned.
2. It may be useful to think of it as a LRU cache
3. Optimize the read case
   3.1 No lock in read case
   3.2 The LRU maintenance is only done during bpf_update_elem()
4. If there is a percpu LRU list, it will lose the system-wide LRU
   property.  A completely isolated percpu LRU list has the best
   performance but the memory utilization is not ideal considering
   the workload may be imbalanced.
5. Hence, this patch starts the LRU implementation with a global LRU
   list with batched operations before accessing the global LRU list.
   As an LRU cache where #read >> #update/#insert operations, it will work well.
6. There is a local list (for each cpu) which is named
   'struct bpf_lru_locallist'.  This local list is not used to sort
   the LRU property.  Instead, the local list is to batch enough
   operations before acquiring the lock of the global LRU list.  More
   details on this later.
7. In the later patch, it allows a percpu LRU list by specifying a
   map-attribute for scalability reason and for use cases that need to
   prepare for the worst (and pathological) case like DoS attack.
   The percpu LRU list is completely isolated from each other and the
   LRU nodes (including free nodes) cannot be moved across the list.  The
   following description is for the global LRU list but mostly applicable
   to the percpu LRU list also.

* Global LRU List:
1. It has three sub-lists: active-list, inactive-list and free-list.
2. The two list idea, active and inactive, is borrowed from the
   page cache.
3. All nodes are pre-allocated and all sit at the free-list (of the
   global LRU list) at the beginning.  The pre-allocation reasoning
   is similar to the existing BPF_MAP_TYPE_HASH.  However,
   opting-out prealloc (BPF_F_NO_PREALLOC) is not supported in
   the LRU map.

* Active/Inactive List (of the global LRU list):
1. The active list, as its name says, maintains the active set of
   the nodes.  We can think of it as the working set or more frequently
   accessed nodes.  The access frequency is approximated by a ref-bit.
   The ref-bit is set during the bpf_lookup_elem().
2. The inactive list, as its name also says, maintains a less
   active set of nodes.  They are the candidates to be removed
   from the bpf_htab when we are running out of free nodes.
3. The ordering of these two lists is acting as a rough clock.
   The tail of the inactive list is the older nodes and
   should be released first if the bpf_htab needs a free element.

* Rotating the Active/Inactive List (of the global LRU list):
1. It is the basic operation to maintain the LRU property of
   the global list.
2. The active list is only rotated when the inactive list is running
   low.  This idea is similar to the current page cache.
   Inactive running low is currently defined as
   "# of inactive < # of active".
3. The active list rotation always starts from the tail.  It moves
   nodes without the ref-bit set to the head of the inactive list.
   It moves nodes with the ref-bit set back to the head of the active
   list and then clears their ref-bit.
4. The inactive rotation is pretty simple.
   It walks the inactive list and moves a node back to the head of the
   active list if its ref-bit is set.  The ref-bit is cleared after
   moving to the active list.
   Nodes without the ref-bit set are left as they are,
   because they are already in the inactive list.

* Shrinking the Inactive List (of the global LRU list):
1. Shrinking is the operation to get free nodes when the bpf_htab is
   full.
2. It usually only shrinks the inactive list to get free nodes.
3. During shrinking, it will walk the inactive list from the tail and
   delete the nodes without the ref-bit set from the bpf_htab.
4. If no free node is found after step (3), it will forcefully take
   one node from the tail of the inactive or active list.  "Forcefully"
   means that it ignores the ref-bit.

* Local List:
1. Each CPU has a 'struct bpf_lru_locallist'.  The purpose is to
   batch enough operations before acquiring the lock of the
   global LRU.
2. A local list has two sub-lists, free-list and pending-list.
3. During bpf_update_elem(), it will try to get a node from the
   free-list of the current CPU's local list.
4. If the local free-list is empty, it will acquire from the
   global LRU list.  The global LRU list can either satisfy it
   by its global free-list or by shrinking the global inactive
   list.  Since we have acquired the global LRU list lock,
   it will try to move at most LOCAL_FREE_TARGET elements
   to the local free list.
5. When a new element is added to the bpf_htab, it will
   first sit on the pending-list (of the local list).
   The pending-list will be flushed to the global LRU list
   when it needs to acquire free nodes from the global list
   next time.

* Lock Consideration:
The LRU list has a lock (lru_lock).  Each bucket of htab has a
lock (buck_lock).  If both locks need to be acquired together,
the lock order is always lru_lock -> buck_lock and this only
happens in the bpf_lru_list.c logic.

In hashtab.c, both locks are not acquired together (i.e. one
lock is always released first before acquiring another lock).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
2024-08-13 23:40:21 +05:30
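The active-list rotation rule described above boils down to a second-chance decision per node; a toy model, with hypothetical names rather than the actual bpf_lru_list.c types:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the active-list rotation above: rotation starts from
 * the tail of the active list; a node with its ref-bit set goes back
 * to the head of the active list with the bit cleared, a node without
 * it is demoted to the head of the inactive list. */
enum lru_list { LRU_ACTIVE, LRU_INACTIVE };

struct lru_node {
    bool ref; /* set on bpf_lookup_elem(); approximates access frequency */
};

static enum lru_list rotate_active_node(struct lru_node *node)
{
    if (node->ref) {
        node->ref = false;  /* second chance: clear bit, stay active */
        return LRU_ACTIVE;
    }
    return LRU_INACTIVE;    /* cold node: candidate for shrinking */
}
```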
Michal Hocko
d9af72efb8 bpf: do not use KMALLOC_SHIFT_MAX
Commit 01b3f52157 ("bpf: fix allocation warnings in bpf maps and
integer overflow") has added checks for the maximum allocateable size.
It (ab)used KMALLOC_SHIFT_MAX for that purpose.

While this is not incorrect it is not very clean because we already have
KMALLOC_MAX_SIZE for this very reason so let's change both checks to use
KMALLOC_MAX_SIZE instead.

The original motivation for using KMALLOC_SHIFT_MAX was to work around
an incorrect KMALLOC_MAX_SIZE which could lead to allocation warnings
but it is no longer needed since "slab: make sure that KMALLOC_MAX_SIZE
will fit into MAX_ORDER".

Link: http://lkml.kernel.org/r/20161220130659.16461-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: kdrag0n <dragon@khronodragon.com>
2024-08-13 23:40:21 +05:30
Tyler Nijmeh
20dfb57cb1 sched: Do not reduce perceived CPU capacity while idle
CPUs that are idle are excellent candidates for latency sensitive or
high-performance tasks. Decrementing their capacity while they are idle
will result in these CPUs being chosen less, and they will prefer to
schedule smaller tasks instead of large ones. Disable this.

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
2024-08-13 23:36:15 +05:30
Tyler Nijmeh
f5daa9d7ec sched: Enable NEXT_BUDDY for better cache locality
By scheduling the last woken task first, we can increase cache locality
since that task is likely to touch the same data as before.

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
2024-08-13 23:36:15 +05:30
Tyler Nijmeh
970b81bf75 cpufreq: schedutil: Enforce realtime priority
Even the interactive governor utilizes a realtime priority. It is
beneficial for schedutil to process its workload at a priority >=
that of mundane tasks (KGSL/audio/etc.).

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
Signed-off-by: clarencelol <clarencekuiek@icloud.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-08-13 23:36:14 +05:30
Sultan Alsawaf
26a793cb28 Revert "mutex: Add a delay into the SPIN_ON_OWNER wait loop."
This reverts commit c8de3f45ee.

This doesn't make sense for a few reasons. Firstly, upstream uses this
mutex code and it works fine on all arches; why should arm be any
different?

Secondly, once the mutex owner starts to spin on `wait_lock`,
preemption is disabled and the owner will be in an actively-running
state. The optimistic mutex spinning occurs when the lock owner is
actively running on a CPU, and while the optimistic spinning takes
place, no attempt to acquire `wait_lock` is made by the new waiter.
Therefore, it is guaranteed that new mutex waiters which optimistically
spin will not contend the `wait_lock` spin lock that the owner needs to
acquire in order to make forward progress.

Another potential source of `wait_lock` contention can come from tasks
that call mutex_trylock(), but this isn't actually problematic (and if
it were, it would affect the MUTEX_SPIN_ON_OWNER=n use-case too). This
won't introduce significant contention on `wait_lock` because the
trylock code exits before attempting to lock `wait_lock`, specifically
when the atomic mutex counter indicates that the mutex is already
locked. So in reality, the amount of `wait_lock` contention that can
come from mutex_trylock() amounts to only one task. And once it
finishes, `wait_lock` will no longer be contended and the previous
mutex owner can proceed with clean up.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Albert I <kras@raphielgang.org>
2024-08-13 23:36:11 +05:30
Sultan Alsawaf
0e39b53ee6 PM / freezer: Reduce freeze timeout to 1 second for Android
Freezing processes on Android usually takes less than 100 ms, and if it
takes longer than that to the point where the 20 second freeze timeout is
reached, it's because the remaining processes to be frozen are deadlocked
waiting for something from a process which is already frozen. There's no
point in burning power trying to freeze for that long, so reduce the freeze
timeout to a very generous 1 second for Android and don't let anything mess
with it.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-08-13 23:32:53 +05:30
Sultan Alsawaf
2e211875a5 PM / freezer: Abort suspend when there's a wakeup while freezing
Although try_to_freeze_tasks() stops when there's a wakeup, it doesn't
return an error when it successfully freezes everything it wants to freeze.
As a result, the suspend attempt can continue even after a wakeup is
issued. Although the wakeup will be eventually caught later in the suspend
process, kicking the can down the road is suboptimal; when there's a wakeup
detected, suspend should be immediately aborted by returning an error
instead. Make try_to_freeze_tasks() do just that, and also move the wakeup
check above the `todo` check so that we don't miss a wakeup from a process
that successfully froze.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Change-Id: I6d0ff54b1e1e143df2679d3848019590725c6351
2024-08-13 23:31:23 +05:30
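The reordering described above can be sketched as a pure decision function; names are illustrative, not the actual process.c code:

```c
#include <assert.h>
#include <errno.h>

/* Sketch of the reordering above: check the wakeup condition before
 * the `todo` (unfrozen tasks) check, so a wakeup that races with a
 * fully successful freeze still aborts the suspend attempt. */
static int freeze_tasks_result(int wakeup_pending, int todo)
{
    if (wakeup_pending)
        return -EBUSY; /* abort suspend immediately on wakeup */
    if (todo)
        return -EBUSY; /* some tasks failed to freeze in time */
    return 0;          /* everything frozen, no wakeup: proceed */
}
```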
Sultan Alsawaf
a65385e72e PM: sleep: Don't allow s2idle to be used
Unfortunately, s2idle is only somewhat functional. Although commit
70441d36af58 ("cpuidle: lpm_levels: add soft watchdog for s2idle") makes
s2idle usable, there are still CPU stalls caused by s2idle's buggy
behavior, and the aforementioned hack doesn't address them. Therefore,
let's stop userspace from enabling s2idle and instead enforce the
default deep sleep mode.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-08-13 23:31:23 +05:30
Xunlei Pang
55c26752ef sched/fair: Advance global expiration when period timer is restarted
When period gets restarted after some idle time, start_cfs_bandwidth()
doesn't update the expiration information, expire_cfs_rq_runtime() will
see cfs_rq->runtime_expires smaller than rq clock and go to the clock
drift logic, wasting needless CPU cycles on the scheduler hot path.

Update the global expiration in start_cfs_bandwidth() to avoid frequent
expire_cfs_rq_runtime() calls once a new period begins.

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180620101834.24455-2-xlpang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: RuRuTiaSaMa <1009087450@qq.com>
2024-08-13 23:31:04 +05:30
Sultan Alsawaf
d6ba2bb7dd sched/fair: Compile out NUMA code entirely when NUMA is disabled
Scheduler code is very hot and every little optimization counts. Instead
of constantly checking sched_numa_balancing when NUMA is disabled,
compile it out.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-08-13 23:31:04 +05:30
Pavankumar Kondeti
7aa59f8faf BACKPORT: ANDROID: sched: Exempt paused CPU from nohz idle balance
A CPU can be paused while it is idle with its tick stopped.
nohz_balance_exit_idle() should be called from the local CPU,
so it can't be called during pause, which can happen remotely.
This results in the paused CPU participating in the nohz idle balance,
which should be avoided.

Fix this issue by calling nohz_balance_exit_idle() from the paused
CPU when it exits and enters idle again. This lazy approach avoids
waking the CPU from idle during pause.

Bug: 180530906
Change-Id: Ia2dfd9c9cac9b0f37c55a9256b9d5f3141ca0421
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
[ Tashar02: Backport to k4.19 ]
[ RealJohnGalt: Backport to k4.14 ]
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
2024-08-13 23:31:04 +05:30
Patrick Bellasi
b306fed53f cpufreq: schedutil: Fix iowait boost reset
A more energy efficient update of the IO wait boosting mechanism has
been introduced in:

   commit a5a0809 ("cpufreq: schedutil: Make iowait boost more energy
efficient")

where the boost value is expected to be:

 - doubled at each successive wakeup from IO
   starting from the minimum frequency supported by a CPU

 - reset when a CPU is not updated for more than one tick
   by either disabling the IO wait boost or resetting its value to the
   minimum frequency if this new update requires an IO boost.

This approach is supposed to "ignore" boosting for sporadic wakeups from
IO, while still getting the frequency boosted to the maximum to benefit
long sequences of wakeups from IO operations.

However, these assumptions are not always satisfied.
For example, when an IO boosted CPU enters idle for more than one tick
and then wakes up after an IO wait, since in sugov_set_iowait_boost() we
first check the IOWAIT flag, we keep doubling the iowait boost instead
of restarting from the minimum frequency value.

This misbehavior could happen mainly on non-shared frequency domains,
thus defeating the energy efficiency optimization, but it can also
happen on shared frequency domain systems.

Let's fix this issue in sugov_set_iowait_boost() by:
 - first checking the IO wait boost reset conditions
   and resetting the boost value if needed
 - then applying the correct IO boost value
   if required by the caller

Fixes: a5a0809 (cpufreq: schedutil: Make iowait boost more energy
efficient)
Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Yaroslav Furman <yaro330@gmail.com> - backport to 4.4
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2024-08-13 23:28:07 +05:30
Peter Zijlstra
eb7d9be835 sched/core: Fix rules for running on online && !active CPUs
[ Upstream commit 175f0e25abeaa2218d431141ce19cf1de70fa82d ]

As already enforced by the WARN() in __set_cpus_allowed_ptr(), the rules
for running on an online && !active CPU are stricter than just being a
kthread, you need to be a per-cpu kthread.

If you're not strictly per-CPU, you have better CPUs to run on and
don't need the partially booted one to get your work done.

The exception is to allow smpboot threads to bootstrap the CPU itself
and get kernel 'services' initialized before we allow userspace on it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 955dbdf4ce87 ("sched: Allow migrating kthreads into online but inactive CPUs")
Link: http://lkml.kernel.org/r/20170725165821.cejhb7v2s3kecems@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-13 23:27:23 +05:30
Xunlei Pang
41d1faa508 sched/fair: Fix bandwidth timer clock drift condition
commit 512ac999d2755d2b7109e996a76b6fb8b888631d upstream.

I noticed that cgroup task groups constantly get throttled even
if they have low CPU usage, this causes some jitters on the response
time to some of our business containers when enabling CPU quotas.

It's very simple to reproduce:

  mkdir /sys/fs/cgroup/cpu/test
  cd /sys/fs/cgroup/cpu/test
  echo 100000 > cpu.cfs_quota_us
  echo $$ > tasks

then repeat:

  cat cpu.stat | grep nr_throttled  # nr_throttled will increase steadily

After some analysis, we found that cfs_rq::runtime_remaining will
be cleared by expire_cfs_rq_runtime() due to two equal but stale
"cfs_{b|q}->runtime_expires" values after the period timer is re-armed.

The current condition to judge clock drift in expire_cfs_rq_runtime()
is wrong: the two runtime_expires values are actually the same when clock
drift happens, so this condition can never hit. The original design was
correctly done by this commit:

  a9cf55b286 ("sched: Expire invalid runtime")

... but was changed to be the current implementation due to its locking bug.

This patch introduces another way, it adds a new field in both structures
cfs_rq and cfs_bandwidth to record the expiration update sequence, and
uses them to figure out if clock drift happens (true if they are equal).

Change-Id: Ida0d756728675758499caa225238ed13b4423168
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[alakeshh: backport: Fixed merge conflicts:
 - sched.h: Fix the indentation and order in which the variables are
   declared to match with coding style of the existing code in 4.14
   Struct members of same type were declared in separate lines in
   upstream patch which has been changed back to having multiple
   members of same type in the same line.
   e.g. int a; int b; ->  int a, b; ]
Signed-off-by: Alakesh Haloi <alakeshh@amazon.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 51f2176d74 ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")
Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-13 23:26:23 +05:30
Runmin Wang
2dd90440a1 sched/fair: load balance if a group is overloaded
Do more aggressive balancing if a sched_group is overloaded.

Change-Id: I00950c23c67a40b3431b68ac7ce2a1e470e563ed
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
2024-08-13 23:26:23 +05:30
Tyler Nijmeh
e89e4a37bb perf: Restrict perf event sampling CPU time to 5%
Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
2024-08-13 23:26:05 +05:30
Tyler Nijmeh
cc2af967c1 sched: Process new forks before processing their parent
This should let brand new tasks launch marginally faster.

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
2024-08-13 23:26:05 +05:30
Jacob Pan
7fa204cbe8 cpuidle: Allow enforcing deepest idle state selection
When idle injection is used to cap power, we need to override the
governor's choice of idle states.

For this reason, make it possible to enforce selection of the deepest
idle state by setting a flag on a given CPU, to achieve the maximum
potential power draw reduction.

Change-Id: I9737e99c4f3f4bc38016b313e76b50cec4cf56cb
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-13 23:26:04 +05:30
Joel Fernandes
27def5fb20 cpufreq: schedutil: Use unsigned int for iowait boost
Make iowait_boost and iowait_boost_max unsigned int, since their unit
is kHz and this is consistent with struct cpufreq_policy.  Also change
the local variables in sugov_iowait_boost() to match.

Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-13 23:25:59 +05:30
Joel Fernandes
04ec17e1bd cpufreq: schedutil: Make iowait boost more energy efficient
Currently the iowait_boost feature in schedutil makes the frequency
go to max on iowait wakeups.  This feature was added to handle a case
that Peter described, where the throughput of operations involving
continuous I/O requests [1] is reduced when running at a lower
frequency; however, the lower throughput itself causes utilization to
be low, which keeps the frequency low, hence it is "stuck".

Instead of going to max, it's also possible to achieve the same effect
by ramping up to max if there are repeated in_iowait wakeups
happening. This patch is an attempt to do that. We start from a lower
frequency (policy->min) and double the boost for every consecutive
iowait update until we reach the maximum iowait boost frequency
(iowait_boost_max).

I ran a synthetic test (continuous O_DIRECT writes in a loop) on an
x86 machine with intel_pstate in passive mode using schedutil.  In
this test the iowait_boost value ramped from 800MHz to 4GHz in 60ms.
The patch achieves the same improved throughput as the existing
behavior.

[1] https://patchwork.kernel.org/patch/9735885/

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-13 23:25:58 +05:30
Wei Wang
468e53bb05 ANDROID: sched: fair: balance for single core cluster
Android will unset SD_LOAD_BALANCE for a single core cluster domain,
and some products really do have a single core cluster, so the MC
domain lacks the SD_LOAD_BALANCE flag. This breaks the
select_task_rq_fair logic and the task will spin forever
on that core.

Fixes: 00bbe7d605a9 "ANDROID: sched: EAS & 'single cpu per cluster'/cpu hotplug interoperability"

Bug: 141334320
Test: boot and see task on core7 scheduled correctly
Change-Id: I7c2845b1f7bc1d4051eb3ad6a5f9838fb0b1ba04
Signed-off-by: Wei Wang <wvw@google.com>
2024-08-13 23:25:58 +05:30
Sultan Alsawaf
93937605cf kernel: Don't allow IRQ affinity masks to have more than one CPU
Even with an affinity mask that has multiple CPUs set, IRQs always run
on the first CPU in their affinity mask. Drivers that register an IRQ
affinity notifier (such as pm_qos) will therefore have an incorrect
assumption of where an IRQ is affined.

Fix the IRQ affinity mask deception by forcing it to only contain one
set CPU.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
2024-08-13 23:25:57 +05:30
Zachariah Kennedy
d5c9e16340 sched/fair.c: Don't allow SchedTune boosted tasks to be migrated to small cores
We want boosted tasks to run on big cores. But CAF's load balancer
changes do not account for SchedTune boosting, so this allows for
boosted tasks to be migrated to a suboptimal core. Let's mitigate
this by setting LBF_IGNORE_BIG_TASKS for tasks migrating from a
larger capacity core to a smaller one and checking whether the task is
SchedTune boosted. If both are true, do not migrate the task.

Signed-off-by: Zachariah Kennedy <zkennedy87@gmail.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Carlos Ayrton Lopez Arroyo <15030201@itcelaya.edu.mx>
2024-08-13 23:24:53 +05:30
darkhz
2dff7f511d kernel: time: Silence "Suspended for..." debug messages.
Change-Id: Id585557f265d748e1d8d8bf2e4471bfcca2fe0a4
2024-08-13 23:21:36 +05:30
tytydraco
f91d2df8a3 power: Start killing wakelocks after one minute of idle
Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
2024-08-13 23:19:41 +05:30
Charan Teja Reddy
f038b94806 mm: oom_kill: reap memory of a task that receives SIGKILL
Free the pages in parallel for a task that receives SIGKILL, using the
oom_reaper. This helps return the pages to the buddy system well in
advance.
Reaping is done for a process that received SIGKILL through
either sys_kill from user space or kill_pid from the kernel, when the
sending process has the CAP_KILL capability.
A sysctl interface, reap_mem_on_sigkill, is also added to turn this
feature on/off.

[ExactExampl]: make it enabled by default

Change-Id: I21adb95de5e380a80d7eb0b87d9b5b553f52e28a
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
(cherry picked from commit f9920cfa7ecf420e6a1ced2b53920f3ea9ddfc19)
2024-08-13 23:14:33 +05:30
Nguyen Huu Huy
0694dd0d3f workqueue: add root permission for control of wq_power_efficient
Add this so the power-efficient workqueue setting can be enabled or
disabled from kernel manager apps.
Change from commit: e8abf85c64
2024-08-13 23:11:51 +05:30
Johannes Weiner
2cd0679c57 BACKPORT: psi: Optimize switching tasks inside shared cgroups
When switching tasks running on a CPU, the psi state of a cgroup
containing both of these tasks does not change. Right now, we don't
exploit that, and can perform many unnecessary state changes in nested
hierarchies, especially when most activity comes from one leaf cgroup.

This patch implements an optimization where we only update cgroups
whose state actually changes during a task switch. These are all
cgroups that contain one task but not the other, up to the first
shared ancestor. When both tasks are in the same group, we don't need
to update anything at all.

We can identify the first shared ancestor by walking the groups of the
incoming task until we see TSK_ONCPU set on the local CPU; that's the
first group that also contains the outgoing task.

The new psi_task_switch() is similar to psi_task_change(). To allow
code reuse, move the task flag maintenance code into a new function
and the poll/avg worker wakeups into the shared psi_group_change().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200316191333.115523-3-hannes@cmpxchg.org
Signed-off-by: Aarqw12 <lcockx@protonmail.com>
Signed-off-by: prorooter007 <shreyashwasnik112@gmail.com>
Signed-off-by: Marco Zanin <mrczn.bb@gmail.com>
2024-08-13 23:11:51 +05:30
Johannes Weiner
c7afdeb9a9 BACKPORT: psi: Fix cpu.pressure for cpu.max and competing cgroups
For simplicity, cpu pressure is defined as having more than one
runnable task on a given CPU. This works on the system-level, but it
has limitations in a cgrouped reality: When cpu.max is in use, it
doesn't capture the time in which a task is not executing on the CPU
due to throttling. Likewise, it doesn't capture the time in which a
competing cgroup is occupying the CPU - meaning it only reflects
cgroup-internal competitive pressure, not outside pressure.

Enable tracking of currently executing tasks, and then change the
definition of cpu pressure in a cgroup from

	NR_RUNNING > 1

to

	NR_RUNNING > ON_CPU

which will capture the effects of cpu.max as well as competition from
outside the cgroup.

After this patch, a cgroup running `stress -c 1` with a cpu.max
setting of 5000 10000 shows ~50% continuous CPU pressure.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200316191333.115523-2-hannes@cmpxchg.org
Signed-off-by: Aarqw12 <lcockx@protonmail.com>
Signed-off-by: prorooter007 <shreyashwasnik112@gmail.com>
Signed-off-by: Marco Zanin <mrczn.bb@gmail.com>
2024-08-13 23:11:51 +05:30
Yafang Shao
63fdb8eeb1 BACKPORT: psi: Move PF_MEMSTALL out of task->flags
task->flags is a 32-bit field of which 31 bits have already been
consumed, so it is hard to introduce a new per-process flag.
Currently there is still enough space in the bit-field section of
task_struct, so we can define the memstall state as a single bit in
task_struct instead.
This patch also removes an out-of-date comment pointed out by Matthew.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/1584408485-1921-1-git-send-email-laoar.shao@gmail.com
Signed-off-by: Aarqw12 <lcockx@protonmail.com>
Signed-off-by: prorooter007 <shreyashwasnik112@gmail.com>
2024-08-13 23:11:51 +05:30