Commit Graph

7 Commits

Author SHA1 Message Date
Jaegeuk Kim
5697d55978 mm: mm_event: show MM/FS/IO/UFS latencies in fault flow
Tracing points are:
                        mm_event types
MM:    Page fault              -
F2FS:  read metadata in F2FS   |
F2FS:  alloc_bio in F2FS       |  -  -  -  -
F2FS:  merge bios in F2FS      |   |  |  |  |
BLOCK: submit_bio              |   |  |  | - blk_read_submit_bio
BLOCK: IO scheduler/SCSI       |   |  |  |
UFS:   ufshcd_queuecommand     |   |  | - ufs_read_queue_cmd
UFS:   resume/ungating         |   |  |
UFS:   IO completion           |   | - ufs_read_compl_cmd
BLOCK: softirq                 |   |
F2FS:  read_end_io in F2FS     |  - f2fs_read_data
MM:    end of page fauilt     -  maj_flt

For example,

> cat /d/tracing/instances/pixel-trace/trace_pipe

<...>-8364  [000] ....  5858.525417: mm_event_record: blk_read_submit_bio count=1 avg_lat=5 max_lat=5
<...>-8364  [000] ....  5858.525418: mm_event_record: ufs_read_queue_cmd count=1 avg_lat=96 max_lat=96
<...>-8364  [000] ....  5858.525418: mm_event_record: ufs_read_send_cmd count=1 avg_lat=143 max_lat=143
...
sitor-13079 [007] ....  5858.827375: mm_event_record: ufs_read_compl_cmd count=1 avg_lat=12545 max_lat=12545
sitor-13079 [007] ....  5858.827433: mm_event_record: f2fs_read_data count=1 avg_lat=12587 max_lat=12587
...
<...>-13174 [001] ....  5859.187505: mm_event_record: maj_flt count=1 avg_lat=12677 max_lat=12677

Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Change-Id: I0712f543212a14ab2b1d6f092a9a2a7e3ed9f0a0
2020-06-02 19:06:40 +08:00
Minchan Kim
9e9d62a0a3 mm: mm_event: comment about enum event type ordering
Perfetto uses binary tracing so we shouldn't reordring/removing
the event type. Describe it more explicitly.

Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I7fa415b0ddc803e9ad81590f692a46bb416b2dd9
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit ab281c5bf4753b1048d8a059342f89a0d24e9169)
Signed-off-by: Martin Liu <liumartin@google.com>
2020-05-30 02:21:56 +08:00
Minchan Kim
f20debfb4e mm: mm_event supports vmstat
Vmstat is significantly important to investigate MM problem.
We have solved many problmes with it via asking users to get
vmstat data periodically from the device, which manual way is
painful once we release the device or on hard reproducible
scenario.

This patch adds periodic vmstat dump into mm_event. It works
only if there are some events in compaction or reclaim. Thus,
unless there is memory pressure, it doesn't gather any vmstat
data. Default interval between each dump is 1000ms.
Admin can tweak it via

	echo 2000 > /sys/kernel/debug/mm_event/vmstat_period_ms

Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4c0e7237d7764c4ea79da00952e5de34ccbe4187
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 2637ebbe4fcfb8fcff64bb83f7b3917b8bc566e4)
Signed-off-by: Martin Liu <liumartin@google.com>
2020-05-30 02:21:56 +08:00
Minchan Kim
e1086c64a4 mm: mm_event: add read io stat
Read IO's latency as well as filemap fault could affect system
performance so this patch keeps track it on.

Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I761b7110339cf1e5ef24530ad32aedd784d00d07
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 1736bcfef806a113b17df6b05c1d8dfa6ee04d47)
Signed-off-by: Martin Liu <liumartin@google.com>
2020-05-30 02:21:55 +08:00
Martin Liu
837475ed71 mm: mm_event: add special kernel allocation stat
Record the count of special page allocation on the process context.

This patch aims for accounting of special page allocation which
consumed a lot by android system.
At this moment, ION system heap is good candidate(it could cover
other kernel allocation in future).
With that, we could keep tracking burst kernel allocation owner
so that it would be useful to find places caused by lmk, reclaim,
compaction latency.

Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I5942fd940d98baa2eb814f66b076cb37ecd3b4aa
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 0fc7b97b79c73902f1fe0a60efc8b551cb6a3251)
Signed-off-by: Martin Liu <liumartin@google.com>
2020-05-30 02:21:54 +08:00
Minchan Kim
6df920a8fa mm: mm_event: add swapin stat
Many embedded devices use zram as swap. Compared to storage swap
(e.g. UFS), swapin from zram(ie., decompression) is extremly fast
so it might be not major fault but minor. So this patch provides
swapin latency tracking to distinguish them from storage major
fault.

Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I1c32430e32a051916ede5219bd5f40a9002652bc
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 9de60685b99e819e57fb6c2bf636d5ac973e4f01)
Signed-off-by: Martin Liu <liumartin@google.com>
2020-05-30 02:21:54 +08:00
Minchan Kim
6dd6325deb mm: introduce per-process mm event tracking feature
Linux supports /proc/meminfo and /proc/vmstat stats as memory health metric.
Android uses them too. If user see something goes wrong(e.g., sluggish, jank)
on their system, they can capture and report system state to developers
for debugging.

It shows memory stat at the moment the bug is captured. However, it’s
not enough to investigate application's jank problem caused by memory
shortage. Because

1. It just shows event count which doesn’t quantify the latency of the
application well. Jank could happen by various reasons and one of simple
scenario is frame drop for a second. App should draw the frame every 16ms
interval. Just number of stats(e.g., allocstall or pgmajfault) couldn't
represnt how many of time the app spends for handling the event.

2. At bugreport, dump with vmstat and meminfo is never helpful because it's
too late to capture the moment when the problem happens.
When the user catch up the problem and try to capture the system state,
the problem has already gone.

3. Although we could capture MM stat at the moment bug happens, it couldn't
be helpful because MM stats are usually very flucuate so we need historical
data rather than one-time snapshot to see MM trend.

To solve above problems, this patch introduces per-process, light-weight,
mm event stat. Basically, it tracks minor/major faults, reclaim and compaction
latency of each process as well as event count and record the data into global
buffer.
To compromise memory overhead, it doesn't record every MM event of the process
to the buffer but just drain accumuated stats every 0.5sec interval to buffer.
If there isn't any event, it just skips the recording.
For latency data, it keeps average/max latency of each event in that period

With that, we could keep useful information with small buffer so that
we couldn't miss precious information any longer although the capture time
is rather late. This patch introduces basic facility of MM event stat.

After all patches in this patchset are applied, outout format is as follows,
dumpstate can use it for VM debugging in future.

<...>-1665  [001] d...   217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665  [001] d...   217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665  [001] d...   217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626   [000] d...   217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547  [000] ....   217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547  [000] ....   217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0

This feature uses event trace for output buffer so that we could use all of
general benefit of event trace(e.g., buffer size management, filtering and
so on). To prevent overflow of the ring buffer by other random event race,
highly suggest that create separate instance of tracing
on /sys/kernel/debug/tracing/instances/

I had a concern of adding overhead. Actually, major|compaction/reclaim
are already heavy cost so it should be not a concern. Rather than,
minor fault and kern alloc would be severe so I tested a micro benchmark
to measure minor page fault overhead.

Test scenario is create 40 threads and each of them does minor
page fault for 25M range(ranges are not overwrapped).
I didn't see any noticible regression.

Base:
fault/wsec avg: 758489.8288

minor faults=13123118, major faults=0 ctx switch=139234
    User   System     Wall        fault/wsec
  39.55s   41.73s   17.49s        749995.768
minor faults=13123135, major faults=0 ctx switch=139627
    User   System     Wall        fault/wsec
  34.59s   41.61s   16.95s        773906.976
minor faults=13123061, major faults=0 ctx switch=139254
    User   System     Wall        fault/wsec
  39.03s   41.55s   16.97s        772966.334
minor faults=13123131, major faults=0 ctx switch=139970
    User   System     Wall        fault/wsec
  36.71s   42.12s   17.04s        769941.019
minor faults=13123027, major faults=0 ctx switch=138524
    User   System     Wall        fault/wsec
  42.08s   42.24s   18.08s        725639.047

Base + MM event + event trace enable:
fault/wsec avg: 759626.1488

minor faults=13123488, major faults=0 ctx switch=140303
    User   System     Wall        fault/wsec
  37.66s   42.21s   17.48s        750414.257
minor faults=13123066, major faults=0 ctx switch=138119
    User   System     Wall        fault/wsec
  36.77s   42.14s   17.49s        750010.107
minor faults=13123505, major faults=0 ctx switch=140021
    User   System     Wall        fault/wsec
  38.51s   42.50s   17.54s        748022.219
minor faults=13123431, major faults=0 ctx switch=138517
    User   System     Wall        fault/wsec
  36.74s   41.49s   17.03s        770255.610
minor faults=13122955, major faults=0 ctx switch=137174
    User   System     Wall        fault/wsec
  40.68s   40.97s   16.83s        779428.551

Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d)
Signed-off-by: Martin Liu <liumartin@google.com>
2020-05-30 02:21:52 +08:00