[Tuning] Disable CPU Idle in NNAPI workload with PMQoS CPU DMA Latency
To improve the return path latency, we want the CPU to stay in at least the
WFI state (Idle_1). The PMQoS cpu_dma_latency knob prevents the CPU from
entering any idle state deeper than WFI, which keeps the return path CPU
wakeup latency low.

Checked with wvw@; the power impact shouldn't be too significant. The average
energy cost per inference dropped from 3.85 to 3.47 mJ. Average power is
higher with idle disabled (2837 vs. 3539 mW), but because latency is better
we run more inferences in the same amount of time, so the energy spent per
inference is lower.

Measurement: MLPerf IC model

                  Latency (ms)  Power (mW)  Energy/inference (mJ)  MLPerf scores
Default           1.35          2837        3.85                   560
Disable CPU Idle  0.98          3539        3.47                   826

https://docs.google.com/presentation/d/1zx7sLkhOClmuRTCrq8-l3N1mZrrv7f-CtcdMuzV0eaI/edit?pli=1#slide=id.g12dd9e50b4b_0_0

Bug: 232183574
Test: MLPerf on Android T. Performance improved. Verified on Perfetto.
Change-Id: Ia807bf0849e4d9b0b0e8c9510335129ca89e791f
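The energy figures in the measurement can be roughly cross-checked from the latency and power columns. The sketch below assumes inferences run back to back, so that energy per inference is approximately average power times latency; the function name is made up for illustration, and the small gap against the reported 3.85 mJ for the default case is presumably rounding in the published latency.

```python
def energy_per_inference_mj(power_mw, latency_ms):
    """Approximate energy per inference, assuming back-to-back inferences."""
    # mW * ms gives microjoules; divide by 1000 to get millijoules.
    return power_mw * latency_ms / 1000.0


# (latency in ms, average power in mW) taken from the measurement table.
rows = {
    "Default": (1.35, 2837),
    "Disable CPU Idle": (0.98, 3539),
}

for name, (latency_ms, power_mw) in rows.items():
    energy = energy_per_inference_mj(power_mw, latency_ms)
    print(f"{name}: {energy:.2f} mJ")
```

With the table's numbers this prints about 3.83 mJ for the default case and 3.47 mJ with CPU idle disabled, which matches the direction of the reported result: higher power, but less energy per inference.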
parent ee6a5a5e72
commit 2c15fb2a5b
4 changed files with 60 additions and 0 deletions
@@ -172,6 +172,15 @@
       "DefaultIndex": 0,
       "ResetOnInit": true
     },
+    {
+      "Name": "PMQoSCpuDmaLatency",
+      "Path": "/dev/cpu_dma_latency",
+      "Values": [
+        "44",
+        "1000"
+      ],
+      "HoldFd": true
+    },
     {
       "Name": "CDPreferIdle",
       "Path": "/proc/vendor_sched/cam_prefer_idle",
@@ -1655,6 +1664,12 @@
       "Duration": 2000,
       "Value": "512"
     },
+    {
+      "PowerHint": "ML_ACC",
+      "Node": "PMQoSCpuDmaLatency",
+      "Duration": 2000,
+      "Value": "44"
+    },
     {
       "PowerHint": "DEVICE_IDLE",
       "Node": "RestrictedCpuset",
@@ -190,6 +190,15 @@
       "DefaultIndex": 0,
       "ResetOnInit": true
     },
+    {
+      "Name": "PMQoSCpuDmaLatency",
+      "Path": "/dev/cpu_dma_latency",
+      "Values": [
+        "44",
+        "1000"
+      ],
+      "HoldFd": true
+    },
     {
       "Name": "CDPreferIdle",
       "Path": "/proc/vendor_sched/cam_prefer_idle",
@@ -1677,6 +1686,12 @@
       "Duration": 2000,
       "Value": "512"
     },
+    {
+      "PowerHint": "ML_ACC",
+      "Node": "PMQoSCpuDmaLatency",
+      "Duration": 2000,
+      "Value": "44"
+    },
     {
       "PowerHint": "DEVICE_IDLE",
       "Node": "RestrictedCpuset",
@@ -172,6 +172,15 @@
       "DefaultIndex": 0,
       "ResetOnInit": true
     },
+    {
+      "Name": "PMQoSCpuDmaLatency",
+      "Path": "/dev/cpu_dma_latency",
+      "Values": [
+        "44",
+        "1000"
+      ],
+      "HoldFd": true
+    },
     {
       "Name": "CDPreferIdle",
       "Path": "/proc/vendor_sched/cam_prefer_idle",
@@ -1639,6 +1648,12 @@
       "Duration": 2000,
       "Value": "512"
     },
+    {
+      "PowerHint": "ML_ACC",
+      "Node": "PMQoSCpuDmaLatency",
+      "Duration": 2000,
+      "Value": "44"
+    },
     {
       "PowerHint": "DEVICE_IDLE",
       "Node": "RestrictedCpuset",
@@ -190,6 +190,15 @@
       "DefaultIndex": 0,
       "ResetOnInit": true
     },
+    {
+      "Name": "PMQoSCpuDmaLatency",
+      "Path": "/dev/cpu_dma_latency",
+      "Values": [
+        "44",
+        "1000"
+      ],
+      "HoldFd": true
+    },
     {
       "Name": "CDPreferIdle",
       "Path": "/proc/vendor_sched/cam_prefer_idle",
@@ -1661,6 +1670,12 @@
       "Duration": 2000,
       "Value": "512"
     },
+    {
+      "PowerHint": "ML_ACC",
+      "Node": "PMQoSCpuDmaLatency",
+      "Duration": 2000,
+      "Value": "44"
+    },
     {
       "PowerHint": "DEVICE_IDLE",
       "Node": "RestrictedCpuset",
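For readers unfamiliar with the `/dev/cpu_dma_latency` node that the new `PMQoSCpuDmaLatency` entry points at: the kernel keeps a CPU latency request active only while the file descriptor that wrote it stays open, which is presumably why the node is configured with `"HoldFd": true`. The following is a minimal sketch of that open, write, hold, close pattern and is not the power HAL's actual implementation. The helper name `hold_cpu_latency` and the `path` parameter are made up for illustration, and the sketch writes the target as a binary signed 32-bit integer (in microseconds), one of the formats the kernel interface accepts.

```python
import os
import struct
from contextlib import contextmanager

# Path taken from the diff; pass a different path to try the sketch on a
# machine without the device node or without the needed permissions.
CPU_DMA_LATENCY_PATH = "/dev/cpu_dma_latency"


@contextmanager
def hold_cpu_latency(latency_us, path=CPU_DMA_LATENCY_PATH):
    """Hold a CPU latency request for the duration of the with-block."""
    fd = os.open(path, os.O_WRONLY)
    try:
        # Write the latency target as a native signed 32-bit integer.
        os.write(fd, struct.pack("=i", latency_us))
        # The request stays registered for as long as fd remains open.
        yield fd
    finally:
        # Closing the descriptor drops the request.
        os.close(fd)
```

A caller would wrap the latency-sensitive work, for example `with hold_cpu_latency(44): ...`, and the request is released when the block exits. In the config above the equivalent lifetime is governed by the `ML_ACC` hint's 2000 ms `Duration` rather than by a code block.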