Called from architecture specific code when syscall_exit_to_user_mode() is
not suitable. It simply calls __exit_to_user_mode().
This way __exit_to_user_mode() can still be inlined because it is declared
static __always_inline.
[ tglx: Amended comments and moved it to a different place in the header ]
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201201142755.31931-5-svens@linux.ibm.com
To be called from architecture specific code if the combo interfaces are
not suitable. It simply calls __enter_from_user_mode(). This way
__enter_from_user_mode will still be inlined because it is declared static
__always_inline.
[ tglx: Amend comments and move it to a different location in the header ]
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201201142755.31931-4-svens@linux.ibm.com
Syscall User Dispatch (SUD) must take precedence over seccomp and
ptrace, since the use case is emulation (it can be invoked with a
different ABI) such that seccomp filtering by syscall number doesn't
make sense in the first place. In addition, either the syscall is
dispatched back to userspace, in which case there is no resource for to
trace, or the syscall will be executed, and seccomp/ptrace will execute
next.
Since SUD runs before tracepoints, it needs to be a SYSCALL_WORK_EXIT as
well, just to prevent a trace exit event when dispatch was triggered.
For that, the on_syscall_dispatch() examines context to skip the
tracepoint, audit and other work.
[ tglx: Add a comment on the exit side ]
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201127193238.821364-5-krisman@collabora.com
Introduce a mechanism to quickly disable/enable syscall handling for a
specific process and redirect to userspace via SIGSYS. This is useful
for processes with parts that require syscall redirection and parts that
don't, but who need to perform this boundary crossing really fast,
without paying the cost of a system call to reconfigure syscall handling
on each boundary transition. This is particularly important for Windows
games running over Wine.
The proposed interface looks like this:
prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <off>, <length>, [selector])
The range [<offset>,<offset>+<length>) is a part of the process memory
map that is allowed to by-pass the redirection code and dispatch
syscalls directly, such that in fast paths a process doesn't need to
disable the trap nor the kernel has to check the selector. This is
essential to return from SIGSYS to a blocked area without triggering
another SIGSYS from rt_sigreturn.
selector is an optional pointer to a char-sized userspace memory region
that has a key switch for the mechanism. This key switch is set to
either PR_SYS_DISPATCH_ON, PR_SYS_DISPATCH_OFF to enable and disable the
redirection without calling the kernel.
The feature is meant to be set per-thread and it is disabled on
fork/clone/execv.
Internally, this doesn't add overhead to the syscall hot path, and it
requires very little per-architecture support. I avoided using seccomp,
even though it duplicates some functionality, due to previous feedback
that maybe it shouldn't mix with seccomp since it is not a security
mechanism. And obviously, this should never be considered a security
mechanism, since any part of the program can by-pass it by using the
syscall dispatcher.
For the sysinfo benchmark, which measures the overhead added to
executing a native syscall that doesn't require interception, the
overhead using only the direct dispatcher region to issue syscalls is
pretty much irrelevant. The overhead of using the selector goes around
40ns for a native (unredirected) syscall in my system, and it is (as
expected) dominated by the supervisor-mode user-address access. In
fact, with SMAP off, the overhead is consistently less than 5ns on my
test box.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201127193238.821364-4-krisman@collabora.com
While debugging a situation where a delta for an event was calucalted wrong,
I realize there was nothing making sure that the delta of events are
correct. If a single event has an incorrect delta, then all events after it
will also have one. If the discrepency gets large enough, it could cause
the time stamps to go backwards when crossing sub buffers, that record a
full 64 bit time stamp, and the new deltas are added to that.
Add a way to validate the events at most events and when crossing a buffer
page. This will help make sure that the deltas are always correct. This test
will detect if they are ever corrupted.
The test adds a high overhead to the ring buffer recording, as it does the
audit for almost every event, and should only be used for testing the ring
buffer.
This will catch the bug that is fixed by commit 55ea4cf403 ("ring-buffer:
Update write stamp with the correct ts"), which is not applied when this
commit is applied.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull tracing fixes from Steven Rostedt:
- Use correct timestamp variable for ring buffer write stamp update
- Fix up before stamp and write stamp when crossing ring buffer sub
buffers
- Keep a zero delta in ring buffer in slow path if cmpxchg fails
- Fix trace_printk static buffer for archs that care
- Fix ftrace record accounting for ftrace ops with trampolines
- Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency
- Remove WARN_ON in hwlat tracer that triggers on something that is OK
- Make "my_tramp" trampoline in ftrace direct sample code global
- Fixes in the bootconfig tool for better alignment management
* tag 'trace-v5.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Always check to put back before stamp when crossing pages
ftrace: Fix DYNAMIC_FTRACE_WITH_DIRECT_CALLS dependency
ftrace: Fix updating FTRACE_FL_TRAMP
tracing: Fix alignment of static buffer
tracing: Remove WARN_ON in start_thread()
samples/ftrace: Mark my_tramp[12]? global
ring-buffer: Set the right timestamp in the slow path of __rb_reserve_next()
ring-buffer: Update write stamp with the correct ts
docs: bootconfig: Update file format on initrd image
tools/bootconfig: Align the bootconfig applied initrd image size to 4
tools/bootconfig: Fix to check the write failure correctly
tools/bootconfig: Fix errno reference after printf()
Instead of having two structures that represent each block device with
different life time rules, merge them into a single one. This also
greatly simplifies the reference counting rules, as we can use the inode
reference count as the main reference count for the new struct
block_device, with the device model reference front ending it for device
model interaction.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move the start_sect field to struct block_device in preparation
of killing struct hd_struct.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Now that the hd_struct always has a block device attached to it, there is
no need for having two size field that just get out of sync.
Additionally the field in hd_struct did not use proper serialization,
possibly allowing for torn writes. By only using the block_device field
this problem also gets fixed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The kernel currently uses kmem_cache to allocate shadow call stacks,
which means an overflows may not be immediately detected and can
potentially result in another task's shadow stack to be overwritten.
This change switches SCS to use virtually mapped shadow stacks for
tasks, which increases shadow stack size to a full page and provides
more robust overflow detection, similarly to VMAP_STACK.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130233442.2562064-2-samitolvanen@google.com
Signed-off-by: Will Deacon <will@kernel.org>
The current ring buffer logic checks to see if the updating of the event
buffer was interrupted, and if it is, it will try to fix up the before stamp
with the write stamp to make them equal again. This logic is flawed, because
if it is not interrupted, the two are guaranteed to be different, as the
current event just updated the before stamp before allocation. This
guarantees that the next event (this one or another interrupting one) will
think it interrupted the time updates of a previous event and inject an
absolute time stamp to compensate.
The correct logic is to always update the timestamps when traversing to a
new sub buffer.
Cc: stable@vger.kernel.org
Fixes: a389d86f7f ("ring-buffer: Have nested events still record running time stamp")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
On powerpc, kprobe-direct.tc triggered FTRACE_WARN_ON() in
ftrace_get_addr_new() followed by the below message:
Bad trampoline accounting at: 000000004222522f (wake_up_process+0xc/0x20) (f0000001)
The set of steps leading to this involved:
- modprobe ftrace-direct-too
- enable_probe
- modprobe ftrace-direct
- rmmod ftrace-direct <-- trigger
The problem turned out to be that we were not updating flags in the
ftrace record properly. From the above message about the trampoline
accounting being bad, it can be seen that the ftrace record still has
FTRACE_FL_TRAMP set though ftrace-direct module is going away. This
happens because we are checking if any ftrace_ops has the
FTRACE_FL_TRAMP flag set _before_ updating the filter hash.
The fix for this is to look for any _other_ ftrace_ops that also needs
FTRACE_FL_TRAMP.
Link: https://lkml.kernel.org/r/56c113aa9c3e10c19144a36d9684c7882bf09af5.1606412433.git.naveen.n.rao@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Fixes: a124692b69 ("ftrace: Enable trampoline when rec count returns back to one")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
With 5.9 kernel on ARM64, I found ftrace_dump output was broken but
it had no problem with normal output "cat /sys/kernel/debug/tracing/trace".
With investigation, it seems coping the data into temporal buffer seems to
break the align binary printf expects if the static buffer is not aligned
with 4-byte. IIUC, get_arg in bstr_printf expects that args has already
right align to be decoded and seq_buf_bprintf says ``the arguments are saved
in a 32bit word array that is defined by the format string constraints``.
So if we don't keep the align under copy to temporal buffer, the output
will be broken by shifting some bytes.
This patch fixes it.
Link: https://lkml.kernel.org/r/20201125225654.1618966-1-minchan@kernel.org
Cc: <stable@vger.kernel.org>
Fixes: 8e99cf91b9 ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This patch reverts commit 978defee11 ("tracing: Do a WARN_ON()
if start_thread() in hwlat is called when thread exists")
.start hook can be legally called several times if according
tracer is stopped
screen window 1
[root@localhost ~]# echo 1 > /sys/kernel/tracing/events/kmem/kfree/enable
[root@localhost ~]# echo 1 > /sys/kernel/tracing/options/pause-on-trace
[root@localhost ~]# less -F /sys/kernel/tracing/trace
screen window 2
[root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on
0
[root@localhost ~]# echo hwlat > /sys/kernel/debug/tracing/current_tracer
[root@localhost ~]# echo 1 > /sys/kernel/debug/tracing/tracing_on
[root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on
0
[root@localhost ~]# echo 2 > /sys/kernel/debug/tracing/tracing_on
triggers warning in dmesg:
WARNING: CPU: 3 PID: 1403 at kernel/trace/trace_hwlat.c:371 hwlat_tracer_start+0xc9/0xd0
Link: https://lkml.kernel.org/r/bd4d3e70-400d-9c82-7b73-a2d695e86b58@virtuozzo.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 978defee11 ("tracing: Do a WARN_ON() if start_thread() in hwlat is called when thread exists")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Some partners have value-adds based on aosp/540066, which cannot be
carried in ACK in its entirety as it no longer makes sense as-is (the
select_idle_capacity() rework upstream solved the issue differently).
It seems that those partners do not actually need the wake-wide tweaks,
they only need to access the wake_q length for wake-up balance. To
support this, add minimal tracking to the wake_q infrastructure in the
core kernel, but do that by adding a pointer to the wake_q_head to
task_struct directly to not litter all sched classes with an additional
sibling_count_hint argument to the select_task_rq callbacks.
Modules needing to access the wake_q length can do so by dereferencing
p->wake_q_head in the wake-up path when it is non-NULL.
Bug: 173981591
Signed-off-by: Quentin Perret <qperret@google.com>
Change-Id: I9a98167face92e70aba847d9f04d0c216065478c
Vendors might want to change tasks affinity settings when they are
moving from one cpuset into the other. Add vendor hook to give control
to vendor to implement what they need.
Bug: 174125747
Change-Id: Icee0405be0bca432002dae4a26ebe945082ce052
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
Vendors might want to change tasks affinity settings when they are
moving from one cpuset into the other. Add vendor hook to give control
to vendor to implement what they need in sched_setaffinity().
Bug: 174125747
Change-Id: Ie703448147377cd62e76a58b620a7ab849a04924
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
When an interrupt allocation fails for N interrupts, it is pretty
common for the error handling code to free the same number of interrupts,
no matter how many interrupts have actually been allocated.
This may result in the domain freeing code to be unexpectedly called
for interrupts that have no mapping in that domain. Things end pretty
badly.
Instead, add some checks to irq_domain_free_irqs_hierarchy() to make sure
that thiss does not follow the hierarchy if no mapping exists for a given
interrupt.
Fixes: 6a6544e520 ("genirq/irqdomain: Remove auto-recursive hierarchy support")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201129135551.396777-1-maz@kernel.org
There is currently no way to convey the affinity of an interrupt
via irq_create_mapping(), which creates issues for devices that
expect that affinity to be managed by the kernel.
In order to sort this out, rename irq_create_mapping() to
irq_create_mapping_affinity() with an additional affinity parameter that
can be passed down to irq_domain_alloc_descs().
irq_create_mapping() is re-implemented as a wrapper around
irq_create_mapping_affinity().
No functional change.
Fixes: e75eafb9b0 ("genirq/msi: Switch to new irq spreading infrastructure")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kurz <groug@kaod.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201126082852.1178497-2-lvivier@redhat.com
Pull locking fixes from Thomas Gleixner:
"Two more places which invoke tracing from RCU disabled regions in the
idle path.
Similar to the entry path the low level idle functions have to be
non-instrumentable"
* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Fix intel_idle() vs tracing
sched/idle: Fix arch_cpu_idle() vs tracing
Trivial conflict in CAN, keep the net-next + the byteswap wrapper.
Conflicts:
drivers/net/can/usb/gs_usb.c
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The SCMI pull request for the arm/drivers branch requires v5.10-rc2
because of dependencies with other git trees, so merge that in here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull printk fixes from Petr Mladek:
- do not lose trailing newline in pr_cont() calls
- two trivial fixes for a dead store and a config description
* tag 'printk-for-5.10-rc6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: finalize records with trailing newlines
printk: remove unneeded dead-store assignment
init/Kconfig: Fix CPU number in LOG_CPU_MAX_BUF_SHIFT description
update_cpumask had a special case for empty buf which
did not update cpus_requested. This change reduces
differences (only to parsing) in empty/non-empty codepaths
to make them consistent.
Bug: 174125747
Bug: 120444281
Fixes: 4803def4e0b2 ("ANDROID: cpuset: Make cpusets restore on hotplug")
Test: check that writes to /dev/cpuset/background/tasks
Test: work as expected, e.g.:
Test: echo $$ > /dev/cpuset/background/tasks
Test: echo > /dev/cpuset/background/tasks
Signed-off-by: Roman Kiryanov <rkir@google.com>
Change-Id: I49d320ea046636ec38bd23f053317abc59f64f8e
[satyap@codeaurora.org: port to android-mainline kernel]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
This deliberately changes the behavior of the per-cpuset
cpus file to not be effected by hotplug. When a cpu is offlined,
it will be removed from the cpuset/cpus file. When a cpu is onlined,
if the cpuset originally requested that that cpu was part of the cpuset,
that cpu will be restored to the cpuset. The cpus files still
have to be hierachical, but the ranges no longer have to be out of
the currently online cpus, just the physically present cpus.
Bug: 174125747
Bug: 120444281
Change-Id: I22cdf33e7d312117bcefba1aeb0125e1ada289a9
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
[AmitP: Refactored original changes to align with upstream commit
201af4c0fa ("cgroup: move cgroup files under kernel/cgroup/")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[satyap@codeaurora.org: port to android-mainline kernel]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
It needs to export some symbols so that vendor module could reference.
Bug: 170507310
Signed-off-by: Rick Yiu <rickyiu@google.com>
Change-Id: I1b3c3ea8d0a11c01a1ca9e124e93f85e52856dc4
(cherry picked from commit a61271a41a2c2825a51bd7655fc446e16d23f5f6)
Signed-off-by: Will McVicker <willmcvicker@google.com>
This is to add capability for vendor to decide whether a cpufreq update
is needed, e.g. up/down rate limit.
Using restricted hook since it can be called from scheduler wakeup path.
Bug: 170511085
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: If9adea3a3e31efbf3858fbd009665a07dc70c638
(cherry picked from commit f9f3464532a045257f8138338b1beda86ef0a3be)
Signed-off-by: Will McVicker <willmcvicker@google.com>
Upstream moved the sugov to DEADLINE class which has higher prio than RT
so it can potentially block many RT use case in Android.
Also currently iowait doesn't distinguish background/foreground tasks
and we have seen cases where device run to high frequency unnecessarily
when running some background I/O.
Bug: 171598214
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: I21e9bfe9ef75a4178279574389e417c3f38e65ac
(cherry picked from commit 03177ef82bd942a3f163e826063491bae6ff0ac9)
Signed-off-by: Will McVicker <willmcvicker@google.com>
trace_sched_blocked_trace in CFS is really useful for debugging via
trace because it tell where the process was stuck on callstack.
For example,
<...>-6143 ( 6136) [005] d..2 50.278987: sched_blocked_reason: pid=6136 iowait=0 caller=SyS_mprotect+0x88/0x208
<...>-6136 ( 6136) [005] d..2 50.278990: sched_blocked_reason: pid=6142 iowait=0 caller=do_page_fault+0x1f4/0x3b0
<...>-6142 ( 6136) [006] d..2 50.278996: sched_blocked_reason: pid=6144 iowait=0 caller=SyS_prctl+0x52c/0xb58
<...>-6144 ( 6136) [006] d..2 50.279007: sched_blocked_reason: pid=6136 iowait=0 caller=vm_mmap_pgoff+0x74/0x104
However, sometime it gives pointless information like this.
RenderThread-2322 ( 1805) [006] d.s3 50.319046: sched_blocked_reason: pid=6136 iowait=1 caller=__lock_page_killable+0x17c/0x220
logd.writer-594 ( 587) [002] d.s3 50.334011: sched_blocked_reason: pid=6126 iowait=1 caller=wait_on_page_bit+0x194/0x208
kworker/u16:13-333 ( 333) [007] d.s4 50.343161: sched_blocked_reason: pid=6136 iowait=1 caller=__lock_page_killable+0x17c/0x220
Such wait_on_page_bit, __lock_page_killable are pointless because it doesn't
carry on higher information to identify the callstack.
The reason is page_lock and waitqueue are special synchronization method unlike
other normal locks(mutex, spinlock).
Let's mark them as "__sched" so get_wchan which used in trace_sched_blocked_trace
could detect it and skip them. It will produce more meaningful callstack
function like this.
<...>-2867 ( 1068) [002] d.h4 124.209701: sched_blocked_reason: pid=329 iowait=0 caller=worker_thread+0x378/0x470
<...>-2867 ( 1068) [002] d.s3 124.209763: sched_blocked_reason: pid=8454 iowait=1 caller=__filemap_fdatawait_range+0xa0/0x104
<...>-2867 ( 1068) [002] d.s4 124.209803: sched_blocked_reason: pid=869 iowait=0 caller=worker_thread+0x378/0x470
ScreenDecoratio-2364 ( 1867) [002] d.s3 124.209973: sched_blocked_reason: pid=8454 iowait=1 caller=f2fs_wait_on_page_writeback+0x84/0xcc
ScreenDecoratio-2364 ( 1867) [002] d.s4 124.209986: sched_blocked_reason: pid=869 iowait=0 caller=worker_thread+0x378/0x470
<...>-329 ( 329) [000] d..3 124.210435: sched_blocked_reason: pid=538 iowait=0 caller=worker_thread+0x378/0x470
kworker/u16:13-538 ( 538) [007] d..3 124.210450: sched_blocked_reason: pid=6 iowait=0 caller=worker_thread+0x378/0x470
Test: build pass and boot to home.
Bug: 144961676
Bug: 144713689
Bug: 172212772
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>
Change-Id: I9c738802a16941ca767dcc37ae4463070b3fabf4
(cherry picked from commit 1e4de875d9e0cfaccf5131bcc709ae8646cdc168)
Signed-off-by: Will McVicker <willmcvicker@google.com>
Add hooks for vendor specific find_energy_efficient_cpu logic.
Bug: 170507310
Signed-off-by: Rick Yiu <rickyiu@google.com>
Change-Id: I064b501017e32d4f22f8128bed8bf3a1508ab15b
(cherry picked from commit 2f108e2ec6e89609cbae32c5d13d6ad9f2e858cb)
Signed-off-by: Will McVicker <willmcvicker@google.com>
Below symbols would be used by vendor modules to make better placement
decisions when respective hooks are registered.
1. uclamp_eff_value
2. idle_cpu
Bug: 174219212
Change-Id: I2b41ce9a7c3fb67a8170c5c253985c722f06e85a
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
android_rvh_sched_nohz_balancer_kick hook allows vendor modules to
select the busiest CPU in a group during load balance. When the
load balancer could not pull tasks from this busiest CPU due to
affinity restriction, the CPU is cleared from env->cpu. This must
be passed to the vendor module, otherwise we keep selecting the
exempted CPU as the busiest CPU.
Bug: 174338902
Change-Id: Iedaa389a51849da4c3e094d731fe5e39cd909d81
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Nowadays, there are increasing requirements to benchmark the performance
of dma_map and dma_unmap particually while the device is attached to an
IOMMU.
This patch enables the support. Users can run specified number of threads
to do dma_map_page and dma_unmap_page on a specific NUMA node with the
specified duration. Then dma_map_benchmark will calculate the average
latency for map and unmap.
A difficulity for this benchmark is that dma_map/unmap APIs must run on
a particular device. Each device might have different backend of IOMMU or
non-IOMMU.
So we use the driver_override to bind dma_map_benchmark to a particual
device by:
For platform devices:
echo dma_map_benchmark > /sys/bus/platform/devices/xxx/driver_override
echo xxx > /sys/bus/platform/drivers/xxx/unbind
echo xxx > /sys/bus/platform/drivers/dma_map_benchmark/bind
For PCI devices:
echo dma_map_benchmark > /sys/bus/pci/devices/0000:00:01.0/driver_override
echo 0000:00:01.0 > /sys/bus/pci/drivers/xxx/unbind
echo 0000:00:01.0 > /sys/bus/pci/drivers/dma_map_benchmark/bind
Cc: Will Deacon <will@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
[hch: folded in two fixes from Colin Ian King <colin.king@canonical.com>]
Signed-off-by: Christoph Hellwig <hch@lst.de>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
At the moment we allow bypassing DMA ops only when we can do this for
the entire RAM. However there are configs with mixed type memory
where we could still allow bypassing IOMMU in most cases;
POWERPC with persistent memory is one example.
This adds an arch hook to determine where bypass can still work and
we invoke direct DMA API. The following patch checks the bus limit
on POWERPC to allow or disallow direct mapping.
This adds a ARCH_HAS_DMA_MAP_DIRECT config option to make the arch_xxxx
hooks no-op by default.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Provide a wrapper function to get the IMA hash of an inode. This helper
is useful in fingerprinting files (e.g executables on execution) and
using these fingerprints in detections like an executable unlinking
itself.
Since the ima_inode_hash can sleep, it's only allowed for sleepable
LSM hooks.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201124151210.1081188-3-kpsingh@chromium.org