Currently, eBPF only understands BPF_JGT (>), BPF_JGE (>=),
BPF_JSGT (s>), BPF_JSGE (s>=) instructions, this means that
particularly *JLT/*JLE counterparts involving immediates need
to be rewritten from e.g. X < [IMM] by swapping arguments into
[IMM] > X, meaning the immediate first is required to be loaded
into a register Y := [IMM], such that then we can compare with
Y > X. Note that the destination operand is always required to
be a register.
This has the downside of having unnecessarily increased register
pressure, meaning complex program would need to spill other
registers temporarily to stack in order to obtain an unused
register for the [IMM]. Loading to registers will thus also
affect state pruning since we need to account for that register
use and potentially those registers that had to be spilled/filled
again. As a consequence slightly more stack space might have
been used due to spilling, and BPF programs are a bit longer
due to extra code involving the register load and potentially
required spill/fills.
Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=)
counterparts to the eBPF instruction set. Modifying LLVM to
remove the NegateCC() workaround in a PoC patch at [1] and
allowing it to also emit the new instructions resulted in
cilium's BPF programs that are injected into the fast-path to
have a reduced program length in the range of 2-3% (e.g.
accumulated main and tail call sections from one of the object
file reduced from 4864 to 4729 insns), reduced complexity in
the range of 10-30% (e.g. accumulated sections reduced in one
of the cases from 116432 to 88428 insns), and reduced stack
usage in the range of 1-5% (e.g. accumulated sections from one
of the object files reduced from 824 to 784b).
The modification for LLVM will be incorporated in a backwards
compatible way. Plan is for LLVM to have i) a target specific
option to offer a possibility to explicitly enable the extension
by the user (as we have with -m target specific extensions today
for various CPU insns), and ii) have the kernel checked for
presence of the extensions and enable them transparently when
the user is selecting more aggressive options such as -march=native
in a bpf target context. (Other frontends generating BPF byte
code, e.g. ply can probe the kernel directly for its code
generation.)
[1] https://github.com/borkmann/llvm/tree/bpf-insns
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Change-Id: I70bf3c680ba5b0d5631bc2b0b906e9e5ca219492
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Diab Neiroukh <lazerl0rd@thezest.dev>
Signed-off-by: DennySPb <dennyspb@gmail.com>
Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at anytime.
One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active. This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow
This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active.
Verified with test program at: https://lore.kernel.org/patchwork/patch/1008117/
Backport link: https://lore.kernel.org/patchwork/patch/1014892/
Bug: 113362644
Change-Id: I9b71aa6bb677004c1cbdbe1a8907f5c756148b20
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Changes in 4.9.106
objtool: Improve detection of BUG() and other dead ends
objtool: Move checking code to check.c
tools lib: Add for_each_clear_bit macro
tools: add more bitmap functions
tools: enable endian checks for all sparse builds
tools include: Introduce linux/compiler-gcc.h
radix tree test suite: Remove types.h
tools include: Adopt __compiletime_error
tools include: Introduce atomic_cmpxchg_{relaxed,release}()
tools include: Add UINT_MAX def to kernel.h
tools include: Adopt kernel's refcount.h
perf tools: Force fixdep compilation at the start of the build
perf tools: Move headers check into bash script
tools include uapi: Grab copies of stat.h and fcntl.h
tools include: Introduce linux/bug.h, from the kernel sources
tools include: Adopt __same_type() and __must_be_array() from the kernel
tools include: Move ARRAY_SIZE() to linux/kernel.h
tools include: Drop ARRAY_SIZE() definition from linux/hashtable.h
tools include: Include missing headers for fls() and types in linux/log2.h
objtool: sync up with the 4.14.47 version of objtool
objtool: Support GCC 8's cold subfunctions
objtool: Support GCC 8 switch tables
objtool: Detect RIP-relative switch table references
objtool: Detect RIP-relative switch table references, part 2
objtool: Fix "noreturn" detection for recursive sibling calls
objtool, x86: Add several functions and files to the objtool whitelist
perf/tools: header file sync up
objtool: header file sync-up
x86/xen: Add unwind hint annotations to xen_setup_gdt
objtool: Enclose contents of unreachable() macro in a block
Linux 4.9.106
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
When building tools/perf/ it rightly complains about a number of .h
files being out of sync. Fix this up by syncing them properly with the
relevant in-kernel versions.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cherry-pick from commit 6acc5c2910689fc6ee181bf63085c5efff6a42bd
Returns the owner uid of the socket inside a sk_buff. This is useful to
perform per-UID accounting of network traffic or per-UID packet
filtering. The socket need to be a fullsock otherwise overflowuid is
returned.
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: 30950746
Change-Id: Idc00947ccfdd4e9f2214ffc4178d701cd9ead0ac
Cherrypick from commit: 91b8270f2a4d1d9b268de90451cdca63a70052d6
Retrieve the socket cookie generated by sock_gen_cookie() from a sk_buff
with a known socket. Generates a new cookie if one was not yet set.If
the socket pointer inside sk_buff is NULL, 0 is returned. The helper
function coud be useful in monitoring per socket networking traffic
statistics and provide a unique socket identifier per namespace.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: 30950746
Change-Id: I95918dcc3ceffb3061495a859d28aee88e3cde3c
Update the missing bpf helper function name in bpf_func_id to keep the
uapi header consistent with upstream uapi header because we need the
new added bpf helper function bpf get_socket_cookie and get_socket_uid.
The patch related to those headers are not backetported since they are
not related and backport them will bring in extra confilict.
Signed-off-by: Chenbo Feng <fengc@google.com>
Bug: 30950746
Change-Id: I2b5fd03799ac5f2e3243ab11a1bccb932f06c312
(cherry picked from commit 651be3cb085341a21847e47c694c249c3e1e4e5b)
We only support breakpoint/watchpoint of length 1, 2, 4 and 8. If we can
support other length as well, then user may watch more data with less
number of watchpoints (provided hardware supports it). For example: if we
have to watch only 4th, 5th and 6th byte from a 64 bit aligned address, we
will have to use two slots to implement it currently. One slot will watch a
half word at offset 4 and other a byte at offset 6. If we can have a
watchpoint of length 3 then we can watch it with single slot as well.
ARM64 hardware does support such functionality, therefore adding these new
definitions in generic layer.
Signed-off-by: Pratyush Anand <panand@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pavel Labath <labath@google.com>
Change-Id: Ie17ed89ca526e4fddf591bb4e556fdfb55fc2eac
Bug: 30919905
Some mmap related macros have different values for different
architectures. This patch introduces uapi mman.h for each
architectures.
Three headers are cloned from kernel include to tools/include:
tools/include/uapi/asm-generic/mman-common.h
tools/include/uapi/asm-generic/mman.h
tools/include/uapi/linux/mman.h
The main part of this patch is generated by following script:
macros=`cat $0 | awk 'V==1 {print}; /^# start macro list/ {V=1}'`
for arch in `ls tools/arch`
do
[ -d tools/arch/$arch/include/uapi/asm ] || mkdir -p tools/arch/$arch/include/uapi/asm
src=arch/$arch/include/uapi/asm/mman.h
target=tools/arch/$arch/include/uapi/asm/mman.h
guard="TOOLS_ARCH_"`echo $arch | awk '{print toupper($0)}'`_UAPI_ASM_MMAN_FIX_H
echo '#ifndef '$guard > $target
echo '#define '$guard >> $target
[ -f $src ] &&
for m in $macros
do
if grep '#define[ \t]*'$m $src > /dev/null 2>&1
then
grep -h '#define[ \t]*'$m $src | sed 's/[ \t]*\/\*.*$//g' >> $target
fi
done
if [ -f $src ]
then
grep '#include <asm-generic' $src >> $target
else
echo "#include <asm-generic/mman.h>" >> $target
fi
echo '#endif' >> $target
echo "$target"
done
exit 0
# Following macros are extracted from:
# tools/perf/trace/beauty/mmap.c
#
# start macro list
MADV_DODUMP
MADV_DOFORK
MADV_DONTDUMP
MADV_DONTFORK
MADV_DONTNEED
MADV_HUGEPAGE
MADV_HWPOISON
MADV_MERGEABLE
MADV_NOHUGEPAGE
MADV_NORMAL
MADV_RANDOM
MADV_REMOVE
MADV_SEQUENTIAL
MADV_SOFT_OFFLINE
MADV_UNMERGEABLE
MADV_WILLNEED
MAP_32BIT
MAP_ANONYMOUS
MAP_DENYWRITE
MAP_EXECUTABLE
MAP_FILE
MAP_FIXED
MAP_GROWSDOWN
MAP_HUGETLB
MAP_LOCKED
MAP_NONBLOCK
MAP_NORESERVE
MAP_POPULATE
MAP_PRIVATE
MAP_SHARED
MAP_STACK
MAP_UNINITIALIZED
MREMAP_FIXED
MREMAP_MAYMOVE
PROT_EXEC
PROT_GROWSDOWN
PROT_GROWSUP
PROT_NONE
PROT_READ
PROT_SEM
PROT_WRITE
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1473684871-209320-2-git-send-email-wangnan0@huawei.com
[ Added new files to tools/perf/MANIFEST to fix the detached tarball build, add mman.h for ARC ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The way we're using kernel headers in tools/ now, with a copy that is
made to the same path prefixed by "tools/" plus checking if that copy
got stale, i.e. if the kernel counterpart changed, helps in keeping
track with new features that may be useful for tools to exploit.
For instance, looking at all the changes to bpf.h since it was last
copied to tools/include brings this to toolers' attention:
Need to investigate this one to check how to run a program via perf, setting up
a BPF event, that will take advantage of the way perf already calls clang/LLVM,
sets up the event and runs the workload in a single command line, helping in
debugging such semi cooperative programs:
96ae522795 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
This one needs further investigation about using the feature it improves
in 'perf trace' to do some tcpdumpin' mixed with syscalls, tracepoints,
probe points, callgraphs, etc:
555c8a8623 ("bpf: avoid stack copy and use skb ctx for event output")
Add tracing just packets that are related to some container to that mix:
4a482f34af ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
4ed8ec521e ("cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY")
Definetely needs to have example programs accessing task_struct from a bpf proggie
started from 'perf trace':
606274c5ab ("bpf: introduce bpf_get_current_task() helper")
Core networking related, XDP:
6ce96ca348 ("bpf: add XDP_TX xdp_action for direct forwarding")
6a773a15a1 ("bpf: add XDP prog type for early driver filter")
13c5c240f7 ("bpf: add bpf_get_hash_recalc helper")
d2485c4242 ("bpf: add bpf_skb_change_type helper")
6578171a7f ("bpf: add bpf_skb_change_proto helper")
Changes detected by the tools build system:
$ make -C tools/perf O=/tmp/build/perf install-bin
make: Entering directory '/home/acme/git/linux/tools/perf'
BUILD: Doing 'make -j4' parallel build
Warning: tools/include/uapi/linux/bpf.h differs from kernel
INSTALL GTK UI
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
<SNIP>
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Brenden Blanco <bblanco@plumgrid.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-difq4ts1xvww6eyfs9e7zlft@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We shouldn't use headers from the kernel sources directly, instead we
should use the system's headers or in cases where that isn't possible,
like with perf_event.h, where the introduction of kernel features such
as perf_event_attr.{write_backwards,sample_max_stack} and
PERF_EVENT_IOC_PAUSE_OUTPUT take some time to become available in
/usr/include/linux/perf_event.h we need a copy.
Do it and check for source code drift, emitting a warning when changes
are detected.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-v6aks5un3s5pehory6f42nrl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>