kernel_xiaomi_cepheus

Evolution-X-Devices/kernel_xiaomi_cepheus

Author	SHA1	Message	Date
Greg Kroah-Hartman	2414f45b93	Merge 4.14.271 into android-4.14-stable Changes in 4.14.271 x86/speculation: Merge one test in spectre_v2_user_select_mitigation() x86,bugs: Unconditionally allow spectre_v2=retpoline,amd x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE x86/speculation: Add eIBRS + Retpoline options Documentation/hw-vuln: Update spectre doc x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting x86/speculation: Use generic retpoline by default on AMD x86/speculation: Update link to AMD speculation whitepaper x86/speculation: Warn about Spectre v2 LFENCE mitigation x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT arm/arm64: Provide a wrapper for SMCCC 1.1 calls arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit() ARM: report Spectre v2 status through sysfs ARM: early traps initialisation ARM: use LOADADDR() to get load address of sections ARM: Spectre-BHB workaround ARM: include unprivileged BPF status in Spectre V2 reporting ARM: fix build error when BPF_SYSCALL is disabled ARM: fix co-processor register typo ARM: Do not use NOCROSSREFS directive with ld.lld ARM: fix build warning in proc-v7-bugs.c xen/xenbus: don't let xenbus_grant_ring() remove grants in error case xen/grant-table: add gnttab_try_end_foreign_access() xen/blkfront: don't use gnttab_query_foreign_access() for mapped status xen/netfront: don't use gnttab_query_foreign_access() for mapped status xen/scsifront: don't use gnttab_query_foreign_access() for mapped status xen/gntalloc: don't use gnttab_query_foreign_access() xen: remove gnttab_query_foreign_access() xen/9p: use alloc/free_pages_exact() xen/gnttab: fix gnttab_end_foreign_access() without page specified xen/netfront: react properly to failing gnttab_end_foreign_access_ref() Linux 4.14.271 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I9f41bca7a4638913cfd2b97006d19185dd9bf584	2022-03-11 11:13:06 +01:00
Josh Poimboeuf	383973dc1a	x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting commit 44a3918c8245ab10c6c9719dd12e7a8d291980d8 upstream. With unprivileged eBPF enabled, eIBRS (without retpoline) is vulnerable to Spectre v2 BHB-based attacks. When both are enabled, print a warning message and report it in the 'spectre_v2' sysfs vulnerabilities file. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> [fllinden@amazon.com: backported to 4.14] Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-03-11 10:13:28 +01:00
Maciej Żenczykowski	34073d7a8e	BACKPORT: bpf: add bpf_ktime_get_boot_ns() On a device like a cellphone which is constantly suspending and resuming CLOCK_MONOTONIC is not particularly useful for keeping track of or reacting to external network events. Instead you want to use CLOCK_BOOTTIME. Hence add bpf_ktime_get_boot_ns() as a mirror of bpf_ktime_get_ns() based around CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> (cherry picked from commit 71d19214776e61b33da48f7c1b46e522c7f78221) Change-Id: Ifd62c410dcc5112fd1a473a7e1f70231ca514bc0	2021-02-11 19:46:44 -08:00
Yonghong Song	5179a6a673	UPSTREAM: bpf: permit multiple bpf attachments for a single perf event This patch enables multiple bpf attachments for a kprobe/uprobe/tracepoint single trace event. Each trace_event keeps a list of attached perf events. When an event happens, all attached bpf programs will be executed based on the order of attachment. A global bpf_event_mutex lock is introduced to protect prog_array attaching and detaching. An alternative will be introduce a mutex lock in every trace_event_call structure, but it takes a lot of extra memory. So a global bpf_event_mutex lock is a good compromise. The bpf prog detachment involves allocation of memory. If the allocation fails, a dummy do-nothing program will replace to-be-detached program in-place. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit e87c6bc3852b981e71c757be20771546ce9f76f3) Signed-off-by: Connor O'Brien <connoro@google.com> Bug: 121213201 Bug: 138317270 Test: build & boot cuttlefish; attach 2 progs to 1 tracepoint Change-Id: I25ce1ed6c9512d0a6f2db7547e109958fe1619b6	2019-12-11 20:04:37 -08:00
Alexei Starovoitov	9e61c87b1f	UPSTREAM: bpf: multi program support for cgroup+bpf introduce BPF_F_ALLOW_MULTI flag that can be used to attach multiple bpf programs to a cgroup. The difference between three possible flags for BPF_PROG_ATTACH command: - NONE(default): No further bpf programs allowed in the subtree. - BPF_F_ALLOW_OVERRIDE: If a sub-cgroup installs some bpf program, the program in this cgroup yields to sub-cgroup program. - BPF_F_ALLOW_MULTI: If a sub-cgroup installs some bpf program, that cgroup program gets run in addition to the program in this cgroup. NONE and BPF_F_ALLOW_OVERRIDE existed before. This patch doesn't change their behavior. It only clarifies the semantics in relation to new flag. Only one program is allowed to be attached to a cgroup with NONE or BPF_F_ALLOW_OVERRIDE flag. Multiple programs are allowed to be attached to a cgroup with BPF_F_ALLOW_MULTI flag. They are executed in FIFO order (those that were attached first, run first) The programs of sub-cgroup are executed first, then programs of this cgroup and then programs of parent cgroup. All eligible programs are executed regardless of return code from earlier programs. To allow efficient execution of multiple programs attached to a cgroup and to avoid penalizing cgroups without any programs attached introduce 'struct bpf_prog_array' which is RCU protected array of pointers to bpf programs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> for cgroup bits Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 324bda9e6c5add86ba2e1066476481c48132aca0) Signed-off-by: Connor O'Brien <connoro@google.com> Bug: 121213201 Bug: 138317270 Test: build & boot cuttlefish Change-Id: If17b11a773f73d45ea565a947fc1bf7e158db98d	2019-12-11 20:04:15 -08:00
Greg Kroah-Hartman	fd9e32a025	Merge 4.14.122 into android-4.14 Changes in 4.14.122 net: avoid weird emergency message net/mlx4_core: Change the error print to info print net: test nouarg before dereferencing zerocopy pointers net: usb: qmi_wwan: add Telit 0x1260 and 0x1261 compositions ppp: deflate: Fix possible crash in deflate_init tipc: switch order of device registration to fix a crash vsock/virtio: free packets during the socket release tipc: fix modprobe tipc failed after switch order of device registration vsock/virtio: Initialize core virtio vsock before registering the driver net: Always descend into dsa/ parisc: Export running_on_qemu symbol for modules parisc: Skip registering LED when running in QEMU parisc: Use PA_ASM_LEVEL in boot code parisc: Rename LEVEL to PA_ASM_LEVEL to avoid name clash with DRBD code stm class: Fix channel free in stm output free path md: add mddev->pers to avoid potential NULL pointer dereference intel_th: msu: Fix single mode with IOMMU p54: drop device reference count if fails to enable device of: fix clang -Wunsequenced for be32_to_cpu() cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level() media: ov6650: Fix sensor possibly not detected on probe Revert "cifs: fix memory leak in SMB2_read" NFS4: Fix v4.0 client state corruption when mount PNFS fallback to MDS if no deviceid found clk: hi3660: Mark clk_gate_ufs_subsys as critical clk: tegra: Fix PLLM programming on Tegra124+ when PMC overrides divider clk: rockchip: fix wrong clock definitions for rk3328 fuse: fix writepages on 32bit fuse: honor RLIMIT_FSIZE in fuse_file_fallocate iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114 ceph: flush dirty inodes before proceeding with remount x86_64: Add gap to int3 to allow for call emulation x86_64: Allow breakpoints to emulate call instructions ftrace/x86_64: Emulate call function while updating in breakpoint handler tracing: Fix partial reading of trace event's id file memory: tegra: Fix integer overflow on tick value calculation perf intel-pt: Fix instructions sampling rate perf intel-pt: Fix improved sample timestamp perf intel-pt: Fix sample timestamp wrt non-taken branches objtool: Allow AR to be overridden with HOSTAR fbdev: sm712fb: fix brightness control on reboot, don't set SR30 fbdev: sm712fb: fix VRAM detection, don't set SR70/71/74/75 fbdev: sm712fb: fix white screen of death on reboot, don't set CR3B-CR3F fbdev: sm712fb: fix boot screen glitch when sm712fb replaces VGA fbdev: sm712fb: fix crashes during framebuffer writes by correctly mapping VRAM fbdev: sm712fb: fix support for 1024x768-16 mode fbdev: sm712fb: use 1024x768 by default on non-MIPS, fix garbled display fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken PCI: Mark Atheros AR9462 to avoid bus reset PCI: Factor out pcie_retrain_link() function PCI: Work around Pericom PCIe-to-PCI bridge Retrain Link erratum dm cache metadata: Fix loading discard bitset dm zoned: Fix zone report handling dm delay: fix a crash when invalid device is specified xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink xfrm6_tunnel: Fix potential panic when unloading xfrm6_tunnel module vti4: ipip tunnel deregistration fixes. esp4: add length check for UDP encapsulation xfrm4: Fix uninitialized memory read in _decode_session4 power: supply: cpcap-battery: Fix division by zero securityfs: fix use-after-free on symlink traversal apparmorfs: fix use-after-free on symlink traversal mac80211: Fix kernel panic due to use of txq after free KVM: arm/arm64: Ensure vcpu target is unset on reset failure power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG iwlwifi: mvm: check for length correctness in iwl_mvm_create_skb() sched/cpufreq: Fix kobject memleak x86/mm/mem_encrypt: Disable all instrumentation for early SME setup ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour perf bench numa: Add define for RUSAGE_THREAD if not present Revert "Don't jump to compute_result state from check_result state" md/raid: raid5 preserve the writeback action after the parity check driver core: Postpone DMA tear-down until after devres release for probe failure bpf: add map_lookup_elem_sys_only for lookups from syscall side bpf, lru: avoid messing with eviction heuristics upon syscall lookup btrfs: Honour FITRIM range constraints during free space trim fbdev: sm712fb: fix memory frequency by avoiding a switch/case fallthrough Linux 4.14.122 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2019-05-27 09:36:03 +02:00
Daniel Borkmann	9f2292309d	bpf: add map_lookup_elem_sys_only for lookups from syscall side commit c6110222c6f49ea68169f353565eb865488a8619 upstream. Add a callback map_lookup_elem_sys_only() that map implementations could use over map_lookup_elem() from system call side in case the map implementation needs to handle the latter differently than from the BPF data path. If map_lookup_elem_sys_only() is set, this will be preferred pick for map lookups out of user space. This hook is used in a follow-up fix for LRU map, but once development window opens, we can convert other map types from map_lookup_elem() (here, the one called upon BPF_MAP_LOOKUP_ELEM cmd is meant) over to use the callback to simplify and clean up the latter. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-25 18:25:37 +02:00
Al Viro	1c0ac5e9bf	BACKPORT: fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'" Descriptor table is a shared object; it's not a place where you can stick temporary references to files, especially when we don't need an opened file at all. Cc: stable@vger.kernel.org # v4.14 Fixes: `98589a0998` ("netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Chenbo Feng <fengc@google.com> Removed the code related to function bpf_prog_get_ok() since it is not exsit in current android tree. (cherry picked from commit 040ee69226f8a96b7943645d68f41d5d44b5ff7d) Change-Id: If7a602128cdea4b4b50c8effb215c9bca7449515	2019-05-17 15:44:06 -07:00
Greg Kroah-Hartman	6c95b90db5	Merge 4.14.79 into android-4.14 Changes in 4.14.79 xfrm: Validate address prefix lengths in the xfrm selector. xfrm6: call kfree_skb when skb is toobig xfrm: reset transport header back to network header after all input transforms ahave been applied xfrm: reset crypto_done when iterating over multiple input xfrms mac80211: Always report TX status cfg80211: reg: Init wiphy_idx in regulatory_hint_core() mac80211: fix pending queue hang due to TX_DROP cfg80211: Address some corner cases in scan result channel updating mac80211: TDLS: fix skb queue/priority assignment mac80211: fix TX status reporting for ieee80211s xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry. ARM: 8799/1: mm: fix pci_ioremap_io() offset check xfrm: validate template mode netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev arm64: hugetlb: Fix handling of young ptes ARM: dts: BCM63xx: Fix incorrect interrupt specifiers net: macb: Clean 64b dma addresses if they are not detected soc: fsl: qbman: qman: avoid allocating from non existing gen_pool soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift() nl80211: Fix possible Spectre-v1 for NL80211_TXRATE_HT mac80211_hwsim: do not omit multicast announce of first added radio Bluetooth: SMP: fix crash in unpairing pxa168fb: prepare the clock qed: Avoid implicit enum conversion in qed_set_tunn_cls_info qed: Fix mask parameter in qed_vf_prep_tunn_req_tlv qed: Avoid implicit enum conversion in qed_roce_mode_to_flavor qed: Avoid constant logical operation warning in qed_vf_pf_acquire qed: Avoid implicit enum conversion in qed_iwarp_parse_rx_pkt nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds asix: Check for supported Wake-on-LAN modes ax88179_178a: Check for supported Wake-on-LAN modes lan78xx: Check for supported Wake-on-LAN modes sr9800: Check for supported Wake-on-LAN modes r8152: Check for supported Wake-on-LAN Modes smsc75xx: Check for Wake-on-LAN modes smsc95xx: Check for Wake-on-LAN modes cfg80211: fix use-after-free in reg_process_hint() perf/core: Fix perf_pmu_unregister() locking perf/ring_buffer: Prevent concurent ring buffer access perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events net: fec: fix rare tx timeout declance: Fix continuation with the adapter identification message net: qualcomm: rmnet: Skip processing loopback packets locking/ww_mutex: Fix runtime warning in the WW mutex selftest be2net: don't flip hw_features when VXLANs are added/deleted net: cxgb3_main: fix a missing-check bug yam: fix a missing-check bug ocfs2: fix crash in ocfs2_duplicate_clusters_by_page() iwlwifi: mvm: check for short GI only for OFDM iwlwifi: dbg: allow wrt collection before ALIVE iwlwifi: fix the ALIVE notification layout tools/testing/nvdimm: unit test clear-error commands usbip: vhci_hcd: update 'status' file header and format scsi: aacraid: address UBSAN warning regression IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush IB/rxe: put the pool on allocation failure s390/qeth: fix error handling in adapter command callbacks net/mlx5: Fix mlx5_get_vector_affinity function powerpc/pseries: Add empty update_numa_cpu_lookup_table() for NUMA=n dm integrity: fail early if required HMAC key is not available net: phy: realtek: Use the dummy stubs for MMD register access for rtl8211b net: phy: Add general dummy stubs for MMD register access net/mlx5e: Refine ets validation function scsi: qla2xxx: Avoid double completion of abort command kbuild: set no-integrated-as before incl. arch Makefile IB/mlx5: Avoid passing an invalid QP type to firmware ARM: tegra: Fix ULPI regression on Tegra20 l2tp: remove configurable payload offset cifs: Use ULL suffix for 64-bit constant test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches KVM: x86: Update the exit_qualification access bits while walking an address sparc64: Fix regression in pmdp_invalidate(). tpm: move the delay_msec increment after sleep in tpm_transmit() bpf: sockmap, map_release does not hold refcnt for pinned maps tpm: tpm_crb: relinquish locality on error path. xen-netfront: Update features after registering netdev xen-netfront: Fix mismatched rtnl_unlock IB/usnic: Update with bug fixes from core code mmc: dw_mmc-rockchip: correct property names in debug MIPS: Workaround GCC __builtin_unreachable reordering bug lan78xx: Don't reset the interface on open enic: do not overwrite error code iio: buffer: fix the function signature to match implementation selftests/powerpc: Add ptrace hw breakpoint test scsi: ibmvfc: Avoid unnecessary port relogin scsi: sd: Remember that READ CAPACITY(16) succeeded btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf net: phy: phylink: Don't release NULL GPIO x86/paravirt: Fix some warning messages net: stmmac: mark PM functions as __maybe_unused kconfig: fix the rule of mainmenu_stmt symbol libertas: call into generic suspend code before turning off power perf tests: Fix indexing when invoking subtests compiler.h: Allow arch-specific asm/compiler.h ARM: dts: imx53-qsb: disable 1.2GHz OPP perf python: Use -Wno-redundant-decls to build with PYTHON=python3 rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window() rxrpc: Only take the rwind and mtu values from latest ACK rxrpc: Fix connection-level abort handling net: ena: fix warning in rmmod caused by double iounmap net: ena: fix NULL dereference due to untimely napi initialization selftests: rtnetlink.sh explicitly requires bash. fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters() sch_netem: restore skb->dev after dequeuing from the rbtree mtd: spi-nor: Add support for is25wp series chips kvm: x86: fix WARN due to uninitialized guest FPU state ARM: dts: r8a7790: Correct critical CPU temperature media: uvcvideo: Fix driver reference counting ALSA: usx2y: Fix invalid stream URBs Revert "netfilter: ipv6: nf_defrag: drop skb dst before queueing" perf tools: Disable parallelism for 'make clean' drm/i915/gvt: fix memory leak of a cmd_entry struct on error exit path bridge: do not add port to router list when receives query with source 0.0.0.0 net: bridge: remove ipv6 zero address check in mcast queries ipv6: mcast: fix a use-after-free in inet6_mc_check ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called llc: set SOCK_RCU_FREE in llc_sap_add_socket() net: fec: don't dump RX FIFO register when not available net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs net: sched: gred: pass the right attribute to gred_change_table_def() net: socket: fix a missing-check bug net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules net: udp: fix handling of CHECKSUM_COMPLETE packets r8169: fix NAPI handling under high load sctp: fix race on sctp_id2asoc udp6: fix encap return code for resubmitting vhost: Fix Spectre V1 vulnerability virtio_net: avoid using netif_tx_disable() for serializing tx routine ethtool: fix a privilege escalation bug bonding: fix length of actor system ip6_tunnel: Fix encapsulation layout openvswitch: Fix push/pop ethernet validation net/mlx5: Take only bit 24-26 of wqe.pftype_wq for page fault type net: sched: Fix for duplicate class dump net: drop skb on failure in ip_check_defrag() net: fix pskb_trim_rcsum_slow() with odd trim offset net/mlx5e: fix csum adjustments caused by RXFCS rtnetlink: Disallow FDB configuration for non-Ethernet device net: ipmr: fix unresolved entry dumps net: bcmgenet: Poll internal PHY for GENETv5 net/sched: cls_api: add missing validation of netlink attributes net/mlx5: Fix build break when CONFIG_SMP=n Linux 4.14.79 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-11-08 07:43:01 -08:00
John Fastabend	3c0cff34e9	bpf: sockmap, map_release does not hold refcnt for pinned maps [ Upstream commit ba6b8de423f8d0dee48d6030288ed81c03ddf9f0 ] Relying on map_release hook to decrement the reference counts when a map is removed only works if the map is not being pinned. In the pinned case the ref is decremented immediately and the BPF programs released. After this BPF programs may not be in-use which is not what the user would expect. This patch moves the release logic into bpf_map_put_uref() and brings sockmap in-line with how a similar case is handled in prog array maps. Fixes: 3d9e952697de ("bpf: sockmap, fix leaking maps with attached but not detached progs") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2018-11-04 14:52:44 +01:00
Dmitry Shmidt	6bbd2da69c	ANDROID: Remove duplicate security fix Merge resolution problem Change-Id: I763be7d3737179171cab26414dfea8315e5bc655 Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>	2018-01-31 10:11:29 -08:00
Greg Kroah-Hartman	c3a2eda1e3	Merge 4.14.16 into android-4.14 Changes in 4.14.16 orangefs: use list_for_each_entry_safe in purge_waiting_ops orangefs: initialize op on loop restart in orangefs_devreq_read mm, page_alloc: fix potential false positive in __zone_watermark_ok netfilter: nfnetlink_cthelper: Add missing permission checks netfilter: xt_osf: Add missing permission checks xfrm: Fix a race in the xdst pcpu cache. Revert "module: Add retpoline tag to VERMAGIC" Input: xpad - add support for PDP Xbox One controllers Input: trackpoint - force 3 buttons if 0 button is reported Input: trackpoint - only expose supported controls for Elan, ALPS and NXP Btrfs: fix stale entries in readdir KVM: s390: add proper locking for CMMA migration bitmap orangefs: fix deadlock; do not write i_size in read_iter ARM: net: bpf: avoid 'bx' instruction on non-Thumb capable CPUs ARM: net: bpf: fix tail call jumps ARM: net: bpf: fix stack alignment ARM: net: bpf: move stack documentation ARM: net: bpf: correct stack layout documentation ARM: net: bpf: fix register saving ARM: net: bpf: fix LDX instructions ARM: net: bpf: clarify tail_call index drm/vc4: Fix NULL pointer dereference in vc4_save_hang_state() net: Allow neigh contructor functions ability to modify the primary_key ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL ipv6: fix udpv6 sendmsg crash caused by too small MTU ipv6: ip6_make_skb() needs to clear cork.base.dst lan78xx: Fix failure in USB Full Speed net: igmp: fix source address check for IGMPv3 reports net: qdisc_pkt_len_init() should be more robust net: tcp: close sock if net namespace is exiting net/tls: Fix inverted error codes to avoid endless loop net: vrf: Add support for sends to local broadcast address pppoe: take ->needed_headroom of lower device into account on xmit r8169: fix memory corruption on retrieval of hardware statistics. sctp: do not allow the v4 socket to bind a v4mapped v6 address sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf tipc: fix a memory leak in tipc_nl_node_get_link() {net,ib}/mlx5: Don't disable local loopback multicast traffic when needed net/mlx5: Fix get vector affinity helper function ppp: unlock all_ppp_mutex before registering device be2net: restore properly promisc mode after queues reconfiguration ip6_gre: init dev->mtu and dev->hard_header_len correctly gso: validate gso_type in GSO handlers mlxsw: spectrum_router: Don't log an error on missing neighbor tun: fix a memory leak for tfile->tx_array flow_dissector: properly cap thoff field sctp: reinit stream if stream outcnt has been change by sinit in sendmsg netlink: extack needs to be reset each time through loop net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare nfp: use the correct index for link speed table netlink: reset extack earlier in netlink_rcv_skb net/tls: Only attach to sockets in ESTABLISHED state tls: fix sw_ctx leak tls: return -EBUSY if crypto_info is already set tls: reset crypto_info when do_tls_setsockopt_tx fails net: ipv4: Make "ip route get" match iif lo rules again. vmxnet3: repair memory leak perf/x86/amd/power: Do not load AMD power module on !AMD platforms x86/microcode/intel: Extend BDW late-loading further with LLC size check x86/microcode: Fix again accessing initrd after having been freed x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems hrtimer: Reset hrtimer cpu base proper on CPU hotplug bpf: introduce BPF_JIT_ALWAYS_ON config bpf: avoid false sharing of map refcount with max_entries bpf: fix divides by zero bpf: fix 32-bit divide by zero bpf: reject stores into ctx via st and xadd bpf, arm64: fix stack_depth tracking in combination with tail calls cpufreq: governor: Ensure sufficiently large sampling intervals nfsd: auth: Fix gid sorting when rootsquash enabled Linux 4.14.16 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-01-31 14:23:18 +01:00
Daniel Borkmann	3ea4247ec1	bpf: avoid false sharing of map refcount with max_entries [ upstream commit be95a845cc4402272994ce290e3ad928aff06cb9 ] In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds speculation") also change the layout of struct bpf_map such that false sharing of fast-path members like max_entries is avoided when the maps reference counter is altered. Therefore enforce them to be placed into separate cachelines. pahole dump after change: struct bpf_map { const struct bpf_map_ops * ops; /* 0 8 / struct bpf_map inner_map_meta; /* 8 8 / void security; /* 16 8 / enum bpf_map_type map_type; / 24 4 / u32 key_size; / 28 4 / u32 value_size; / 32 4 / u32 max_entries; / 36 4 / u32 map_flags; / 40 4 / u32 pages; / 44 4 / u32 id; / 48 4 / int numa_node; / 52 4 / bool unpriv_array; / 56 1 / / XXX 7 bytes hole, try to pack / / --- cacheline 1 boundary (64 bytes) --- / struct user_struct user; /* 64 8 / atomic_t refcnt; / 72 4 / atomic_t usercnt; / 76 4 / struct work_struct work; / 80 32 / char name[16]; / 112 16 / / --- cacheline 2 boundary (128 bytes) --- / / size: 128, cachelines: 2, members: 17 / / sum members: 121, holes: 1, sum holes: 7 */ }; Now all entries in the first cacheline are read only throughout the life time of the map, set up once during map creation. Overall struct size and number of cachelines doesn't change from the reordering. struct bpf_map is usually first member and embedded in map structs in specific map implementations, so also avoid those members to sit at the end where it could potentially share the cacheline with first map values e.g. in the array since remote CPUs could trigger map updates just as well for those (easily dirtying members like max_entries intentionally as well) while having subsequent values in cache. Quoting from Google's Project Zero blog [1]: Additionally, at least on the Intel machine on which this was tested, bouncing modified cache lines between cores is slow, apparently because the MESI protocol is used for cache coherence [8]. Changing the reference counter of an eBPF array on one physical CPU core causes the cache line containing the reference counter to be bounced over to that CPU core, making reads of the reference counter on all other CPU cores slow until the changed reference counter has been written back to memory. Because the length and the reference counter of an eBPF array are stored in the same cache line, this also means that changing the reference counter on one physical CPU core causes reads of the eBPF array's length to be slow on other physical CPU cores (intentional false sharing). While this doesn't 'control' the out-of-bounds speculation through masking the index as in commit b2157399cc98, triggering a manipulation of the map's reference counter is really trivial, so lets not allow to easily affect max_entries from it. Splitting to separate cachelines also generally makes sense from a performance perspective anyway in that fast-path won't have a cache miss if the map gets pinned, reused in other progs, etc out of control path, thus also avoids unintentional false sharing. [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-31 14:03:50 +01:00
Greg Kroah-Hartman	9b68347c35	Merge 4.14.14 into android-4.14 Changes in 4.14.14 dm bufio: fix shrinker scans when (nr_to_scan < retain_target) KVM: Fix stack-out-of-bounds read in write_mmio can: vxcan: improve handling of missing peer name attribute can: gs_usb: fix return value of the "set_bittiming" callback IB/srpt: Disable RDMA access by the initiator IB/srpt: Fix ACL lookup during login MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task MIPS: Factor out NT_PRFPREG regset access helpers MIPS: Guard against any partial write attempt with PTRACE_SETREGSET MIPS: Consistently handle buffer counter with PTRACE_SETREGSET MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses cgroup: fix css_task_iter crash on CSS_TASK_ITER_PROC kvm: vmx: Scrub hardware GPRs at VM-exit platform/x86: wmi: Call acpi_wmi_init() later iw_cxgb4: only call the cq comp_handler when the cq is armed iw_cxgb4: atomically flush the qp iw_cxgb4: only clear the ARMED bit if a notification is needed iw_cxgb4: reflect the original WR opcode in drain cqes iw_cxgb4: when flushing, complete all wrs in a chain x86/acpi: Handle SCI interrupts above legacy space gracefully ALSA: pcm: Remove incorrect snd_BUG_ON() usages ALSA: pcm: Workaround for weird PulseAudio behavior on rewind error ALSA: pcm: Add missing error checks in OSS emulation plugin builder ALSA: pcm: Abort properly at pending signal in OSS read/write loops ALSA: pcm: Allow aborting mutex lock at OSS read/write loops ALSA: aloop: Release cable upon open error path ALSA: aloop: Fix inconsistent format due to incomplete rule ALSA: aloop: Fix racy hw constraints adjustment x86/acpi: Reduce code duplication in mp_override_legacy_irq() 8021q: fix a memory leak for VLAN 0 device ip6_tunnel: disable dst caching if tunnel is dual-stack net: core: fix module type in sock_diag_bind phylink: ensure we report link down when LOS asserted RDS: Heap OOB write in rds_message_alloc_sgs() RDS: null pointer dereference in rds_atomic_free_op net: fec: restore dev_id in the cases of probe error net: fec: defer probe if regulator is not ready net: fec: free/restore resource in related probe error pathes sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled sctp: fix the handling of ICMP Frag Needed for too small MTUs sh_eth: fix TSU resource handling net: stmmac: enable EEE in MII, GMII or RGMII only sh_eth: fix SH7757 GEther initialization ipv6: fix possible mem leaks in ipv6_make_skb() ethtool: do not print warning for applications using legacy API mlxsw: spectrum_router: Fix NULL pointer deref net/sched: Fix update of lastuse in act modules implementing stats_update ipv6: sr: fix TLVs not being copied using setsockopt mlxsw: spectrum: Relax sanity checks during enslavement sfp: fix sfp-bus oops when removing socket/upstream membarrier: Disable preemption when calling smp_call_function_many() crypto: algapi - fix NULL dereference in crypto_remove_spawns() mmc: renesas_sdhi: Add MODULE_LICENSE rbd: reacquire lock should update lock owner client id rbd: set max_segments to USHRT_MAX iwlwifi: pcie: fix DMA memory mapping / unmapping x86/microcode/intel: Extend BDW late-loading with a revision check KVM: x86: Add memory barrier on vmcs field lookup KVM: PPC: Book3S PR: Fix WIMG handling under pHyp KVM: PPC: Book3S HV: Drop prepare_done from struct kvm_resize_hpt KVM: PPC: Book3S HV: Fix use after free in case of multiple resize requests KVM: PPC: Book3S HV: Always flush TLB in kvmppc_alloc_reset_hpt() drm/vmwgfx: Don't cache framebuffer maps drm/vmwgfx: Potential off by one in vmw_view_add() drm/i915/gvt: Clear the shadow page table entry after post-sync drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake. drm/i915: Move init_clock_gating() back to where it was drm/i915: Fix init_clock_gating for resume bpf: prevent out-of-bounds speculation bpf, array: fix overflow in max_entries and undefined behavior in index_mask bpf: arsh is not supported in 32 bit alu thus reject it USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ USB: serial: cp210x: add new device ID ELV ALC 8xxx usb: misc: usb3503: make sure reset is low for at least 100us USB: fix usbmon BUG trigger USB: UDC core: fix double-free in usb_add_gadget_udc_release usbip: remove kernel addresses from usb device and urb debug msgs usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl Bluetooth: Prevent stack info leak from the EFS element. uas: ignore UAS for Norelsys NS1068(X) chips mux: core: fix double get_device() kdump: write correct address of mem_section into vmcoreinfo apparmor: fix ptrace label match when matching stacked labels e1000e: Fix e1000_check_for_copper_link_ich8lan return value. x86/pti: Unbreak EFI old_memmap x86/Documentation: Add PTI description x86/cpufeatures: Add X86_BUG_SPECTRE_V[12] sysfs/cpu: Add vulnerability folder x86/cpu: Implement CPU vulnerabilites sysfs functions x86/tboot: Unbreak tboot with PTI enabled x86/mm/pti: Remove dead logic in pti_user_pagetable_walk*() x86/cpu/AMD: Make LFENCE a serializing instruction x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC sysfs/cpu: Fix typos in vulnerability documentation x86/alternatives: Fix optimize_nops() checking x86/pti: Make unpoison of pgd for trusted boot work for real objtool: Detect jumps to retpoline thunks objtool: Allow alternatives to be ignored x86/retpoline: Add initial retpoline support x86/spectre: Add boot time option to select Spectre v2 mitigation x86/retpoline/crypto: Convert crypto assembler indirect jumps x86/retpoline/entry: Convert entry assembler indirect jumps x86/retpoline/ftrace: Convert ftrace assembler indirect jumps x86/retpoline/hyperv: Convert assembler indirect jumps x86/retpoline/xen: Convert Xen hypercall indirect jumps x86/retpoline/checksum32: Convert assembler indirect jumps x86/retpoline/irq32: Convert assembler indirect jumps x86/retpoline: Fill return stack buffer on vmexit selftests/x86: Add test_vsyscall x86/pti: Fix !PCID and sanitize defines security/Kconfig: Correct the Documentation reference for PTI x86,perf: Disable intel_bts when PTI x86/retpoline: Remove compile time warning Linux 4.14.14 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-01-17 10:33:24 +01:00
Alexei Starovoitov	a5dbaf8768	bpf: prevent out-of-bounds speculation commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream. Under speculation, CPUs may mis-predict branches in bounds checks. Thus, memory accesses under a bounds check may be speculated even if the bounds check fails, providing a primitive for building a side channel. To avoid leaking kernel data round up array-based maps and mask the index after bounds check, so speculated load with out of bounds index will load either valid value from the array or zero from the padded area. Unconditionally mask index for all array types even when max_entries are not rounded to power of 2 for root user. When map is created by unpriv user generate a sequence of bpf insns that includes AND operation to make sure that JITed code includes the same 'index & index_mask' operation. If prog_array map is created by unpriv user replace bpf_tail_call(ctx, map, index); with if (index >= max_entries) { index &= map->index_mask; bpf_tail_call(ctx, map, index); } (along with roundup to power 2) to prevent out-of-bounds speculation. There is secondary redundant 'if (index >= max_entries)' in the interpreter and in all JITs, but they can be optimized later if necessary. Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array) cannot be used by unpriv, so no changes there. That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on all architectures with and without JIT. v2->v3: Daniel noticed that attack potentially can be crafted via syscall commands without loading the program, so add masking to those paths as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-17 09:45:25 +01:00
Chenbo Feng	02afae7216	BACKPORT: selinux: bpf: Add addtional check for bpf object file receive Introduce a bpf object related check when sending and receiving files through unix domain socket as well as binder. It checks if the receiving process have privilege to read/write the bpf map or use the bpf program. This check is necessary because the bpf maps and programs are using a anonymous inode as their shared inode so the normal way of checking the files and sockets when passing between processes cannot work properly on eBPF object. This check only works when the BPF_SYSCALL is configured. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Bug: 30950746 Change-Id: I5b2cf4ccb4eab7eda91ddd7091d6aa3e7ed9f2cd (cherry picked from commit f66e448cfda021b0bcd884f26709796fe19c7cc1) Signed-off-by: Amit Pundir <amit.pundir@linaro.org>	2017-12-18 21:11:22 +05:30
Chenbo Feng	3479b1366c	BACKPORT: security: bpf: Add LSM hooks for bpf object related syscall Introduce several LSM hooks for the syscalls that will allow the userspace to access to eBPF object such as eBPF programs and eBPF maps. The security check is aimed to enforce a per object security protection for eBPF object so only processes with the right priviliges can read/write to a specific map or use a specific eBPF program. Besides that, a general security hook is added before the multiplexer of bpf syscall to check the cmd and the attribute used for the command. The actual security module can decide which command need to be checked and how the cmd should be checked. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: James Morris <james.l.morris@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Bug: 30950746 Change-Id: Ieb3ac74392f531735fc7c949b83346a5f587a77b (cherry picked from commit afdb09c720b62b8090584c11151d856df330e57d) Signed-off-by: Amit Pundir <amit.pundir@linaro.org>	2017-12-18 21:11:22 +05:30
Chenbo Feng	cace572e16	BACKPORT: bpf: Add file mode configuration into bpf maps Introduce the map read/write flags to the eBPF syscalls that returns the map fd. The flags is used to set up the file mode when construct a new file descriptor for bpf maps. To not break the backward capability, the f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise it should be O_RDONLY or O_WRONLY. When the userspace want to modify or read the map content, it will check the file mode to see if it is allowed to make the change. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Bug: 30950746 Change-Id: Icfad20f1abb77f91068d244fb0d87fa40824dd1b (cherry picked from commit 6e71b04a82248ccf13a94b85cbc674a9fefe53f5) Signed-off-by: Amit Pundir <amit.pundir@linaro.org>	2017-12-18 21:11:22 +05:30
Shmulik Ladkani	98589a0998	netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' Commit `2c16d60332` ("netfilter: xt_bpf: support ebpf") introduced support for attaching an eBPF object by an fd, with the 'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each IPT_SO_SET_REPLACE call. However this breaks subsequent iptables calls: # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT # iptables -A INPUT -s 5.6.7.8 -j ACCEPT iptables: Invalid argument. Run `dmesg' for more information. That's because iptables works by loading existing rules using IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with the replacement set. However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number (from the initial "iptables -m bpf" invocation) - so when 2nd invocation occurs, userspace passes a bogus fd number, which leads to 'bpf_mt_check_v1' to fail. One suggested solution [1] was to hack iptables userspace, to perform a "entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new, process-local fd per every 'xt_bpf_info_v1' entry seen. However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects. This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given '.fd' and instead perform an in-kernel lookup for the bpf object given the provided '.path'. It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is expected to provide the path of the pinned object. Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved. References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2 [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2 Reported-by: Rafael Buchbinder <rafi@rbk.ms> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-10-09 15:18:04 +02:00
John Fastabend	5a67da2a71	bpf: add support for sockmap detach programs The bpf map sockmap supports adding programs via attach commands. This patch adds the detach command to keep the API symmetric and allow users to remove previously added programs. Otherwise the user would have to delete the map and re-add it to get in this state. This also adds a series of additional tests to capture detach operation and also attaching/detaching invalid prog types. API note: socks will run (or not run) programs depending on the state of the map at the time the sock is added. We do not for example walk the map and remove programs from previously attached socks. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-09-08 21:11:00 -07:00
John Fastabend	464bc0fd62	bpf: convert sockmap field attach_bpf_fd2 to type In the initial sockmap API we provided strparser and verdict programs using a single attach command by extending the attach API with a the attach_bpf_fd2 field. However, if we add other programs in the future we will be adding a field for every new possible type, attach_bpf_fd(3,4,..). This seems a bit clumsy for an API. So lets push the programs using two new type fields. BPF_SK_SKB_STREAM_PARSER BPF_SK_SKB_STREAM_VERDICT This has the advantage of having a readable name and can easily be extended in the future. Updates to samples and sockmap included here also generalize tests slightly to support upcoming patch for multiple map support. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Fixes: `174a79ff95` ("bpf: sockmap with sk redirect support") Suggested-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-28 11:13:21 -07:00
David S. Miller	d6e1e46f69	bpf: linux/bpf.h needs linux/numa.h Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-19 23:34:03 -07:00
Martin KaFai Lau	96eabe7a40	bpf: Allow selecting numa node during map creation The current map creation API does not allow to provide the numa-node preference. The memory usually comes from where the map-creation-process is running. The performance is not ideal if the bpf_prog is known to always run in a numa node different from the map-creation-process. One of the use case is sharding on CPU to different LRU maps (i.e. an array of LRU maps). Here is the test result of map_perf_test on the INNER_LRU_HASH_PREALLOC test if we force the lru map used by CPU0 to be allocated from a remote numa node: [ The machine has 20 cores. CPU0-9 at node 0. CPU10-19 at node 1 ] ># taskset -c 10 ./map_perf_test 512 8 1260000 8000000 5:inner_lru_hash_map_perf pre-alloc 1628380 events per sec 4:inner_lru_hash_map_perf pre-alloc 1626396 events per sec 3:inner_lru_hash_map_perf pre-alloc 1626144 events per sec 6:inner_lru_hash_map_perf pre-alloc 1621657 events per sec 2:inner_lru_hash_map_perf pre-alloc 1621534 events per sec 1:inner_lru_hash_map_perf pre-alloc 1620292 events per sec 7:inner_lru_hash_map_perf pre-alloc 1613305 events per sec 0:inner_lru_hash_map_perf pre-alloc 1239150 events per sec #<<< After specifying numa node: ># taskset -c 10 ./map_perf_test 512 8 1260000 8000000 5:inner_lru_hash_map_perf pre-alloc 1629627 events per sec 3:inner_lru_hash_map_perf pre-alloc 1628057 events per sec 1:inner_lru_hash_map_perf pre-alloc 1623054 events per sec 6:inner_lru_hash_map_perf pre-alloc 1616033 events per sec 2:inner_lru_hash_map_perf pre-alloc `1614630` events per sec 4:inner_lru_hash_map_perf pre-alloc 1612651 events per sec 7:inner_lru_hash_map_perf pre-alloc 1609337 events per sec 0:inner_lru_hash_map_perf pre-alloc 1619340 events per sec #<<< This patch adds one field, numa_node, to the bpf_attr. Since numa node 0 is a valid node, a new flag BPF_F_NUMA_NODE is also added. The numa_node field is honored if and only if the BPF_F_NUMA_NODE flag is set. Numa node selection is not supported for percpu map. This patch does not change all the kmalloc. F.e. 'htab = kzalloc()' is not changed since the object is small enough to stay in the cache. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-19 21:35:43 -07:00
John Fastabend	6bdc9c4c31	bpf: sock_map fixes for !CONFIG_BPF_SYSCALL and !STREAM_PARSER Resolve issues with !CONFIG_BPF_SYSCALL and !STREAM_PARSER net/core/filter.c: In function ‘do_sk_redirect_map’: net/core/filter.c:1881:3: error: implicit declaration of function ‘__sock_map_lookup_elem’ [-Werror=implicit-function-declaration] sk = __sock_map_lookup_elem(ri->map, ri->ifindex); ^ net/core/filter.c:1881:6: warning: assignment makes pointer from integer without a cast [enabled by default] sk = __sock_map_lookup_elem(ri->map, ri->ifindex); Fixes: `174a79ff95` ("bpf: sockmap with sk redirect support") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-16 15:34:13 -07:00
John Fastabend	174a79ff95	bpf: sockmap with sk redirect support Recently we added a new map type called dev map used to forward XDP packets between ports (`6093ec2dc3`). This patches introduces a similar notion for sockets. A sockmap allows users to add participating sockets to a map. When sockets are added to the map enough context is stored with the map entry to use the entry with a new helper bpf_sk_redirect_map(map, key, flags) This helper (analogous to bpf_redirect_map in XDP) is given the map and an entry in the map. When called from a sockmap program, discussed below, the skb will be sent on the socket using skb_send_sock(). With the above we need a bpf program to call the helper from that will then implement the send logic. The initial site implemented in this series is the recv_sock hook. For this to work we implemented a map attach command to add attributes to a map. In sockmap we add two programs a parse program and a verdict program. The parse program uses strparser to build messages and pass them to the verdict program. The parse programs use the normal strparser semantics. The verdict program is of type SK_SKB. The verdict program returns a verdict SK_DROP, or SK_REDIRECT for now. Additional actions may be added later. When SK_REDIRECT is returned, expected when bpf program uses bpf_sk_redirect_map(), the sockmap logic will consult per cpu variables set by the helper routine and pull the sock entry out of the sock map. This pattern follows the existing redirect logic in cls and xdp programs. This gives the flow, recv_sock -> str_parser (parse_prog) -> verdict_prog -> skb_send_sock \ -> kfree_skb As an example use case a message based load balancer may use specific logic in the verdict program to select the sock to send on. Sample programs are provided in future patches that hopefully illustrate the user interfaces. Also selftests are in follow-on patches. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-16 11:27:53 -07:00
John Fastabend	a6f6df69c4	bpf: export bpf_prog_inc_not_zero bpf_prog_inc_not_zero will be used by upcoming sockmap patches this patch simply exports it so we can pull it in. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-16 11:27:53 -07:00
Edward Cree	f1174f77b5	bpf/verifier: rework value tracking Unifies adjusted and unadjusted register value types (e.g. FRAME_POINTER is now just a PTR_TO_STACK with zero offset). Tracks value alignment by means of tracking known & unknown bits. This also replaces the 'reg->imm' (leading zero bits) calculations for (what were) UNKNOWN_VALUEs. If pointer leaks are allowed, and adjust_ptr_min_max_vals returns -EACCES, treat the pointer as an unknown scalar and try again, because we might be able to conclude something about the result (e.g. pointer & 0x40 is either 0 or 0x40). Verifier hooks in the netronome/nfp driver were changed to match the new data structures. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-08 17:51:34 -07:00
John Fastabend	46f55cffa4	net: fix build error in devmap helper calls Initial patches missed case with CONFIG_BPF_SYSCALL not set. Fixes: `11393cc9b9` ("xdp: Add batching support to redirect map") Fixes: `97f91a7cf0` ("bpf: add bpf_redirect_map helper routine") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-07-17 21:58:32 -07:00
John Fastabend	11393cc9b9	xdp: Add batching support to redirect map For performance reasons we want to avoid updating the tail pointer in the driver tx ring as much as possible. To accomplish this we add batching support to the redirect path in XDP. This adds another ndo op "xdp_flush" that is used to inform the driver that it should bump the tail pointer on the TX ring. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-07-17 09:48:06 -07:00
John Fastabend	97f91a7cf0	bpf: add bpf_redirect_map helper routine BPF programs can use the devmap with a bpf_redirect_map() helper routine to forward packets to netdevice in map. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-07-17 09:48:06 -07:00
Daniel Borkmann	f96da09473	bpf: simplify narrower ctx access This work tries to make the semantics and code around the narrower ctx access a bit easier to follow. Right now everything is done inside the .is_valid_access(). Offset matching is done differently for read/write types, meaning writes don't support narrower access and thus matching only on offsetof(struct foo, bar) is enough whereas for read case that supports narrower access we must check for offsetof(struct foo, bar) + offsetof(struct foo, bar) + sizeof(<bar>) - 1 for each of the cases. For read cases of individual members that don't support narrower access (like packet pointers or skb->cb[] case which has its own narrow access logic), we check as usual only offsetof(struct foo, bar) like in write case. Then, for the case where narrower access is allowed, we also need to set the aux info for the access. Meaning, ctx_field_size and converted_op_size have to be set. First is the original field size e.g. sizeof(<bar>) as in above example from the user facing ctx, and latter one is the target size after actual rewrite happened, thus for the kernel facing ctx. Also here we need the range match and we need to keep track changing convert_ctx_access() and converted_op_size from is_valid_access() as both are not at the same location. We can simplify the code a bit: check_ctx_access() becomes simpler in that we only store ctx_field_size as a meta data and later in convert_ctx_accesses() we fetch the target_size right from the location where we do convert. Should the verifier be misconfigured we do reject for BPF_WRITE cases or target_size that are not provided. For the subsystems, we always work on ranges in is_valid_access() and add small helpers for ranges and narrow access, convert_ctx_accesses() sets target_size for the relevant instruction. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Cc: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-07-03 02:22:52 -07:00
Martin KaFai Lau	14dc6f04f4	bpf: Add syscall lookup support for fd array and htab This patch allows userspace to do BPF_MAP_LOOKUP_ELEM on BPF_MAP_TYPE_PROG_ARRAY, BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS. The lookup returns a prog-id or map-id to the userspace. The userspace can then use the BPF_PROG_GET_FD_BY_ID or BPF_MAP_GET_FD_BY_ID to get a fd. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-29 13:13:25 -04:00
Yonghong Song	239946314e	bpf: possibly avoid extra masking for narrower load in verifier Commit `31fd85816d` ("bpf: permits narrower load from bpf program context fields") permits narrower load for certain ctx fields. The commit however will already generate a masking even if the prog-specific ctx conversion produces the result with narrower size. For example, for __sk_buff->protocol, the ctx conversion loads the data into register with 2-byte load. A narrower 2-byte load should not generate masking. For __sk_buff->vlan_present, the conversion function set the result as either 0 or 1, essentially a byte. The narrower 2-byte or 1-byte load should not generate masking. To avoid unnecessary masking, prog-specific _is_valid_access now passes converted_op_size back to verifier, which indicates the valid data width after perceived future conversion. Based on this information, verifier is able to avoid unnecessary marking. Since we want more information back from prog-specific _is_valid_access checking, all of them are packed into one data structure for more clarity. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-23 14:04:11 -04:00
Yonghong Song	31fd85816d	bpf: permits narrower load from bpf program context fields Currently, verifier will reject a program if it contains an narrower load from the bpf context structure. For example, __u8 h = __sk_buff->hash, or __u16 p = __sk_buff->protocol __u32 sample_period = bpf_perf_event_data->sample_period which are narrower loads of 4-byte or 8-byte field. This patch solves the issue by: . Introduce a new parameter ctx_field_size to carry the field size of narrower load from prog type specific *__is_valid_access validator back to verifier. . The non-zero ctx_field_size for a memory access indicates (1). underlying prog type specific convert_ctx_accesses supporting non-whole-field access (2). the current insn is a narrower or whole field access. . In verifier, for such loads where load memory size is less than ctx_field_size, verifier transforms it to a full field load followed by proper masking. . Currently, __sk_buff and bpf_perf_event_data->sample_period are supporting narrowing loads. . Narrower stores are still not allowed as typical ctx stores are just normal stores. Because of this change, some tests in verifier will fail and these tests are removed. As a bonus, rename some out of bound __sk_buff->cb access to proper field name and remove two redundant "skb cb oob" tests. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-14 14:56:25 -04:00
Martin KaFai Lau	f3f1c054c2	bpf: Introduce bpf_map ID This patch generates an unique ID for each created bpf_map. The approach is similar to the earlier patch for bpf_prog ID. It is worth to note that the bpf_map's ID and bpf_prog's ID are in two independent ID spaces and both have the same valid range: [1, INT_MAX). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 15:41:22 -04:00
Martin KaFai Lau	dc4bb0e235	bpf: Introduce bpf_prog ID This patch generates an unique ID for each BPF_PROG_LOAD-ed prog. It is worth to note that each BPF_PROG_LOAD-ed prog will have a different ID even they have the same bpf instructions. The ID is generated by the existing idr_alloc_cyclic(). The ID is ranged from [1, INT_MAX). It is allocated in cyclic manner, so an ID will get reused every 2 billion BPF_PROG_LOAD. The bpf_prog_alloc_id() is done after bpf_prog_select_runtime() because the jit process may have allocated a new prog. Hence, we need to ensure the value of pointer 'prog' will not be changed any more before storing the prog to the prog_idr. After bpf_prog_select_runtime(), the prog is read-only. Hence, the id is stored in 'struct bpf_prog_aux'. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-06 15:41:22 -04:00
Alexei Starovoitov	8726679a0f	bpf: teach verifier to track stack depth teach verifier to track bpf program stack depth Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-31 19:29:47 -04:00
Johannes Berg	40077e0cf6	bpf: remove struct bpf_map_type_list There's no need to have struct bpf_map_type_list since it just contains a list_head, the type, and the ops pointer. Since the types are densely packed and not actually dynamically registered, it's much easier and smaller to have an array of type->ops pointer. Also initialize this array statically to remove code needed to initialize it. In order to save duplicating the list, move it to the types header file added by the previous patch and include it in the same fashion. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-11 14:38:43 -04:00
Johannes Berg	be9370a7d8	bpf: remove struct bpf_prog_type_list There's no need to have struct bpf_prog_type_list since it just contains a list_head, the type, and the ops pointer. Since the types are densely packed and not actually dynamically registered, it's much easier and smaller to have an array of type->ops pointer. Also initialize this array statically to remove code needed to initialize it. In order to save duplicating the list, move it to a new header file and include it in the places needing it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-11 14:38:43 -04:00
Alexei Starovoitov	1cf1cae963	bpf: introduce BPF_PROG_TEST_RUN command development and testing of networking bpf programs is quite cumbersome. Despite availability of user space bpf interpreters the kernel is the ultimate authority and execution environment. Current test frameworks for TC include creation of netns, veth, qdiscs and use of various packet generators just to test functionality of a bpf program. XDP testing is even more complicated, since qemu needs to be started with gro/gso disabled and precise queue configuration, transferring of xdp program from host into guest, attaching to virtio/eth0 and generating traffic from the host while capturing the results from the guest. Moreover analyzing performance bottlenecks in XDP program is impossible in virtio environment, since cost of running the program is tiny comparing to the overhead of virtio packet processing, so performance testing can only be done on physical nic with another server generating traffic. Furthermore ongoing changes to user space control plane of production applications cannot be run on the test servers leaving bpf programs stubbed out for testing. Last but not least, the upstream llvm changes are validated by the bpf backend testsuite which has no ability to test the code generated. To improve this situation introduce BPF_PROG_TEST_RUN command to test and performance benchmark bpf programs. Joint work with Daniel Borkmann. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-01 12:45:57 -07:00
Martin KaFai Lau	bcc6b1b7eb	bpf: Add hash of maps support This patch adds hash of maps support (hashmap->bpf_map). BPF_MAP_TYPE_HASH_OF_MAPS is added. A map-in-map contains a pointer to another map and lets call this pointer 'inner_map_ptr'. Notes on deleting inner_map_ptr from a hash map: 1. For BPF_F_NO_PREALLOC map-in-map, when deleting an inner_map_ptr, the htab_elem itself will go through a rcu grace period and the inner_map_ptr resides in the htab_elem. 2. For pre-allocated htab_elem (!BPF_F_NO_PREALLOC), when deleting an inner_map_ptr, the htab_elem may get reused immediately. This situation is similar to the existing prealloc-ated use cases. However, the bpf_map_fd_put_ptr() calls bpf_map_put() which calls inner_map->ops->map_free(inner_map) which will go through a rcu grace period (i.e. all bpf_map's map_free currently goes through a rcu grace period). Hence, the inner_map_ptr is still safe for the rcu reader side. This patch also includes BPF_MAP_TYPE_HASH_OF_MAPS to the check_map_prealloc() in the verifier. preallocation is a must for BPF_PROG_TYPE_PERF_EVENT. Hence, even we don't expect heavy updates to map-in-map, enforcing BPF_F_NO_PREALLOC for map-in-map is impossible without disallowing BPF_PROG_TYPE_PERF_EVENT from using map-in-map first. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 15:45:45 -07:00
Martin KaFai Lau	56f668dfe0	bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 15:45:45 -07:00
Alexei Starovoitov	81ed18ab30	bpf: add helper inlining infra and optimize map_array lookup Optimize bpf_call -> bpf_map_lookup_elem() -> array_map_lookup_elem() into a sequence of bpf instructions. When JIT is on the sequence of bpf instructions is the sequence of native cpu instructions with significantly faster performance than indirect call and two function's prologue/epilogue. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-16 20:44:11 -07:00
Daniel Borkmann	74451e66d5	bpf: make jited programs visible in traces Long standing issue with JITed programs is that stack traces from function tracing check whether a given address is kernel code through {__,}kernel_text_address(), which checks for code in core kernel, modules and dynamically allocated ftrace trampolines. But what is still missing is BPF JITed programs (interpreted programs are not an issue as __bpf_prog_run() will be attributed to them), thus when a stack trace is triggered, the code walking the stack won't see any of the JITed ones. The same for address correlation done from user space via reading /proc/kallsyms. This is read by tools like perf, but the latter is also useful for permanent live tracing with eBPF itself in combination with stack maps when other eBPF types are part of the callchain. See offwaketime example on dumping stack from a map. This work tries to tackle that issue by making the addresses and symbols known to the kernel. The lookup from *kernel_text_address() is implemented through a latched RB tree that can be read under RCU in fast-path that is also shared for symbol/size/offset lookup for a specific given address in kallsyms. The slow-path iteration through all symbols in the seq file done via RCU list, which holds a tiny fraction of all exported ksyms, usually below 0.1 percent. Function symbols are exported as bpf_prog_<tag>, in order to aide debugging and attribution. This facility is currently enabled for root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening is active in any mode. The rationale behind this is that still a lot of systems ship with world read permissions on kallsyms thus addresses should not get suddenly exposed for them. If that situation gets much better in future, we always have the option to change the default on this. Likewise, unprivileged programs are not allowed to add entries there either, but that is less of a concern as most such programs types relevant in this context are for root-only anyway. If enabled, call graphs and stack traces will then show a correct attribution; one example is illustrated below, where the trace is now visible in tooling such as perf script --kallsyms=/proc/kallsyms and friends. Before: 7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux) f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so) After: 7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux) [...] 7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux) 7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux) f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2017-02-17 13:40:05 -05:00
David S. Miller	4e8f2fc1a5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Two trivial overlapping changes conflicts in MPLS and mlx5. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-28 10:33:06 -05:00
Daniel Borkmann	d407bd25a2	bpf: don't trigger OOM killer under pressure with map alloc This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(), that are to be used for map allocations. Using kmalloc() for very large allocations can cause excessive work within the page allocator, so i) fall back earlier to vmalloc() when the attempt is considered costly anyway, and even more importantly ii) don't trigger OOM killer with any of the allocators. Since this is based on a user space request, for example, when creating maps with element pre-allocation, we really want such requests to fail instead of killing other user space processes. Also, don't spam the kernel log with warnings should any of the allocations fail under pressure. Given that, we can make backend selection in bpf_map_area_alloc() generic, and convert all maps over to use this API for spots with potentially large allocation requests. Note, replacing the one kmalloc_array() is fine as overflow checks happen earlier in htab_map_alloc(), since it must also protect the multiplication for vmalloc() should kmalloc_array() fail. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 17:12:26 -05:00
David S. Miller	580bdf5650	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2017-01-17 15:19:37 -05:00
Daniel Borkmann	f1f7714ea5	bpf: rework prog_digest into prog_tag Commit `7bd509e311` ("bpf: add prog_digest and expose it via fdinfo/netlink") was recently discussed, partially due to admittedly suboptimal name of "prog_digest" in combination with sha1 hash usage, thus inevitably and rightfully concerns about its security in terms of collision resistance were raised with regards to use-cases. The intended use cases are for debugging resp. introspection only for providing a stable "tag" over the instruction sequence that both kernel and user space can calculate independently. It's not usable at all for making a security relevant decision. So collisions where two different instruction sequences generate the same tag can happen, but ideally at a rather low rate. The "tag" will be dumped in hex and is short enough to introspect in tracepoints or kallsyms output along with other data such as stack trace, etc. Thus, this patch performs a rename into prog_tag and truncates the tag to a short output (64 bits) to make it obvious it's not collision-free. Should in future a hash or facility be needed with a security relevant focus, then we can think about requirements, constraints, etc that would fit to that situation. For now, rework the exposed parts for the current use cases as long as nothing has been released yet. Tested on x86_64 and s390x. Fixes: `7bd509e311` ("bpf: add prog_digest and expose it via fdinfo/netlink") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-16 14:03:31 -05:00
Daniel Borkmann	6b8cc1d11e	bpf: pass original insn directly to convert_ctx_access Currently, when calling convert_ctx_access() callback for the various program types, we pass in insn->dst_reg, insn->src_reg, insn->off from the original instruction. This information is needed to rewrite the instruction that is based on the user ctx structure into a kernel representation for the ctx. As we'd like to allow access size beyond just BPF_W, we'd need also insn->code for that in order to decode the original access size. Given that, lets just pass insn directly to the convert_ctx_access() callback and work on that to not clutter the callback with even more arguments we need to pass when everything is already contained in insn. So lets go through that once, no functional change. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-12 10:00:31 -05:00
Alexei Starovoitov	39f19ebbf5	bpf: rename ARG_PTR_TO_STACK since ARG_PTR_TO_STACK is no longer just pointer to stack rename it to ARG_PTR_TO_MEM and adjust comment. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-09 16:56:27 -05:00

1 2 3

121 Commits