24 Commits

Author SHA1 Message Date
Toke Høiland-Jørgensen
5775596fff BACKPORT: devmap: Allow map lookups from eBPF
We don't currently allow lookups into a devmap from eBPF, because the map
lookup returns a pointer directly to the dev->ifindex, which shouldn't be
modifiable from eBPF.

However, being able to do lookups in devmaps is useful to know (e.g.)
whether forwarding to a specific interface is enabled. Currently, programs
work around this by keeping a shadow map of another type which indicates
whether a map index is valid.

Since we now have a flag to make maps read-only from the eBPF side, we can
simply lift the lookup restriction if we make sure this flag is always set.

Change-Id: I42b1430605c6837710fd903a0c8abf2c7dc13f16
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2024-03-21 18:31:04 +00:00
Toke Høiland-Jørgensen
d7c3922002 BACKPORT: xdp: Add devmap_hash map type for looking up devices by hashed index
A common pattern when using xdp_redirect_map() is to create a device map
where the lookup key is simply ifindex. Because device maps are arrays,
this leaves holes in the map, and the map has to be sized to fit the
largest ifindex, regardless of how many devices actually are actually
needed in the map.

This patch adds a second type of device map where the key is looked up
using a hashmap, instead of being used as an array index. This allows maps
to be densely packed, so they can be smaller.

Change-Id: I6155de499a47fb45bac1a39319f0ad979032fd6d
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-21 18:31:04 +00:00
Tim Zimmermann
331e04f71e kernel: bpf: devmap: Create __dev_map_alloc_node
* Like in fca16e5107

Change-Id: I95915b56f381c8f5851b6e7827eed6064aefe586
2024-03-21 18:31:04 +00:00
Greg Kroah-Hartman
13855a652b Merge 4.14.157 into android-4.14
Changes in 4.14.157
	net/mlx4_en: fix mlx4 ethtool -N insertion
	net: rtnetlink: prevent underflows in do_setvfinfo()
	sfc: Only cancel the PPS workqueue if it exists
	net/mlx5e: Fix set vf link state error flow
	net/mlxfw: Verify FSM error code translation doesn't exceed array size
	net/sched: act_pedit: fix WARN() in the traffic path
	vhost/vsock: split packets to send using multiple buffers
	gpio: max77620: Fixup debounce delays
	tools: gpio: Correctly add make dependencies for gpio_utils
	nbd:fix memory leak in nbd_get_socket()
	virtio_console: allocate inbufs in add_port() only if it is needed
	Revert "fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry()"
	mm/ksm.c: don't WARN if page is still mapped in remove_stable_node()
	drm/i915/userptr: Try to acquire the page lock around set_page_dirty()
	platform/x86: asus-nb-wmi: Support ALS on the Zenbook UX430UQ
	platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from asus_nb_wmi
	mwifiex: Fix NL80211_TX_POWER_LIMITED
	ALSA: isight: fix leak of reference to firewire unit in error path of .probe callback
	printk: fix integer overflow in setup_log_buf()
	gfs2: Fix marking bitmaps non-full
	pty: fix compat ioctls
	synclink_gt(): fix compat_ioctl()
	powerpc: Fix signedness bug in update_flash_db()
	powerpc/boot: Disable vector instructions
	powerpc/eeh: Fix use of EEH_PE_KEEP on wrong field
	EDAC, thunderx: Fix memory leak in thunderx_l2c_threaded_isr()
	brcmsmac: AP mode: update beacon when TIM changes
	ath10k: allocate small size dma memory in ath10k_pci_diag_write_mem
	skd: fixup usage of legacy IO API
	cdrom: don't attempt to fiddle with cdo->capability
	spi: sh-msiof: fix deferred probing
	mmc: mediatek: fix cannot receive new request when msdc_cmd_is_ready fail
	btrfs: handle error of get_old_root
	gsmi: Fix bug in append_to_eventlog sysfs handler
	misc: mic: fix a DMA pool free failure
	w1: IAD Register is yet readable trough iad sys file. Fix snprintf (%u for unsigned, count for max size).
	m68k: fix command-line parsing when passed from u-boot
	RDMA/bnxt_re: Fix qp async event reporting
	pinctrl: sunxi: Fix a memory leak in 'sunxi_pinctrl_build_state()'
	pwm: lpss: Only set update bit if we are actually changing the settings
	amiflop: clean up on errors during setup
	qed: Align local and global PTT to propagate through the APIs.
	scsi: ips: fix missing break in switch
	KVM: nVMX: reset cache/shadows when switching loaded VMCS
	KVM/x86: Fix invvpid and invept register operand size in 64-bit mode
	scsi: isci: Use proper enumerated type in atapi_d2h_reg_frame_handler
	scsi: isci: Change sci_controller_start_task's return type to sci_status
	scsi: iscsi_tcp: Explicitly cast param in iscsi_sw_tcp_host_get_param
	crypto: ccree - avoid implicit enum conversion
	nvmet-fcloop: suppress a compiler warning
	clk: mmp2: fix the clock id for sdh2_clk and sdh3_clk
	clk: at91: audio-pll: fix audio pmc type
	ASoC: tegra_sgtl5000: fix device_node refcounting
	scsi: dc395x: fix dma API usage in srb_done
	scsi: dc395x: fix DMA API usage in sg_update_list
	net: dsa: mv88e6xxx: Fix 88E6141/6341 2500mbps SERDES speed
	net: fix warning in af_unix
	net: ena: Fix Kconfig dependency on X86
	xfs: fix use-after-free race in xfs_buf_rele
	kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stack
	PM / Domains: Deal with multiple states but no governor in genpd
	ALSA: i2c/cs8427: Fix int to char conversion
	macintosh/windfarm_smu_sat: Fix debug output
	PCI: vmd: Detach resources after stopping root bus
	USB: misc: appledisplay: fix backlight update_status return code
	usbip: tools: fix atoi() on non-null terminated string
	dm raid: avoid bitmap with raid4/5/6 journal device
	SUNRPC: Fix a compile warning for cmpxchg64()
	sunrpc: safely reallow resvport min/max inversion
	atm: zatm: Fix empty body Clang warnings
	s390/perf: Return error when debug_register fails
	spi: omap2-mcspi: Set FIFO DMA trigger level to word length
	sparc: Fix parport build warnings.
	powerpc/pseries: Export raw per-CPU VPA data via debugfs
	ceph: fix dentry leak in ceph_readdir_prepopulate
	rtc: s35390a: Change buf's type to u8 in s35390a_init
	f2fs: fix to spread clear_cold_data()
	mISDN: Fix type of switch control variable in ctrl_teimanager
	qlcnic: fix a return in qlcnic_dcb_get_capability()
	net: ethernet: ti: cpsw: unsync mcast entries while switch promisc mode
	mfd: arizona: Correct calling of runtime_put_sync
	mfd: mc13xxx-core: Fix PMIC shutdown when reading ADC values
	mfd: intel_soc_pmic_bxtwc: Chain power button IRQs as well
	mfd: max8997: Enale irq-wakeup unconditionally
	selftests/ftrace: Fix to test kprobe $comm arg only if available
	selftests: watchdog: fix message when /dev/watchdog open fails
	selftests: watchdog: Fix error message.
	thermal: rcar_thermal: Prevent hardware access during system suspend
	bpf: devmap: fix wrong interface selection in notifier_call
	powerpc/process: Fix flush_all_to_thread for SPE
	sparc64: Rework xchg() definition to avoid warnings.
	arm64: lib: use C string functions with KASAN enabled
	fs/ocfs2/dlm/dlmdebug.c: fix a sleep-in-atomic-context bug in dlm_print_one_mle()
	mm/page-writeback.c: fix range_cyclic writeback vs writepages deadlock
	macsec: update operstate when lower device changes
	macsec: let the administrator set UP state even if lowerdev is down
	block: fix the DISCARD request merge
	i2c: uniphier-f: make driver robust against concurrency
	i2c: uniphier-f: fix occasional timeout error
	i2c: uniphier-f: fix race condition when IRQ is cleared
	um: Make line/tty semantics use true write IRQ
	vfs: avoid problematic remapping requests into partial EOF block
	powerpc/xmon: Relax frame size for clang
	selftests/powerpc/signal: Fix out-of-tree build
	selftests/powerpc/switch_endian: Fix out-of-tree build
	selftests/powerpc/cache_shape: Fix out-of-tree build
	linux/bitmap.h: handle constant zero-size bitmaps correctly
	linux/bitmap.h: fix type of nbits in bitmap_shift_right()
	hfsplus: fix BUG on bnode parent update
	hfs: fix BUG on bnode parent update
	hfsplus: prevent btree data loss on ENOSPC
	hfs: prevent btree data loss on ENOSPC
	hfsplus: fix return value of hfsplus_get_block()
	hfs: fix return value of hfs_get_block()
	hfsplus: update timestamps on truncate()
	hfs: update timestamp on truncate()
	fs/hfs/extent.c: fix array out of bounds read of array extent
	mm/memory_hotplug: make add_memory() take the device_hotplug_lock
	igb: shorten maximum PHC timecounter update interval
	net: hns3: bugfix for buffer not free problem during resetting
	ntb_netdev: fix sleep time mismatch
	ntb: intel: fix return value for ndev_vec_mask()
	arm64: makefile fix build of .i file in external module case
	ocfs2: don't put and assigning null to bh allocated outside
	ocfs2: fix clusters leak in ocfs2_defrag_extent()
	net: do not abort bulk send on BQL status
	sched/topology: Fix off by one bug
	sched/fair: Don't increase sd->balance_interval on newidle balance
	openvswitch: fix linking without CONFIG_NF_CONNTRACK_LABELS
	clk: sunxi-ng: enable so-said LDOs for A64 SoC's pll-mipi clock
	audit: print empty EXECVE args
	btrfs: avoid link error with CONFIG_NO_AUTO_INLINE
	wil6210: fix locking in wmi_call
	wlcore: Fix the return value in case of error in 'wlcore_vendor_cmd_smart_config_start()'
	rtl8xxxu: Fix missing break in switch
	brcmsmac: never log "tid x is not agg'able" by default
	wireless: airo: potential buffer overflow in sprintf()
	rtlwifi: rtl8192de: Fix misleading REG_MCUFWDL information
	net: dsa: bcm_sf2: Turn on PHY to allow successful registration
	scsi: mpt3sas: Fix Sync cache command failure during driver unload
	scsi: mpt3sas: Don't modify EEDPTagMode field setting on SAS3.5 HBA devices
	scsi: mpt3sas: Fix driver modifying persistent data in Manufacturing page11
	scsi: megaraid_sas: Fix msleep granularity
	scsi: megaraid_sas: Fix goto labels in error handling
	scsi: lpfc: fcoe: Fix link down issue after 1000+ link bounces
	scsi: lpfc: Correct loss of fc4 type on remote port address change
	dlm: fix invalid free
	dlm: don't leak kernel pointer to userspace
	vrf: mark skb for multicast or link-local as enslaved to VRF
	ACPICA: Use %d for signed int print formatting instead of %u
	net: bcmgenet: return correct value 'ret' from bcmgenet_power_down
	of: unittest: allow base devicetree to have symbol metadata
	cfg80211: Prevent regulatory restore during STA disconnect in concurrent interfaces
	pinctrl: qcom: spmi-gpio: fix gpio-hog related boot issues
	pinctrl: lpc18xx: Use define directive for PIN_CONFIG_GPIO_PIN_INT
	pinctrl: zynq: Use define directive for PIN_CONFIG_IO_STANDARD
	PCI: keystone: Use quirk to limit MRRS for K2G
	spi: omap2-mcspi: Fix DMA and FIFO event trigger size mismatch
	i2c: uniphier-f: fix timeout error after reading 8 bytes
	mm/memory_hotplug: Do not unlock when fails to take the device_hotplug_lock
	ipv6: Fix handling of LLA with VRF and sockets bound to VRF
	cfg80211: call disconnect_wk when AP stops
	Bluetooth: Fix invalid-free in bcsp_close()
	KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved
	ath10k: Fix a NULL-ptr-deref bug in ath10k_usb_alloc_urb_from_pipe
	ath9k_hw: fix uninitialized variable data
	md/raid10: prevent access of uninitialized resync_pages offset
	mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span()
	net: phy: dp83867: fix speed 10 in sgmii mode
	net: phy: dp83867: increase SGMII autoneg timer duration
	arm64: fix for bad_mode() handler to always result in panic
	cpufreq: Skip cpufreq resume if it's not suspended
	ocfs2: remove ocfs2_is_o2cb_active()
	ARM: 8904/1: skip nomap memblocks while finding the lowmem/highmem boundary
	ARC: perf: Accommodate big-endian CPU
	x86/insn: Fix awk regexp warnings
	x86/speculation: Fix incorrect MDS/TAA mitigation status
	x86/speculation: Fix redundant MDS mitigation message
	nbd: prevent memory leak
	nfc: port100: handle command failure cleanly
	media: vivid: Set vid_cap_streaming and vid_out_streaming to true
	media: vivid: Fix wrong locking that causes race conditions on streaming stop
	media: usbvision: Fix races among open, close, and disconnect
	cpufreq: Add NULL checks to show() and store() methods of cpufreq
	media: uvcvideo: Fix error path in control parsing failure
	media: b2c2-flexcop-usb: add sanity checking
	media: cxusb: detect cxusb_ctrl_msg error in query
	media: imon: invalid dereference in imon_touch_event
	virtio_ring: fix return code on DMA mapping fails
	usbip: tools: fix fd leakage in the function of read_attr_usbip_status
	usbip: Fix uninitialized symbol 'nents' in stub_recv_cmd_submit()
	usb-serial: cp201x: support Mark-10 digital force gauge
	USB: chaoskey: fix error case of a timeout
	appledisplay: fix error handling in the scheduled work
	USB: serial: mos7840: add USB ID to support Moxa UPort 2210
	USB: serial: mos7720: fix remote wakeup
	USB: serial: mos7840: fix remote wakeup
	USB: serial: option: add support for DW5821e with eSIM support
	USB: serial: option: add support for Foxconn T77W968 LTE modules
	staging: comedi: usbduxfast: usbduxfast_ai_cmdtest rounding error
	powerpc/64s: support nospectre_v2 cmdline option
	powerpc/book3s64: Fix link stack flush on context switch
	KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel
	x86/hyperv: mark hyperv_init as __init function
	Linux 4.14.157

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-12-01 09:36:51 +01:00
Taehee Yoo
8ba716e816 bpf: devmap: fix wrong interface selection in notifier_call
[ Upstream commit f592f804831f1cf9d1f9966f58c80f150e6829b5 ]

The dev_map_notification() removes interface in devmap if
unregistering interface's ifindex is same.
But only checking ifindex is not enough because other netns can have
same ifindex. so that wrong interface selection could occurred.
Hence netdev pointer comparison code is added.

v2: compare netdev pointer instead of using net_eq() (Daniel Borkmann)
v1: Initial patch

Fixes: 2ddf71e23c ("net: add notifier hooks for devmap bpf map")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-01 09:13:47 +01:00
Greg Kroah-Hartman
acd501fffb Merge 4.14.123 into android-4.14
Changes in 4.14.123
	x86: Hide the int3_emulate_call/jmp functions from UML
	ext4: do not delete unlinked inode from orphan list on failed truncate
	f2fs: Fix use of number of devices
	KVM: x86: fix return value for reserved EFER
	bio: fix improper use of smp_mb__before_atomic()
	sbitmap: fix improper use of smp_mb__before_atomic()
	Revert "scsi: sd: Keep disk read-only when re-reading partition"
	crypto: vmx - CTR: always increment IV as quadword
	mmc: sdhci-iproc: cygnus: Set NO_HISPD bit to fix HS50 data hold time problem
	mmc: sdhci-iproc: Set NO_HISPD bit to fix HS50 data hold time problem
	kvm: svm/avic: fix off-by-one in checking host APIC ID
	libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead
	libnvdimm/namespace: Fix label tracking error
	arm64/iommu: handle non-remapped addresses in ->mmap and ->get_sgtable
	gfs2: Fix sign extension bug in gfs2_update_stats
	Btrfs: do not abort transaction at btrfs_update_root() after failure to COW path
	Btrfs: avoid fallback to transaction commit during fsync of files with holes
	Btrfs: fix race between ranged fsync and writeback of adjacent ranges
	btrfs: sysfs: Fix error path kobject memory leak
	btrfs: sysfs: don't leak memory when failing add fsid
	fbdev: fix divide error in fb_var_to_videomode
	hugetlb: use same fault hash key for shared and private mappings
	brcmfmac: assure SSID length from firmware is limited
	brcmfmac: add subtype check for event handling in data path
	btrfs: honor path->skip_locking in backref code
	fbdev: fix WARNING in __alloc_pages_nodemask bug
	media: cpia2: Fix use-after-free in cpia2_exit
	media: serial_ir: Fix use-after-free in serial_ir_init_module
	media: vivid: use vfree() instead of kfree() for dev->bitmap_cap
	ssb: Fix possible NULL pointer dereference in ssb_host_pcmcia_exit
	bpf: devmap: fix use-after-free Read in __dev_map_entry_free
	batman-adv: mcast: fix multicast tt/tvlv worker locking
	at76c50x-usb: Don't register led_trigger if usb_register_driver failed
	net: erspan: fix use-after-free
	Revert "btrfs: Honour FITRIM range constraints during free space trim"
	gfs2: Fix lru_count going negative
	cxgb4: Fix error path in cxgb4_init_module
	NFS: make nfs_match_client killable
	IB/hfi1: Fix WQ_MEM_RECLAIM warning
	gfs2: Fix occasional glock use-after-free
	mmc: core: Verify SD bus width
	tools/bpf: fix perf build error with uClibc (seen on ARC)
	dmaengine: tegra210-dma: free dma controller in remove()
	net: ena: gcc 8: fix compilation warning
	pinctrl: zte: fix leaked of_node references
	ASoC: hdmi-codec: unlock the device on startup errors
	powerpc/perf: Return accordingly on invalid chip-id in
	powerpc/boot: Fix missing check of lseek() return value
	ASoC: imx: fix fiq dependencies
	spi: pxa2xx: fix SCR (divisor) calculation
	brcm80211: potential NULL dereference in brcmf_cfg80211_vndr_cmds_dcmd_handler()
	ACPI / property: fix handling of data_nodes in acpi_get_next_subnode()
	ARM: vdso: Remove dependency with the arch_timer driver internals
	arm64: Fix compiler warning from pte_unmap() with -Wunused-but-set-variable
	sched/cpufreq: Fix kobject memleak
	scsi: qla2xxx: Fix a qla24xx_enable_msix() error path
	scsi: qla2xxx: Fix abort handling in tcm_qla2xxx_write_pending()
	scsi: qla2xxx: Avoid that lockdep complains about unsafe locking in tcm_qla2xxx_close_session()
	Btrfs: fix data bytes_may_use underflow with fallocate due to failed quota reserve
	btrfs: fix panic during relocation after ENOSPC before writeback happens
	btrfs: Don't panic when we can't find a root key
	iwlwifi: pcie: don't crash on invalid RX interrupt
	rtc: 88pm860x: prevent use-after-free on device remove
	scsi: qedi: Abort ep termination if offload not scheduled
	w1: fix the resume command API
	dmaengine: pl330: _stop: clear interrupt status
	mac80211/cfg80211: update bss channel on channel switch
	libbpf: fix samples/bpf build failure due to undefined UINT32_MAX
	ASoC: fsl_sai: Update is_slave_mode with correct value
	mwifiex: prevent an array overflow
	net: cw1200: fix a NULL pointer dereference
	crypto: sun4i-ss - Fix invalid calculation of hash end
	bcache: return error immediately in bch_journal_replay()
	bcache: fix failure in journal relplay
	bcache: add failure check to run_cache_set() for journal replay
	bcache: avoid clang -Wunintialized warning
	vfio-ccw: Do not call flush_workqueue while holding the spinlock
	vfio-ccw: Release any channel program when releasing/removing vfio-ccw mdev
	x86/build: Move _etext to actual end of .text
	smpboot: Place the __percpu annotation correctly
	x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault()
	mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions
	HID: logitech-hidpp: use RAP instead of FAP to get the protocol version
	pinctrl: pistachio: fix leaked of_node references
	pinctrl: samsung: fix leaked of_node references
	clk: rockchip: undo several noc and special clocks as critical on rk3288
	dmaengine: at_xdmac: remove BUG_ON macro in tasklet
	media: coda: clear error return value before picture run
	media: ov6650: Move v4l2_clk_get() to ov6650_video_probe() helper
	media: au0828: stop video streaming only when last user stops
	media: ov2659: make S_FMT succeed even if requested format doesn't match
	audit: fix a memory leak bug
	media: stm32-dcmi: fix crash when subdev do not expose any formats
	media: au0828: Fix NULL pointer dereference in au0828_analog_stream_enable()
	media: pvrusb2: Prevent a buffer overflow
	powerpc/numa: improve control of topology updates
	powerpc/64: Fix booting large kernels with STRICT_KERNEL_RWX
	random: add a spinlock_t to struct batched_entropy
	cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock
	sched/core: Check quota and period overflow at usec to nsec conversion
	sched/rt: Check integer overflow at usec to nsec conversion
	sched/core: Handle overflow in cpu_shares_write_u64
	drm/msm: a5xx: fix possible object reference leak
	USB: core: Don't unbind interfaces following device reset failure
	x86/irq/64: Limit IST stack overflow check to #DB stack
	phy: sun4i-usb: Make sure to disable PHY0 passby for peripheral mode
	i40e: Able to add up to 16 MAC filters on an untrusted VF
	i40e: don't allow changes to HW VLAN stripping on active port VLANs
	arm64: vdso: Fix clock_getres() for CLOCK_REALTIME
	RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure
	hwmon: (vt1211) Use request_muxed_region for Super-IO accesses
	hwmon: (smsc47m1) Use request_muxed_region for Super-IO accesses
	hwmon: (smsc47b397) Use request_muxed_region for Super-IO accesses
	hwmon: (pc87427) Use request_muxed_region for Super-IO accesses
	hwmon: (f71805f) Use request_muxed_region for Super-IO accesses
	scsi: libsas: Do discovery on empty PHY to update PHY info
	mmc: core: make pwrseq_emmc (partially) support sleepy GPIO controllers
	mmc_spi: add a status check for spi_sync_locked
	mmc: sdhci-of-esdhc: add erratum eSDHC5 support
	mmc: sdhci-of-esdhc: add erratum A-009204 support
	mmc: sdhci-of-esdhc: add erratum eSDHC-A001 and A-008358 support
	drm/amdgpu: fix old fence check in amdgpu_fence_emit
	PM / core: Propagate dev->power.wakeup_path when no callbacks
	clk: rockchip: Fix video codec clocks on rk3288
	extcon: arizona: Disable mic detect if running when driver is removed
	clk: rockchip: Make rkpwm a critical clock on rk3288
	s390: zcrypt: initialize variables before_use
	x86/microcode: Fix the ancient deprecated microcode loading method
	s390: cio: fix cio_irb declaration
	cpufreq: ppc_cbe: fix possible object reference leak
	cpufreq/pasemi: fix possible object reference leak
	cpufreq: pmac32: fix possible object reference leak
	cpufreq: kirkwood: fix possible object reference leak
	block: sed-opal: fix IOC_OPAL_ENABLE_DISABLE_MBR
	x86/build: Keep local relocations with ld.lld
	iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion
	iio: hmc5843: fix potential NULL pointer dereferences
	iio: common: ssp_sensors: Initialize calculated_time in ssp_common_process_data
	rtlwifi: fix a potential NULL pointer dereference
	mwifiex: Fix mem leak in mwifiex_tm_cmd
	brcmfmac: fix missing checks for kmemdup
	b43: shut up clang -Wuninitialized variable warning
	brcmfmac: convert dev_init_lock mutex to completion
	brcmfmac: fix WARNING during USB disconnect in case of unempty psq
	brcmfmac: fix race during disconnect when USB completion is in progress
	brcmfmac: fix Oops when bringing up interface during USB disconnect
	rtc: xgene: fix possible race condition
	rtlwifi: fix potential NULL pointer dereference
	scsi: ufs: Fix regulator load and icc-level configuration
	scsi: ufs: Avoid configuring regulator with undefined voltage range
	arm64: cpu_ops: fix a leaked reference by adding missing of_node_put
	x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP
	x86/uaccess, signal: Fix AC=1 bloat
	x86/ia32: Fix ia32_restore_sigcontext() AC leak
	chardev: add additional check for minor range overlap
	RDMA/hns: Fix bad endianess of port_pd variable
	HID: core: move Usage Page concatenation to Main item
	ASoC: eukrea-tlv320: fix a leaked reference by adding missing of_node_put
	ASoC: fsl_utils: fix a leaked reference by adding missing of_node_put
	cxgb3/l2t: Fix undefined behaviour
	HID: logitech-hidpp: change low battery level threshold from 31 to 30 percent
	spi: tegra114: reset controller on probe
	kobject: Don't trigger kobject_uevent(KOBJ_REMOVE) twice.
	media: video-mux: fix null pointer dereferences
	media: wl128x: prevent two potential buffer overflows
	scsi: qedf: Add missing return in qedf_post_io_req() in the fcport offload check
	virtio_console: initialize vtermno value for ports
	tty: ipwireless: fix missing checks for ioremap
	x86/mce: Fix machine_check_poll() tests for error types
	rcutorture: Fix cleanup path for invalid torture_type strings
	rcuperf: Fix cleanup path for invalid perf_type strings
	usb: core: Add PM runtime calls to usb_hcd_platform_shutdown
	scsi: qla4xxx: avoid freeing unallocated dma memory
	batman-adv: allow updating DAT entry timeouts on incoming ARP Replies
	dmaengine: tegra210-adma: use devm_clk_*() helpers
	hwrng: omap - Set default quality
	thunderbolt: Fix to check for kmemdup failure
	media: m88ds3103: serialize reset messages in m88ds3103_set_frontend
	media: vimc: stream: fix thread state before sleep
	media: go7007: avoid clang frame overflow warning with KASAN
	media: vimc: zero the media_device on probe
	scsi: lpfc: Fix FDMI manufacturer attribute value
	scsi: lpfc: Fix fc4type information for FDMI
	media: saa7146: avoid high stack usage with clang
	scsi: lpfc: Fix SLI3 commands being issued on SLI4 devices
	spi : spi-topcliff-pch: Fix to handle empty DMA buffers
	spi: rspi: Fix sequencer reset during initialization
	spi: Fix zero length xfer bug
	ASoC: davinci-mcasp: Fix clang warning without CONFIG_PM
	drm/drv: Hold ref on parent device during drm_device lifetime
	drm: Wake up next in drm_read() chain if we are forced to putback the event
	vfio-ccw: Prevent quiesce function going into an infinite loop
	NFS: Fix a double unlock from nfs_match,get_client
	Linux 4.14.123

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-31 08:43:41 -07:00
Eric Dumazet
ddfe0bfd06 bpf: devmap: fix use-after-free Read in __dev_map_entry_free
commit 2baae3545327632167c0180e9ca1d467416f1919 upstream.

synchronize_rcu() is fine when the rcu callbacks only need
to free memory (kfree_rcu() or direct kfree() call rcu call backs)

__dev_map_entry_free() is a bit more complex, so we need to make
sure that call queued __dev_map_entry_free() callbacks have completed.

sysbot report:

BUG: KASAN: use-after-free in dev_map_flush_old kernel/bpf/devmap.c:365
[inline]
BUG: KASAN: use-after-free in __dev_map_entry_free+0x2a8/0x300
kernel/bpf/devmap.c:379
Read of size 8 at addr ffff8801b8da38c8 by task ksoftirqd/1/18

CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.17.0+ #39
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
  dev_map_flush_old kernel/bpf/devmap.c:365 [inline]
  __dev_map_entry_free+0x2a8/0x300 kernel/bpf/devmap.c:379
  __rcu_reclaim kernel/rcu/rcu.h:178 [inline]
  rcu_do_batch kernel/rcu/tree.c:2558 [inline]
  invoke_rcu_callbacks kernel/rcu/tree.c:2818 [inline]
  __rcu_process_callbacks kernel/rcu/tree.c:2785 [inline]
  rcu_process_callbacks+0xe9d/0x1760 kernel/rcu/tree.c:2802
  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284
  run_ksoftirqd+0x86/0x100 kernel/softirq.c:645
  smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
  kthread+0x345/0x410 kernel/kthread.c:240
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

Allocated by task 6675:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
  kmalloc include/linux/slab.h:513 [inline]
  kzalloc include/linux/slab.h:706 [inline]
  dev_map_alloc+0x208/0x7f0 kernel/bpf/devmap.c:102
  find_and_alloc_map kernel/bpf/syscall.c:129 [inline]
  map_create+0x393/0x1010 kernel/bpf/syscall.c:453
  __do_sys_bpf kernel/bpf/syscall.c:2351 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:2328 [inline]
  __x64_sys_bpf+0x303/0x510 kernel/bpf/syscall.c:2328
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 26:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kfree+0xd9/0x260 mm/slab.c:3813
  dev_map_free+0x4fa/0x670 kernel/bpf/devmap.c:191
  bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
  process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
  worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
  kthread+0x345/0x410 kernel/kthread.c:240
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

The buggy address belongs to the object at ffff8801b8da37c0
  which belongs to the cache kmalloc-512 of size 512
The buggy address is located 264 bytes inside of
  512-byte region [ffff8801b8da37c0, ffff8801b8da39c0)
The buggy address belongs to the page:
page:ffffea0006e368c0 count:1 mapcount:0 mapping:ffff8801da800940
index:0xffff8801b8da3540
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea0007217b88 ffffea0006e30cc8 ffff8801da800940
raw: ffff8801b8da3540 ffff8801b8da3040 0000000100000004 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801b8da3780: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
  ffff8801b8da3800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801b8da3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                               ^
  ffff8801b8da3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801b8da3980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc

Fixes: 546ac1ffb7 ("bpf: add devmap, a map for storing net device references")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+457d3e2ffbcf31aee5c0@syzkaller.appspotmail.com
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-31 06:47:13 -07:00
Chenbo Feng
cace572e16 BACKPORT: bpf: Add file mode configuration into bpf maps
Introduce the map read/write flags to the eBPF syscalls that returns the
map fd. The flags is used to set up the file mode when construct a new
file descriptor for bpf maps. To not break the backward capability, the
f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise
it should be O_RDONLY or O_WRONLY. When the userspace want to modify or
read the map content, it will check the file mode to see if it is
allowed to make the change.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Bug: 30950746
Change-Id: Icfad20f1abb77f91068d244fb0d87fa40824dd1b

(cherry picked from commit 6e71b04a82248ccf13a94b85cbc674a9fefe53f5)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2017-12-18 21:11:22 +05:30
John Fastabend
8695a53956 bpf: devmap fix arithmetic overflow in bitmap_size calculation
An integer overflow is possible in dev_map_bitmap_size() when
calculating the BITS_TO_LONG logic which becomes, after macro
replacement,

	(((n) + (d) - 1)/ (d))

where 'n' is a __u32 and 'd' is (8 * sizeof(long)). To avoid
overflow cast to u64 before arithmetic.

Reported-by: Richard Weinberger <richard@nod.at>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 00:54:09 +01:00
John Fastabend
9ef2a8cd5c bpf: require CAP_NET_ADMIN when using devmap
Devmap is used with XDP which requires CAP_NET_ADMIN so lets also
make CAP_NET_ADMIN required to use the map.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20 13:01:29 +01:00
Daniel Borkmann
82f8dd28bd bpf: fix splat for illegal devmap percpu allocation
It was reported that syzkaller was able to trigger a splat on
devmap percpu allocation due to illegal/unsupported allocation
request size passed to __alloc_percpu():

  [   70.094249] illegal size (32776) or align (8) for percpu allocation
  [   70.094256] ------------[ cut here ]------------
  [   70.094259] WARNING: CPU: 3 PID: 3451 at mm/percpu.c:1365 pcpu_alloc+0x96/0x630
  [...]
  [   70.094325] Call Trace:
  [   70.094328]  __alloc_percpu_gfp+0x12/0x20
  [   70.094330]  dev_map_alloc+0x134/0x1e0
  [   70.094331]  SyS_bpf+0x9bc/0x1610
  [   70.094333]  ? selinux_task_setrlimit+0x5a/0x60
  [   70.094334]  ? security_task_setrlimit+0x43/0x60
  [   70.094336]  entry_SYSCALL_64_fastpath+0x1a/0xa5

This was due to too large max_entries for the map such that we
surpassed the upper limit of PCPU_MIN_UNIT_SIZE. It's fine to
fail naturally here, so switch to __alloc_percpu_gfp() and pass
__GFP_NOWARN instead.

Fixes: 11393cc9b9 ("xdp: Add batching support to redirect map")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Shankara Pailoor <sp3485@columbia.edu>
Reported-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19 13:13:50 +01:00
Tobias Klauser
582db7e0c4 bpf: devmap: pass on return value of bpf_map_precharge_memlock
If bpf_map_precharge_memlock in dev_map_alloc, -ENOMEM is returned
regardless of the actual error produced by bpf_map_precharge_memlock.
Fix it by passing on the error returned by bpf_map_precharge_memlock.

Also return -EINVAL instead of -ENOMEM if the page count overflow check
fails.

This makes dev_map_alloc match the behavior of other bpf maps' alloc
functions wrt. return values.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-18 16:53:30 -07:00
John Fastabend
374fb014fc bpf: devmap, use cond_resched instead of cpu_relax
Be a bit more friendly about waiting for flush bits to complete.
Replace the cpu_relax() with a cond_resched().

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-08 21:11:00 -07:00
Daniel Borkmann
a5e2da6e97 bpf: netdev is never null in __dev_map_flush
No need to test for it in fast-path, every dev in bpf_dtab_netdev
is guaranteed to be non-NULL, otherwise dev_map_update_elem() will
fail in the first place.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23 22:43:40 -07:00
Daniel Borkmann
af4d045cee bpf: minor cleanups for dev_map
Some minor code cleanups, while going over it I also noticed that
we're accounting the bitmap only for one CPU currently, so fix that
up as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-22 21:26:29 -07:00
Daniel Borkmann
274043c6c9 bpf: fix double free from dev_map_notification()
In the current code, dev_map_free() can still race with dev_map_notification().
In dev_map_free(), we remove dtab from the list of dtabs after we purged
all entries from it. However, we don't do xchg() with NULL or the like,
so the entry at that point is still pointing to the device. If a unregister
notification comes in at the same time, we therefore risk a double-free,
since the pointer is still present in the map, and then pushed again to
__dev_map_entry_free().

All this is completely unnecessary. Just remove the dtab from the list
right before the synchronize_rcu(), so all outstanding readers from the
notifier list have finished by then, thus we don't need to deal with this
corner case anymore and also wouldn't need to nullify dev entires. This is
fine because we iterate over the map releasing all entries and therefore
dev references anyway.

Fixes: 4cc7b9544b ("bpf: devmap fix mutex in rcu critical section")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-20 19:45:54 -07:00
Martin KaFai Lau
96eabe7a40 bpf: Allow selecting numa node during map creation
The current map creation API does not allow to provide the numa-node
preference.  The memory usually comes from where the map-creation-process
is running.  The performance is not ideal if the bpf_prog is known to
always run in a numa node different from the map-creation-process.

One of the use case is sharding on CPU to different LRU maps (i.e.
an array of LRU maps).  Here is the test result of map_perf_test on
the INNER_LRU_HASH_PREALLOC test if we force the lru map used by
CPU0 to be allocated from a remote numa node:

[ The machine has 20 cores. CPU0-9 at node 0. CPU10-19 at node 1 ]

># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1628380 events per sec
4:inner_lru_hash_map_perf pre-alloc 1626396 events per sec
3:inner_lru_hash_map_perf pre-alloc 1626144 events per sec
6:inner_lru_hash_map_perf pre-alloc 1621657 events per sec
2:inner_lru_hash_map_perf pre-alloc 1621534 events per sec
1:inner_lru_hash_map_perf pre-alloc 1620292 events per sec
7:inner_lru_hash_map_perf pre-alloc 1613305 events per sec
0:inner_lru_hash_map_perf pre-alloc 1239150 events per sec  #<<<

After specifying numa node:
># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1629627 events per sec
3:inner_lru_hash_map_perf pre-alloc 1628057 events per sec
1:inner_lru_hash_map_perf pre-alloc 1623054 events per sec
6:inner_lru_hash_map_perf pre-alloc 1616033 events per sec
2:inner_lru_hash_map_perf pre-alloc 1614630 events per sec
4:inner_lru_hash_map_perf pre-alloc 1612651 events per sec
7:inner_lru_hash_map_perf pre-alloc 1609337 events per sec
0:inner_lru_hash_map_perf pre-alloc 1619340 events per sec #<<<

This patch adds one field, numa_node, to the bpf_attr.  Since numa node 0
is a valid node, a new flag BPF_F_NUMA_NODE is also added.  The numa_node
field is honored if and only if the BPF_F_NUMA_NODE flag is set.

Numa node selection is not supported for percpu map.

This patch does not change all the kmalloc.  F.e.
'htab = kzalloc()' is not changed since the object
is small enough to stay in the cache.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19 21:35:43 -07:00
John Fastabend
cf9d014059 bpf: devmap: remove unnecessary value size check
In the devmap alloc map logic we check to ensure that the sizeof the
values are not greater than KMALLOC_MAX_SIZE. But, in the dev map case
we ensure the value size is 4bytes earlier in the function because all
values should be netdev ifindex values.

The second check is harmless but is not needed so remove it.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:35:14 -07:00
John Fastabend
4cc7b9544b bpf: devmap fix mutex in rcu critical section
Originally we used a mutex to protect concurrent devmap update
and delete operations from racing with netdev unregister notifier
callbacks.

The notifier hook is needed because we increment the netdev ref
count when a dev is added to the devmap. This ensures the netdev
reference is valid in the datapath. However, we don't want to block
unregister events, hence the initial mutex and notifier handler.

The concern was in the notifier hook we search the map for dev
entries that hold a refcnt on the net device being torn down. But,
in order to do this we require two steps,

  (i) dereference the netdev:  dev = rcu_dereference(map[i])
 (ii) test ifindex:   dev->ifindex == removing_ifindex

and then finally we can swap in the NULL dev in the map via an
xchg operation,

  xchg(map[i], NULL)

The danger here is a concurrent update could run a different
xchg op concurrently leading us to replace the new dev with a
NULL dev incorrectly.

      CPU 1                        CPU 2

   notifier hook                   bpf devmap update

   dev = rcu_dereference(map[i])
                                   dev = rcu_dereference(map[i])
                                   xchg(map[i]), new_dev);
                                   rcu_call(dev,...)
   xchg(map[i], NULL)

The above flow would create the incorrect state with the dev
reference in the update path being lost. To resolve this the
original code used a mutex around the above block. However,
updates, deletes, and lookups occur inside rcu critical sections
so we can't use a mutex in this context safely.

Fortunately, by writing slightly better code we can avoid the
mutex altogether. If CPU 1 in the above example uses a cmpxchg
and _only_ replaces the dev reference in the map when it is in
fact the expected dev the race is removed completely. The two
cases being illustrated here, first the race condition,

      CPU 1                          CPU 2

   notifier hook                     bpf devmap update

   dev = rcu_dereference(map[i])
                                     dev = rcu_dereference(map[i])
                                     xchg(map[i]), new_dev);
                                     rcu_call(dev,...)
   odev = cmpxchg(map[i], dev, NULL)

Now we can test the cmpxchg return value, detect odev != dev and
abort. Or in the good case,

      CPU 1                          CPU 2

   notifier hook                     bpf devmap update
   dev = rcu_dereference(map[i])
   odev = cmpxchg(map[i], dev, NULL)
                                     [...]

Now 'odev == dev' and we can do proper cleanup.

And viola the original race we tried to solve with a mutex is
corrected and the trace noted by Sasha below is resolved due
to removal of the mutex.

Note: When walking the devmap and removing dev references as needed
we depend on the core to fail any calls to dev_get_by_index() using
the ifindex of the device being removed. This way we do not race with
the user while searching the devmap.

Additionally, the mutex was also protecting list add/del/read on
the list of maps in-use. This patch converts this to an RCU list
and spinlock implementation. This protects the list from concurrent
alloc/free operations. The notifier hook walks this list so it uses
RCU read semantics.

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
in_atomic(): 1, irqs_disabled(): 0, pid: 16315, name: syz-executor1
1 lock held by syz-executor1/16315:
 #0:  (rcu_read_lock){......}, at: [<ffffffff8c363bc2>] map_delete_elem kernel/bpf/syscall.c:577 [inline]
 #0:  (rcu_read_lock){......}, at: [<ffffffff8c363bc2>] SYSC_bpf kernel/bpf/syscall.c:1427 [inline]
 #0:  (rcu_read_lock){......}, at: [<ffffffff8c363bc2>] SyS_bpf+0x1d32/0x4ba0 kernel/bpf/syscall.c:1388

Fixes: 2ddf71e23c ("net: add notifier hooks for devmap bpf map")
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07 14:13:04 -07:00
Dan Carpenter
241a974ba2 bpf: dev_map_alloc() shouldn't return NULL
We forgot to set the error code on two error paths which means that we
return ERR_PTR(0) which is NULL.  The caller, find_and_alloc_map(), is
not expecting that and will have a NULL dereference.

Fixes: 546ac1ffb7 ("bpf: add devmap, a map for storing net device references")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 16:23:05 -07:00
John Fastabend
2ddf71e23c net: add notifier hooks for devmap bpf map
The BPF map devmap holds a refcnt on the net_device structure when
it is in the map. We need to do this to ensure on driver unload we
don't lose a dev reference.

However, its not very convenient to have to manually unload the map
when destroying a net device so add notifier handlers to do the cleanup
automatically. But this creates a race between update/destroy BPF
syscall and programs and the unregister netdev hook.

Unfortunately, the best I could come up with is either to live with
requiring manual removal of net devices from the map before removing
the net device OR to add a mutex in devmap to ensure the map is not
modified while we are removing a device. The fallout also requires
that BPF programs no longer update/delete the map from the BPF program
side because the mutex may sleep and this can not be done from inside
an rcu critical section.  This is not a real problem though because I
have not come up with any use cases where this is actually useful in
practice. If/when we come up with a compelling user for this we may
need to revisit this.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00
John Fastabend
11393cc9b9 xdp: Add batching support to redirect map
For performance reasons we want to avoid updating the tail pointer in
the driver tx ring as much as possible. To accomplish this we add
batching support to the redirect path in XDP.

This adds another ndo op "xdp_flush" that is used to inform the driver
that it should bump the tail pointer on the TX ring.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00
John Fastabend
97f91a7cf0 bpf: add bpf_redirect_map helper routine
BPF programs can use the devmap with a bpf_redirect_map() helper
routine to forward packets to netdevice in map.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00
John Fastabend
546ac1ffb7 bpf: add devmap, a map for storing net device references
Device map (devmap) is a BPF map, primarily useful for networking
applications, that uses a key to lookup a reference to a netdevice.

The map provides a clean way for BPF programs to build virtual port
to physical port maps. Additionally, it provides a scoping function
for the redirect action itself allowing multiple optimizations. Future
patches will leverage the map to provide batching at the XDP layer.

Another optimization/feature, that is not yet implemented, would be
to support multiple netdevices per key to support efficient multicast
and broadcast support.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00