Changes in 5.15.59
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
Revert "ocfs2: mount shared volume without ha stack"
ntfs: fix use-after-free in ntfs_ucsncmp()
fs: sendfile handles O_NONBLOCK of out_fd
secretmem: fix unhandled fault in truncate
mm: fix page leak with multiple threads mapping the same page
hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte
asm-generic: remove a broken and needless ifdef conditional
s390/archrandom: prevent CPACF trng invocations in interrupt context
nouveau/svm: Fix to migrate all requested pages
drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()
watch_queue: Fix missing rcu annotation
watch_queue: Fix missing locking in add_watch_to_object()
tcp: Fix data-races around sysctl_tcp_dsack.
tcp: Fix a data-race around sysctl_tcp_app_win.
tcp: Fix a data-race around sysctl_tcp_adv_win_scale.
tcp: Fix a data-race around sysctl_tcp_frto.
tcp: Fix a data-race around sysctl_tcp_nometrics_save.
tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
ice: do not setup vlan for loopback VSI
scsi: ufs: host: Hold reference returned by of_parse_phandle()
Revert "tcp: change pingpong threshold to 3"
octeontx2-pf: Fix UDP/TCP src and dst port tc filters
tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.
tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.
tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.
scsi: core: Fix warning in scsi_alloc_sgtables()
scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
net: ping6: Fix memleak in ipv6_renew_options().
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
net/tls: Remove the context from the list in tls_device_down
igmp: Fix data-races around sysctl_igmp_qrv.
net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmii
net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()
tcp: Fix a data-race around sysctl_tcp_min_tso_segs.
tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.
tcp: Fix a data-race around sysctl_tcp_autocorking.
tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.
Documentation: fix sctp_wmem in ip-sysctl.rst
macsec: fix NULL deref in macsec_add_rxsa
macsec: fix error message in macsec_add_rxsa and _txsa
macsec: limit replay window size with XPN
macsec: always read MACSEC_SA_ATTR_PN as a u64
net: macsec: fix potential resource leak in macsec_add_rxsa() and macsec_add_txsa()
net: mld: fix reference count leak in mld_{query | report}_work()
tcp: Fix data-races around sk_pacing_rate.
net: Fix data-races around sysctl_[rw]mem(_offset)?.
tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.
tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.
tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.
tcp: Fix data-races around sysctl_tcp_reflect_tos.
ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.
i40e: Fix interface init with MSI interrupts (no MSI-X)
sctp: fix sleep in atomic context bug in timer handlers
octeontx2-pf: cn10k: Fix egress ratelimit configuration
netfilter: nf_queue: do not allow packet truncation below transport header offset
virtio-net: fix the race between refill work and close
perf symbol: Correct address for bss symbols
sfc: disable softirqs for ptp TX
sctp: leave the err path free in sctp_stream_init to sctp_stream_free
ARM: crypto: comment out gcc warning that breaks clang builds
mm/hmm: fault non-owner device private entries
page_alloc: fix invalid watermark check on a negative value
ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow
EDAC/ghes: Set the DIMM label unconditionally
docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
Linux 5.15.59
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I00a5e80067e6edfcce68cc0dd54d9bf54c9dbfab
commit 85f0173df35e5462d89947135a6a5599c6c3ef6f upstream.
Change net device's MTU to smaller than IPV6_MIN_MTU or unregister
device while matching route. That may trigger null-ptr-deref bug
for ip6_ptr probability as following.
=========================================================
BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134
Read of size 4 at addr 0000000000000308 by task ping6/263
CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14
Call trace:
dump_backtrace+0x1a8/0x230
show_stack+0x20/0x70
dump_stack_lvl+0x68/0x84
print_report+0xc4/0x120
kasan_report+0x84/0x120
__asan_load4+0x94/0xd0
find_match.part.0+0x70/0x134
__find_rr_leaf+0x408/0x470
fib6_table_lookup+0x264/0x540
ip6_pol_route+0xf4/0x260
ip6_pol_route_output+0x58/0x70
fib6_rule_lookup+0x1a8/0x330
ip6_route_output_flags_noref+0xd8/0x1a0
ip6_route_output_flags+0x58/0x160
ip6_dst_lookup_tail+0x5b4/0x85c
ip6_dst_lookup_flow+0x98/0x120
rawv6_sendmsg+0x49c/0xc70
inet_sendmsg+0x68/0x94
Reproducer as following:
Firstly, prepare conditions:
$ip netns add ns1
$ip netns add ns2
$ip link add veth1 type veth peer name veth2
$ip link set veth1 netns ns1
$ip link set veth2 netns ns2
$ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1
$ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2
$ip netns exec ns1 ifconfig veth1 up
$ip netns exec ns2 ifconfig veth2 up
$ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1
$ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1
Secondly, execute the following two commands in two ssh windows
respectively:
$ip netns exec ns1 sh
$while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done
$ip netns exec ns1 sh
$while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done
It is because ip6_ptr has been assigned to NULL in addrconf_ifdown() firstly,
then ip6_ignore_linkdown() accesses ip6_ptr directly without NULL check.
cpu0 cpu1
fib6_table_lookup
__find_rr_leaf
addrconf_notify [ NETDEV_CHANGEMTU ]
addrconf_ifdown
RCU_INIT_POINTER(dev->ip6_ptr, NULL)
find_match
ip6_ignore_linkdown
So we can add NULL check for ip6_ptr before using in ip6_ignore_linkdown() to
fix the null-ptr-deref bug.
Fixes: dcd1f57295 ("net/ipv6: Remove fib6_idev")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 5.15.25
drm/nouveau/pmu/gm200-: use alternate falcon reset sequence
fs/proc: task_mmu.c: don't read mapcount for migration entry
btrfs: zoned: cache reported zone during mount
scsi: lpfc: Fix mailbox command failure during driver initialization
HID:Add support for UGTABLET WP5540
Revert "svm: Add warning message for AVIC IPI invalid target"
parisc: Show error if wrong 32/64-bit compiler is being used
serial: parisc: GSC: fix build when IOSAPIC is not set
parisc: Drop __init from map_pages declaration
parisc: Fix data TLB miss in sba_unmap_sg
parisc: Fix sglist access in ccio-dma.c
mmc: block: fix read single on recovery logic
mm: don't try to NUMA-migrate COW pages that have other uses
HID: amd_sfh: Add illuminance mask to limit ALS max value
HID: i2c-hid: goodix: Fix a lockdep splat
HID: amd_sfh: Increase sensor command timeout
HID: amd_sfh: Correct the structure field name
PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology
parisc: Add ioread64_lo_hi() and iowrite64_lo_hi()
btrfs: send: in case of IO error log it
platform/x86: touchscreen_dmi: Add info for the RWC NANOTE P8 AY07J 2-in-1
platform/x86: ISST: Fix possible circular locking dependency detected
kunit: tool: Import missing importlib.abc
selftests: rtc: Increase test timeout so that all tests run
kselftest: signal all child processes
net: ieee802154: at86rf230: Stop leaking skb's
selftests/zram: Skip max_comp_streams interface on newer kernel
selftests/zram01.sh: Fix compression ratio calculation
selftests/zram: Adapt the situation that /dev/zram0 is being used
selftests: openat2: Print also errno in failure messages
selftests: openat2: Add missing dependency in Makefile
selftests: openat2: Skip testcases that fail with EOPNOTSUPP
selftests: skip mincore.check_file_mmap when fs lacks needed support
ax25: improve the incomplete fix to avoid UAF and NPD bugs
pinctrl: bcm63xx: fix unmet dependency on REGMAP for GPIO_REGMAP
vfs: make freeze_super abort when sync_filesystem returns error
quota: make dquot_quota_sync return errors from ->sync_fs
scsi: pm80xx: Fix double completion for SATA devices
kselftest: Fix vdso_test_abi return status
scsi: core: Reallocate device's budget map on queue depth change
scsi: pm8001: Fix use-after-free for aborted TMF sas_task
scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task
drm/amd: Warn users about potential s0ix problems
nvme: fix a possible use-after-free in controller reset during load
nvme-tcp: fix possible use-after-free in transport error_recovery work
nvme-rdma: fix possible use-after-free in transport error_recovery work
net: sparx5: do not refer to skb after passing it on
drm/amd: add support to check whether the system is set to s3
drm/amd: Only run s3 or s0ix if system is configured properly
drm/amdgpu: fix logic inversion in check
x86/Xen: streamline (and fix) PV CPU enumeration
Revert "module, async: async_synchronize_full() on module init iff async is used"
gcc-plugins/stackleak: Use noinstr in favor of notrace
random: wake up /dev/random writers after zap
KVM: x86/xen: Fix runstate updates to be atomic when preempting vCPU
KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM
KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case
KVM: x86: nSVM: fix potential NULL derefernce on nested migration
KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state
iwlwifi: fix use-after-free
drm/radeon: Fix backlight control on iMac 12,1
drm/atomic: Don't pollute crtc_state->mode_blob with error pointers
drm/amd/pm: correct the sequence of sending gpu reset msg
drm/amdgpu: skipping SDMA hw_init and hw_fini for S0ix.
drm/i915/opregion: check port number bounds for SWSCI display power state
drm/i915: Fix dbuf slice config lookup
drm/i915: Fix mbus join config lookup
vsock: remove vsock from connected table when connect is interrupted by a signal
drm/cma-helper: Set VM_DONTEXPAND for mmap
drm/i915/gvt: Make DRM_I915_GVT depend on X86
drm/i915/ttm: tweak priority hint selection
iwlwifi: pcie: fix locking when "HW not ready"
iwlwifi: pcie: gen2: fix locking when "HW not ready"
iwlwifi: mvm: don't send SAR GEO command for 3160 devices
selftests: netfilter: fix exit value for nft_concat_range
netfilter: nft_synproxy: unregister hooks on init error path
selftests: netfilter: disable rp_filter on router
ipv4: fix data races in fib_alias_hw_flags_set
ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
ipv6: per-netns exclusive flowlabel checks
Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
mac80211: mlme: check for null after calling kmemdup
brcmfmac: firmware: Fix crash in brcm_alt_fw_path
cfg80211: fix race in netlink owner interface destruction
net: dsa: lan9303: fix reset on probe
net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
net: dsa: lantiq_gswip: fix use after free in gswip_remove()
net: dsa: lan9303: handle hwaccel VLAN tags
net: dsa: lan9303: add VLAN IDs to master device
net: ieee802154: ca8210: Fix lifs/sifs periods
ping: fix the dif and sdif check in ping_lookup
bonding: force carrier update when releasing slave
drop_monitor: fix data-race in dropmon_net_event / trace_napi_poll_hit
net_sched: add __rcu annotation to netdev->qdisc
bonding: fix data-races around agg_select_timer
libsubcmd: Fix use-after-free for realloc(..., 0)
net/smc: Avoid overwriting the copies of clcsock callback functions
net: phy: mediatek: remove PHY mode check on MT7531
atl1c: fix tx timeout after link flap on Mikrotik 10/25G NIC
tipc: fix wrong publisher node address in link publications
dpaa2-switch: fix default return of dpaa2_switch_flower_parse_mirror_key
dpaa2-eth: Initialize mutex used in one step timestamping path
net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
perf bpf: Defer freeing string after possible strlen() on it
selftests/exec: Add non-regular to TEST_GEN_PROGS
arm64: Correct wrong label in macro __init_el2_gicv3
ALSA: usb-audio: revert to IMPLICIT_FB_FIXED_DEV for M-Audio FastTrack Ultra
ALSA: hda/realtek: Add quirk for Legion Y9000X 2019
ALSA: hda/realtek: Fix deadlock by COEF mutex
ALSA: hda: Fix regression on forced probe mask option
ALSA: hda: Fix missing codec probe on Shenker Dock 15
ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw()
ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw_range()
ASoC: ops: Fix stereo change notifications in snd_soc_put_volsw_sx()
ASoC: ops: Fix stereo change notifications in snd_soc_put_xr_sx()
cifs: fix set of group SID via NTSD xattrs
powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE
powerpc/lib/sstep: fix 'ptesync' build error
mtd: rawnand: gpmi: don't leak PM reference in error path
smb3: fix snapshot mount option
tipc: fix wrong notification node addresses
scsi: ufs: Remove dead code
scsi: ufs: Fix a deadlock in the error handler
ASoC: tas2770: Insert post reset delay
ASoC: qcom: Actually clear DMA interrupt register for HDMI
block/wbt: fix negative inflight counter when remove scsi device
NFS: Remove an incorrect revalidation in nfs4_update_changeattr_locked()
NFS: LOOKUP_DIRECTORY is also ok with symlinks
NFS: Do not report writeback errors in nfs_getattr()
tty: n_tty: do not look ahead for EOL character past the end of the buffer
block: fix surprise removal for drivers calling blk_set_queue_dying
mtd: rawnand: qcom: Fix clock sequencing in qcom_nandc_probe()
mtd: parsers: qcom: Fix kernel panic on skipped partition
mtd: parsers: qcom: Fix missing free for pparts in cleanup
mtd: phram: Prevent divide by zero bug in phram_setup()
mtd: rawnand: brcmnand: Fixed incorrect sub-page ECC status
HID: elo: fix memory leak in elo_probe
mtd: rawnand: ingenic: Fix missing put_device in ingenic_ecc_get
Drivers: hv: vmbus: Fix memory leak in vmbus_add_channel_kobj
KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()
KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event
KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW
ARM: OMAP2+: hwmod: Add of_node_put() before break
ARM: OMAP2+: adjust the location of put_device() call in omapdss_init_of
phy: usb: Leave some clocks running during suspend
staging: vc04_services: Fix RCU dereference check
phy: phy-mtk-tphy: Fix duplicated argument in phy-mtk-tphy
irqchip/sifive-plic: Add missing thead,c900-plic match string
x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm
netfilter: conntrack: don't refresh sctp entries in closed state
ksmbd: fix same UniqueId for dot and dotdot entries
ksmbd: don't align last entry offset in smb2 query directory
arm64: dts: meson-gx: add ATF BL32 reserved-memory region
arm64: dts: meson-g12: add ATF BL32 reserved-memory region
arm64: dts: meson-g12: drop BL32 region from SEI510/SEI610
pidfd: fix test failure due to stack overflow on some arches
selftests: fixup build warnings in pidfd / clone3 tests
mm: io_uring: allow oom-killer from io_uring_setup
ACPI: PM: Revert "Only mark EC GPE for wakeup on Intel systems"
kconfig: let 'shell' return enough output for deep path names
ata: libata-core: Disable TRIM on M88V29
soc: aspeed: lpc-ctrl: Block error printing on probe defer cases
xprtrdma: fix pointer derefs in error cases of rpcrdma_ep_create
drm/rockchip: dw_hdmi: Do not leave clock enabled in error case
tracing: Fix tp_printk option related with tp_printk_stop_on_boot
display/amd: decrease message verbosity about watermarks table failure
drm/amd/display: Cap pflip irqs per max otg number
drm/amd/display: fix yellow carp wm clamping
net: usb: qmi_wwan: Add support for Dell DW5829e
net: macb: Align the dma and coherent dma masks
kconfig: fix failing to generate auto.conf
scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop
EDAC: Fix calculation of returned address and next offset in edac_align_ptr()
ucounts: Handle wrapping in is_ucounts_overlimit
ucounts: In set_cred_ucounts assume new->ucounts is non-NULL
ucounts: Base set_cred_ucounts changes on the real user
ucounts: Enforce RLIMIT_NPROC not RLIMIT_NPROC+1
lib/iov_iter: initialize "flags" in new pipe_buffer
rlimit: Fix RLIMIT_NPROC enforcement failure caused by capability calls in set_user
ucounts: Move RLIMIT_NPROC handling after set_user
net: sched: limit TC_ACT_REPEAT loops
dmaengine: sh: rcar-dmac: Check for error num after setting mask
dmaengine: stm32-dmamux: Fix PM disable depth imbalance in stm32_dmamux_probe
dmaengine: sh: rcar-dmac: Check for error num after dma_set_max_seg_size
tests: fix idmapped mount_setattr test
i2c: qcom-cci: don't delete an unregistered adapter
i2c: qcom-cci: don't put a device tree node before i2c_add_adapter()
dmaengine: ptdma: Fix the error handling path in pt_core_init()
copy_process(): Move fd_install() out of sighand->siglock critical section
scsi: qedi: Fix ABBA deadlock in qedi_process_tmf_resp() and qedi_process_cmd_cleanup_resp()
ice: enable parsing IPSEC SPI headers for RSS
i2c: brcmstb: fix support for DSL and CM variants
lockdep: Correct lock_classes index mapping
Linux 5.15.25
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib129a0e11f5e82d67563329a5de1b0aef1d87928
Changes in 5.15.19
can: m_can: m_can_fifo_{read,write}: don't read or write from/to FIFO if length is 0
net: sfp: ignore disabled SFP node
net: stmmac: configure PTP clock source prior to PTP initialization
net: stmmac: skip only stmmac_ptp_register when resume from suspend
ARM: 9179/1: uaccess: avoid alignment faults in copy_[from|to]_kernel_nofault
ARM: 9180/1: Thumb2: align ALT_UP() sections in modules sufficiently
KVM: arm64: Use shadow SPSR_EL1 when injecting exceptions on !VHE
s390/module: fix loading modules with a lot of relocations
s390/hypfs: include z/VM guests with access control group set
s390/nmi: handle guarded storage validity failures for KVM guests
s390/nmi: handle vector validity failures for KVM guests
bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
powerpc32/bpf: Fix codegen for bpf-to-bpf calls
powerpc/bpf: Update ldimm64 instructions during extra pass
ucount: Make get_ucount a safe get_user replacement
scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
udf: Restore i_lenAlloc when inode expansion fails
udf: Fix NULL ptr deref when converting from inline format
efi: runtime: avoid EFIv2 runtime services on Apple x86 machines
PM: wakeup: simplify the output logic of pm_show_wakelocks()
tracing/histogram: Fix a potential memory leak for kstrdup()
tracing: Don't inc err_log entry count if entry allocation fails
ceph: properly put ceph_string reference after async create attempt
ceph: set pool_ns in new inode layout for async creates
fsnotify: fix fsnotify hooks in pseudo filesystems
Revert "KVM: SVM: avoid infinite loop on NPF from bad address"
psi: Fix uaf issue when psi trigger is destroyed while being polled
powerpc/audit: Fix syscall_get_arch()
perf/x86/intel/uncore: Fix CAS_COUNT_WRITE issue for ICX
perf/x86/intel: Add a quirk for the calculation of the number of counters on Alder Lake
drm/etnaviv: relax submit size limits
drm/atomic: Add the crtc to affected crtc only if uapi.enable = true
drm/amd/display: Fix FP start/end for dcn30_internal_validate_bw.
KVM: LAPIC: Also cancel preemption timer during SET_LAPIC
KVM: SVM: Never reject emulation due to SMAP errata for !SEV guests
KVM: SVM: Don't intercept #GP for SEV guests
KVM: x86: nSVM: skip eax alignment check for non-SVM instructions
KVM: x86: Forcibly leave nested virt when SMM state is toggled
KVM: x86: Keep MSR_IA32_XSS unchanged for INIT
KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS
KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time
KVM: PPC: Book3S HV Nested: Fix nested HFSCR being clobbered with multiple vCPUs
dm: revert partial fix for redundant bio-based IO accounting
block: add bio_start_io_acct_time() to control start_time
dm: properly fix redundant bio-based IO accounting
serial: pl011: Fix incorrect rs485 RTS polarity on set_mctrl
serial: 8250: of: Fix mapped region size when using reg-offset property
serial: stm32: fix software flow control transfer
tty: n_gsm: fix SW flow control encoding/handling
tty: Partially revert the removal of the Cyclades public API
tty: Add support for Brainboxes UC cards.
kbuild: remove include/linux/cyclades.h from header file check
usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
usb: xhci-plat: fix crash when suspend if remote wake enable
usb: common: ulpi: Fix crash in ulpi_match()
usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS
usb: cdnsp: Fix segmentation fault in cdns_lost_power function
usb: dwc3: xilinx: Skip resets and USB3 register settings for USB2.0 mode
usb: dwc3: xilinx: Fix error handling when getting USB3 PHY
USB: core: Fix hang in usb_kill_urb by adding memory barriers
usb: typec: tcpci: don't touch CC line if it's Vconn source
usb: typec: tcpm: Do not disconnect while receiving VBUS off
usb: typec: tcpm: Do not disconnect when receiving VSAFE0V
ucsi_ccg: Check DEV_INT bit only when starting CCG4
mm, kasan: use compare-exchange operation to set KASAN page tag
jbd2: export jbd2_journal_[grab|put]_journal_head
ocfs2: fix a deadlock when commit trans
sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
PCI/sysfs: Find shadow ROM before static attribute initialization
x86/MCE/AMD: Allow thresholding interface updates after init
x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
powerpc/32s: Allocate one 256k IBAT instead of two consecutives 128k IBATs
powerpc/32s: Fix kasan_init_region() for KASAN
powerpc/32: Fix boot failure with GCC latent entropy plugin
i40e: Increase delay to 1 s after global EMP reset
i40e: Fix issue when maximum queues is exceeded
i40e: Fix queues reservation for XDP
i40e: Fix for failed to init adminq while VF reset
i40e: fix unsigned stat widths
usb: roles: fix include/linux/usb/role.h compile issue
rpmsg: char: Fix race between the release of rpmsg_ctrldev and cdev
rpmsg: char: Fix race between the release of rpmsg_eptdev and cdev
scsi: elx: efct: Don't use GFP_KERNEL under spin lock
scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
ipv6_tunnel: Rate limit warning messages
ARM: 9170/1: fix panic when kasan and kprobe are enabled
net: fix information leakage in /proc/net/ptype
hwmon: (lm90) Mark alert as broken for MAX6646/6647/6649
hwmon: (lm90) Mark alert as broken for MAX6680
ping: fix the sk_bound_dev_if match in ping_lookup
ipv4: avoid using shared IP generator for connected sockets
hwmon: (lm90) Reduce maximum conversion rate for G781
NFSv4: Handle case where the lookup of a directory fails
NFSv4: nfs_atomic_open() can race when looking up a non-regular file
net-procfs: show net devices bound packet types
drm/msm: Fix wrong size calculation
drm/msm/dsi: Fix missing put_device() call in dsi_get_phy
drm/msm/dsi: invalid parameter check in msm_dsi_phy_enable
ipv6: annotate accesses to fn->fn_sernum
NFS: Ensure the server has an up to date ctime before hardlinking
NFS: Ensure the server has an up to date ctime before renaming
KVM: arm64: pkvm: Use the mm_ops indirection for cache maintenance
SUNRPC: Use BIT() macro in rpc_show_xprt_state()
SUNRPC: Don't dereference xprt->snd_task if it's a cookie
powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
netfilter: conntrack: don't increment invalid counter on NF_REPEAT
powerpc/64s: Mask SRR0 before checking against the masked NIP
perf: Fix perf_event_read_local() time
sched/pelt: Relax the sync of util_sum with util_avg
net: phy: broadcom: hook up soft_reset for BCM54616S
net: stmmac: dwmac-visconti: Fix bit definitions for ETHER_CLK_SEL
net: stmmac: dwmac-visconti: Fix clock configuration for RMII mode
phylib: fix potential use-after-free
octeontx2-af: Do not fixup all VF action entries
octeontx2-af: Fix LBK backpressure id count
octeontx2-af: Retry until RVU block reset complete
octeontx2-pf: cn10k: Ensure valid pointers are freed to aura
octeontx2-af: verify CQ context updates
octeontx2-af: Increase link credit restore polling timeout
octeontx2-af: cn10k: Do not enable RPM loopback for LPC interfaces
octeontx2-pf: Forward error codes to VF
rxrpc: Adjust retransmission backoff
efi/libstub: arm64: Fix image check alignment at entry
io_uring: fix bug in slow unregistering of nodes
Drivers: hv: balloon: account for vmbus packet header in max_pkt_size
hwmon: (lm90) Re-enable interrupts after alert clears
hwmon: (lm90) Mark alert as broken for MAX6654
hwmon: (lm90) Fix sysfs and udev notifications
hwmon: (adt7470) Prevent divide by zero in adt7470_fan_write()
powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending
ipv4: fix ip option filtering for locally generated fragments
ibmvnic: Allow extra failures before disabling
ibmvnic: init ->running_cap_crqs early
ibmvnic: don't spin in tasklet
net/smc: Transitional solution for clcsock race issue
video: hyperv_fb: Fix validation of screen resolution
can: tcan4x5x: regmap: fix max register value
drm/msm/hdmi: Fix missing put_device() call in msm_hdmi_get_phy
drm/msm/dpu: invalid parameter check in dpu_setup_dspp_pcc
drm/msm/a6xx: Add missing suspend_count increment
yam: fix a memory leak in yam_siocdevprivate()
net: cpsw: Properly initialise struct page_pool_params
net: hns3: handle empty unknown interrupt for VF
sch_htb: Fail on unsupported parameters when offload is requested
Revert "drm/ast: Support 1600x900 with 108MHz PCLK"
KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest
ceph: put the requests/sessions when it fails to alloc memory
gve: Fix GFP flags when allocing pages
Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
net: bridge: vlan: fix single net device option dumping
ipv4: raw: lock the socket in raw_bind()
ipv4: tcp: send zero IPID in SYNACK messages
ipv4: remove sparse error in ip_neigh_gw4()
net: bridge: vlan: fix memory leak in __allowed_ingress
Bluetooth: refactor malicious adv data check
irqchip/realtek-rtl: Map control data to virq
irqchip/realtek-rtl: Fix off-by-one in routing
dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
perf/core: Fix cgroup event list management
psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
usb: dwc3: xilinx: fix uninitialized return value
usr/include/Makefile: add linux/nfc.h to the compile-test coverage
fsnotify: invalidate dcache before IN_DELETE event
block: Fix wrong offset in bio_truncate()
mtd: rawnand: mpc5121: Remove unused variable in ads5121_select_chip()
Linux 5.15.19
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I66399d45af362fa8e1672ba38c0d672e21afc716
[ Upstream commit 36268983e90316b37000a005642af42234dabb36 ]
This reverts commit b75326c201.
This commit breaks Linux compatibility with USGv6 tests. The RFC this
commit was based on is actually an expired draft: no published RFC
currently allows the new behaviour it introduced.
Without full IETF endorsement, the flash renumbering scenario this
patch was supposed to enable is never going to work, as other IPv6
equipements on the same LAN will keep the 2 hours limit.
Fixes: b75326c201 ("ipv6: Honor all IPv6 PIO Valid Lifetime values")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
The IPv6 Multicast Router Advertisements parsing has the following two
issues:
For one thing, ICMPv6 MRD Advertisements are smaller than ICMPv6 MLD
messages (ICMPv6 MRD Adv.: 8 bytes vs. ICMPv6 MLDv1/2: >= 24 bytes,
assuming MLDv2 Reports with at least one multicast address entry).
When ipv6_mc_check_mld_msg() tries to parse an Multicast Router
Advertisement its MLD length check will fail - and it will wrongly
return -EINVAL, even if we have a valid MRD Advertisement. With the
returned -EINVAL the bridge code will assume a broken packet and will
wrongly discard it, potentially leading to multicast packet loss towards
multicast routers.
The second issue is the MRD header parsing in
br_ip6_multicast_mrd_rcv(): It wrongly checks for an ICMPv6 header
immediately after the IPv6 header (IPv6 next header type). However
according to RFC4286, section 2 all MRD messages contain a Router Alert
option (just like MLD). So instead there is an IPv6 Hop-by-Hop option
for the Router Alert between the IPv6 and ICMPv6 header, again leading
to the bridge wrongly discarding Multicast Router Advertisements.
To fix these two issues, introduce a new return value -ENODATA to
ipv6_mc_check_mld() to indicate a valid ICMPv6 packet with a hop-by-hop
option which is not an MLD but potentially an MRD packet. This also
simplifies further parsing in the bridge code, as ipv6_mc_check_mld()
already fully checks the ICMPv6 header and hop-by-hop option.
These issues were found and fixed with the help of the mrdisc tool
(https://github.com/troglobit/mrdisc).
Fixes: 4b3087c7e3 ("bridge: Snoop Multicast Router Advertisements")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to do 3 things for ipv6_dev_find():
As David A. noticed,
- rt6_lookup() is not really needed. Different from __ip_dev_find(),
ipv6_dev_find() doesn't have a compatibility problem, so remove it.
As Hideaki suggested,
- "valid" (non-tentative) check for the address is also needed.
ipv6_chk_addr() calls ipv6_chk_addr_and_flags(), which will
traverse the address hash list, but it's heavy to be called
inside ipv6_dev_find(). This patch is to reuse the code of
ipv6_chk_addr_and_flags() for ipv6_dev_find().
- dev parameter is passed into ipv6_dev_find(), as link-local
addresses from user space has sin6_scope_id set and the dev
lookup needs it.
Fixes: 81f6cb3122 ("ipv6: add ipv6_dev_find()")
Suggested-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steps on the way to 5.9-rc1
Resolves conflicts in:
drivers/irqchip/qcom-pdc.c
include/linux/device.h
net/xfrm/xfrm_state.c
security/lsm_audit.c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4aeb3d04f4717714a421721eb3ce690c099bb30a
This is to add an ip_dev_find like function for ipv6, used to find
the dev by saddr.
It will be used by TIPC protocol. So also export it.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPV6_ADDRFORM causes resource leaks when converting an IPv6 socket
to IPv4, particularly struct ipv6_ac_socklist. Similar to
struct ipv6_mc_socklist, we should just close it on this path.
This bug can be easily reproduced with the following C program:
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
int main()
{
int s, value;
struct sockaddr_in6 addr;
struct ipv6_mreq m6;
s = socket(AF_INET6, SOCK_DGRAM, 0);
addr.sin6_family = AF_INET6;
addr.sin6_port = htons(5000);
inet_pton(AF_INET6, "::ffff:192.168.122.194", &addr.sin6_addr);
connect(s, (struct sockaddr *)&addr, sizeof(addr));
inet_pton(AF_INET6, "fe80::AAAA", &m6.ipv6mr_multiaddr);
m6.ipv6mr_interface = 5;
setsockopt(s, SOL_IPV6, IPV6_JOIN_ANYCAST, &m6, sizeof(m6));
value = AF_INET;
setsockopt(s, SOL_IPV6, IPV6_ADDRFORM, &value, sizeof(value));
close(s);
return 0;
}
Reported-by: ch3332xr@gmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC4862 5.5.3 e) prevents received Router Advertisements from reducing
the Valid Lifetime of configured addresses to less than two hours, thus
preventing hosts from reacting to the information provided by a router
that has positive knowledge that a prefix has become invalid.
This patch makes hosts honor all Valid Lifetime values, as per
draft-gont-6man-slaac-renum-06, Section 4.2. This is meant to help
mitigate the problem discussed in draft-ietf-v6ops-slaac-renum.
Note: Attacks aiming at disabling an advertised prefix via a Valid
Lifetime of 0 are not really more harmful than other attacks
that can be performed via forged RA messages, such as those
aiming at completely disabling a next-hop router via an RA that
advertises a Router Lifetime of 0, or performing a Denial of
Service (DoS) attack by advertising illegitimate prefixes via
forged PIOs. In scenarios where RA-based attacks are of concern,
proper mitigations such as RA-Guard [RFC6105] [RFC7113] should
be implemented.
Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a quest to divide up the 5.7-rc1 merge chunks into reviewable pieces.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I2e5960415348c06e8f10e10cbefb3ee5c3745e73
This patch adds a functionality to addrconf to check on a specific RPL
address configuration. According to RFC 6554:
To detect loops in the SRH, a router MUST determine if the SRH
includes multiple addresses assigned to any interface on that
router. If such addresses appear more than once and are separated by
at least one address not assigned to that router.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Baby steps in the 5.6-rc1 merge cycle to make things easier to review
and debug.
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4c44b3c32065ea0ed8175b31665f2a4195a27300
The sparse commit 6002ded74587 ("add a flag to warn on casts to/from
bitwise pointers") introduced a check for non-direct casts from/to
restricted datatypes (when -Wbitwise-pointer is enabled).
This triggered a warning in the 64 bit optimized ipv6_addr_is_*() functions
because sparse doesn't know that the buffer already points to some data in
the correct bitwise integer format. But these were correct and can
therefore be marked with __force to signalize sparse an intended cast to a
specific bitwise type.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit ba5ea61462 ("bridge: simplify ip_mc_check_igmp() and
ipv6_mc_check_mld() calls") replaces direct calls to pskb_may_pull()
in br_ipv6_multicast_mld2_report() with calls to ipv6_mc_may_pull(),
that returns -EINVAL on buffers too short to be valid IPv6 packets,
while maintaining the previous handling of the return code.
This leads to the direct opposite of the intended effect: if the
packet is malformed, -EINVAL evaluates as true, and we'll happily
proceed with the processing.
Return 0 if the packet is too short, in the same way as this was
fixed for IPv4 by commit 083b78a9ed ("ip: fix ip_mc_may_pull()
return value").
I don't have a reproducer for this, unlike the one referred to by
the IPv4 commit, but this is clearly broken.
Fixes: ba5ea61462 ("bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() calls")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get the ingress interface and increment ICMP counters based on that
instead of skb->dev when the the dev is a VRF device.
This is a follow up on the following message:
https://www.spinics.net/lists/netdev/msg560268.html
v2: Avoid changing skb->dev since it has unintended effect for local
delivery (David Ahern).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, IPv6 router discovery always puts routes into
RT6_TABLE_MAIN. This causes problems for connection managers
that want to support multiple simultaneous network connections
and want control over which one is used by default (e.g., wifi
and wired).
To work around this connection managers typically take the routes
they prefer and copy them to static routes with low metrics in
the main table. This puts the burden on the connection manager
to watch netlink to see if the routes have changed, delete the
routes when their lifetime expires, etc.
Instead, this patch adds a per-interface sysctl to have the
kernel put autoconf routes into different tables. This allows
each interface to have its own autoconf table, and choosing the
default interface (or using different interfaces at the same
time for different types of traffic) can be done using
appropriate ip rules.
The sysctl behaves as follows:
- = 0: default. Put routes into RT6_TABLE_MAIN as before.
- > 0: manual. Put routes into the specified table.
- < 0: automatic. Add the absolute value of the sysctl to the
device's ifindex, and use that table.
The automatic mode is most useful in conjunction with
net.ipv6.conf.default.accept_ra_rt_table. A connection manager
or distribution could set it to, say, -100 on boot, and
thereafter just use IP rules.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
[AmitP: Refactored original changes to align with
the changes introduced by upstream commits
830218c1ad ("net: ipv6: Fix processing of RAs in presence of VRF"),
8d1c802b28 ("net/ipv6: Flip FIB entries to fib6_info").
Also folded following android-4.9 commit changes into this patch
be65fb01da4d ("ANDROID: net: ipv6: remove unused variable ifindex in")]
Bug: 120445791
Change-Id: I82d16e3737d9cdfa6489e649e247894d0d60cbb1
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
The number of stubs is growing and has nothing to do with addrconf.
Move the definition of the stubs to a separate header file and update
users. In the move, drop the vxlan specific comment before ipv6_stub.
Code move only; no functional change intended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib6_ignore_linkdown takes a fib6_info but only looks at the net_device
and its IPv6 config. Change it to take a net_device over a fib6_info as
its input argument.
In addition, move it to a header file to make the check inline and usable
later with IPv4 code without going through the ipv6 stub, and rename to
ip6_ignore_linkdown since it is only checking the setting based on the
ipv6 struct on a device.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Proxy ip6_route_input via ipv6_stub, for later use by lwt bpf ip encap
(see the next patch in the patchset).
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When multiple multicast routers are present in a broadcast domain then
only one of them will be detectable via IGMP/MLD query snooping. The
multicast router with the lowest IP address will become the selected and
active querier while all other multicast routers will then refrain from
sending queries.
To detect such rather silent multicast routers, too, RFC4286
("Multicast Router Discovery") provides a standardized protocol to
detect multicast routers for multicast snooping switches.
This patch implements the necessary MRD Advertisement message parsing
and after successful processing adds such routers to the internal
multicast router list.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch refactors ip_mc_check_igmp(), ipv6_mc_check_mld() and
their callers (more precisely, the Linux bridge) to not rely on
the skb_trimmed parameter anymore.
An skb with its tail trimmed to the IP packet length was initially
introduced for the following three reasons:
1) To be able to verify the ICMPv6 checksum.
2) To be able to distinguish the version of an IGMP or MLD query.
They are distinguishable only by their size.
3) To avoid parsing data for an IGMPv3 or MLDv2 report that is
beyond the IP packet but still within the skb.
The first case still uses a cloned and potentially trimmed skb to
verfiy. However, there is no need to propagate it to the caller.
For the second and third case explicit IP packet length checks were
added.
This hopefully makes ip_mc_check_igmp() and ipv6_mc_check_mld() easier
to read and verfiy, as well as easier to use.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
icmp6_send() function is expensive on systems with a large number of
interfaces. Every time it’s called, it has to verify that the source
address does not correspond to an existing anycast address by looping
through every device and every anycast address on the device. This can
result in significant delays for a CPU when there are a large number of
neighbors and ND timers are frequently timing out and calling
neigh_invalidate().
Add anycast addresses to a global hashtable to allow quick searching for
matching anycast addresses. This is based on inet6_addr_lst in addrconf.c.
Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a more complete fix than d71019b54b ("net: core: Fix build
with CONFIG_IPV6=m"), so that IPv6 sockets may be looked up if the IPv6
module is loaded (not just if it's compiled in).
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds a BPF_PROG_TYPE_SK_REUSEPORT which can select
a SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY. Like other
non SK_FILTER/CGROUP_SKB program, it requires CAP_SYS_ADMIN.
BPF_PROG_TYPE_SK_REUSEPORT introduces "struct sk_reuseport_kern"
to store the bpf context instead of using the skb->cb[48].
At the SO_REUSEPORT sk lookup time, it is in the middle of transiting
from a lower layer (ipv4/ipv6) to a upper layer (udp/tcp). At this
point, it is not always clear where the bpf context can be appended
in the skb->cb[48] to avoid saving-and-restoring cb[]. Even putting
aside the difference between ipv4-vs-ipv6 and udp-vs-tcp. It is not
clear if the lower layer is only ipv4 and ipv6 in the future and
will it not touch the cb[] again before transiting to the upper
layer.
For example, in udp_gro_receive(), it uses the 48 byte NAPI_GRO_CB
instead of IP[6]CB and it may still modify the cb[] after calling
the udp[46]_lib_lookup_skb(). Because of the above reason, if
sk->cb is used for the bpf ctx, saving-and-restoring is needed
and likely the whole 48 bytes cb[] has to be saved and restored.
Instead of saving, setting and restoring the cb[], this patch opts
to create a new "struct sk_reuseport_kern" and setting the needed
values in there.
The new BPF_PROG_TYPE_SK_REUSEPORT and "struct sk_reuseport_(kern|md)"
will serve all ipv4/ipv6 + udp/tcp combinations. There is no protocol
specific usage at this point and it is also inline with the current
sock_reuseport.c implementation (i.e. no protocol specific requirement).
In "struct sk_reuseport_md", this patch exposes data/data_end/len
with semantic similar to other existing usages. Together
with "bpf_skb_load_bytes()" and "bpf_skb_load_bytes_relative()",
the bpf prog can peek anywhere in the skb. The "bind_inany" tells
the bpf prog that the reuseport group is bind-ed to a local
INANY address which cannot be learned from skb.
The new "bind_inany" is added to "struct sock_reuseport" which will be
used when running the new "BPF_PROG_TYPE_SK_REUSEPORT" bpf prog in order
to avoid repeating the "bind INANY" test on
"sk_v6_rcv_saddr/sk->sk_rcv_saddr" every time a bpf prog is run. It can
only be properly initialized when a "sk->sk_reuseport" enabled sk is
adding to a hashtable (i.e. during "reuseport_alloc()" and
"reuseport_add_sock()").
The new "sk_select_reuseport()" is the main helper that the
bpf prog will use to select a SO_REUSEPORT sk. It is the only function
that can use the new BPF_MAP_TYPE_REUSEPORT_ARRAY. As mentioned in
the earlier patch, the validity of a selected sk is checked in
run time in "sk_select_reuseport()". Doing the check in
verification time is difficult and inflexible (consider the map-in-map
use case). The runtime check is to compare the selected sk's reuseport_id
with the reuseport_id that we want. This helper will return -EXXX if the
selected sk cannot serve the incoming request (e.g. reuseport_id
not match). The bpf prog can decide if it wants to do SK_DROP as its
discretion.
When the bpf prog returns SK_PASS, the kernel will check if a
valid sk has been selected (i.e. "reuse_kern->selected_sk != NULL").
If it does , it will use the selected sk. If not, the kernel
will select one from "reuse->socks[]" (as before this patch).
The SK_DROP and SK_PASS handling logic will be in the next patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add support for IFA_RT_PRIORITY to ipv6 addresses.
If the metric is changed on an existing address then the new route
is inserted before removing the old one. Since the metric is one
of the route keys, the prefix route can not be atomically replaced.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move config parameters for adding an ipv6 address to a struct. struct
names stem from inet6_rtm_newaddr which is the modern handler for
adding an address.
Start the conversion to ifa6_config with ipv6_add_addr. This is an argument
move only; no functional change intended. Mapping of variable changes:
addr --> cfg->pfx
peer_addr --> cfg->peer_pfx
pfxlen --> cfg->plen
flags --> cfg->ifa_flags
scope, valid_lft, prefered_lft have the same names within cfg
(with corrected spelling).
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Determine path MTU from a FIB lookup result. Logic is based on
ip6_dst_mtu_forward plus lookup of nexthop exception.
Add ip6_dst_mtu_forward to ipv6_stubs to handle access by core
bpf code.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add stubs to retrieve a handle to an IPv6 FIB table, fib6_get_table,
a stub to do a lookup in a specific table, fib6_table_lookup, and
a stub for a full route lookup.
The stubs are needed for core bpf code to handle the case when the
IPv6 module is not builtin.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The statistics such as InHdrErrors should be counted on the ingress
netdev rather than on the dev from the dst, which is the egress.
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
== The problem ==
See description of the problem in the initial patch of this patch set.
== The solution ==
The patch provides much more reliable in-kernel solution for the 2nd
part of the problem: making outgoing connecttion from desired IP.
It adds new attach types `BPF_CGROUP_INET4_CONNECT` and
`BPF_CGROUP_INET6_CONNECT` for program type
`BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that can be used to override both
source and destination of a connection at connect(2) time.
Local end of connection can be bound to desired IP using newly
introduced BPF-helper `bpf_bind()`. It allows to bind to only IP though,
and doesn't support binding to port, i.e. leverages
`IP_BIND_ADDRESS_NO_PORT` socket option. There are two reasons for this:
* looking for a free port is expensive and can affect performance
significantly;
* there is no use-case for port.
As for remote end (`struct sockaddr *` passed by user), both parts of it
can be overridden, remote IP and remote port. It's useful if an
application inside cgroup wants to connect to another application inside
same cgroup or to itself, but knows nothing about IP assigned to the
cgroup.
Support is added for IPv4 and IPv6, for TCP and UDP.
IPv4 and IPv6 have separate attach types for same reason as sys_bind
hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields
when user passes sockaddr_in since it'd be out-of-bound.
== Implementation notes ==
The patch introduces new field in `struct proto`: `pre_connect` that is
a pointer to a function with same signature as `connect` but is called
before it. The reason is in some cases BPF hooks should be called way
before control is passed to `sk->sk_prot->connect`. Specifically
`inet_dgram_connect` autobinds socket before calling
`sk->sk_prot->connect` and there is no way to call `bpf_bind()` from
hooks from e.g. `ip4_datagram_connect` or `ip6_datagram_connect` since
it'd cause double-bind. On the other hand `proto.pre_connect` provides a
flexible way to add BPF hooks for connect only for necessary `proto` and
call them at desired time before `connect`. Since `bpf_bind()` is
allowed to bind only to IP and autobind in `inet_dgram_connect` binds
only port there is no chance of double-bind.
bpf_bind() sets `force_bind_address_no_port` to bind to only IP despite
of value of `bind_address_no_port` socket field.
bpf_bind() sets `with_lock` to `false` when calling to __inet_bind()
and __inet6_bind() since all call-sites, where bpf_bind() is called,
already hold socket lock.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
ipv6_chk_addr_and_flags determines if an address is a local address and
optionally if it is an address on a specific device. For example, it is
called by ip6_route_info_create to determine if a given gateway address
is a local address. The address check currently does not consider L3
domains and as a result does not allow a route to be added in one VRF
if the nexthop points to an address in a second VRF. e.g.,
$ ip route add 2001:db8:1::/64 vrf r2 via 2001:db8:102::23
Error: Invalid gateway address.
where 2001:db8:102::23 is an address on an interface in vrf r1.
ipv6_chk_addr_and_flags needs to allow callers to always pass in a device
with a separate argument to not limit the address to the specific device.
The device is used used to determine the L3 domain of interest.
To that end add an argument to skip the device check and update callers
to always pass a device where possible and use the new argument to mean
any address in the domain.
Update a handful of users of ipv6_chk_addr with a NULL dev argument. This
patch handles the change to these callers without adding the domain check.
ip6_validate_gw needs to handle 2 cases - one where the device is given
as part of the nexthop spec and the other where the device is resolved.
There is at least 1 VRF case where deferring the check to only after
the route lookup has resolved the device fails with an unintuitive error
"RTNETLINK answers: No route to host" as opposed to the preferred
"Error: Gateway can not be a local address." The 'no route to host'
error is because of the fallback to a full lookup. The check is done
twice to avoid this error.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
convert remaining users of rtnl_register to rtnl_register_module
and un-export rtnl_register.
Requested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Files removed in 'net-next' had their license header updated
in 'net'. We take the remove from 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch updates the error messages displayed in kernel log to include
hwaddress of the source machine that caused ipv6 duplicate address
detection failures.
Examples:
a) When we receive a NA packet from another machine advertising our
address:
ICMPv6: NA: 34:ab:cd:56:11:e8 advertised our address 2001:db8:: on eth0!
b) When we detect DAD failure during address assignment to an interface:
IPv6: eth0: IPv6 duplicate address 2001:db8:: used by 34:ab:cd:56:11:e8
detected!
v2:
Changed %pI6 to %pI6c in ndisc_recv_na()
Chaged the v6 address in the commit message to 2001:db8::
Suggested-by: Igor Lubashev <ilubashe@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bring IPv6 in par with IPv4 :
- Use net_hash_mix() to spread addresses a bit more.
- Use 256 slots hash table instead of 16
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>