Commit Graph

244 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
dfd3aa1729 ANDROID: rename struct tcm_sock.cwnd_usage_seq to fix ABI
In commit 49d429760d ("tcp: fix tcp_cwnd_validate() to not forget
is_cwnd_limited"), the field max_packets_seq in struct tcm_sock was
renamed to cwnd_usage_seq, which broke the ABI.  Keep the same logic in
the bugfix, but rename the field back to preserve the ABI.

Bug: 161946584
Fixes: 49d429760d ("tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited")
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9ffb10be63a22c36d5864feea794dde390ef69a1
2022-11-22 11:12:34 +00:00
Greg Kroah-Hartman
b049ff121c Merge 5.15.75 into android13-5.15-lts
Changes in 5.15.75
	Revert "fs: check FMODE_LSEEK to control internal pipe splicing"
	ALSA: oss: Fix potential deadlock at unregistration
	ALSA: rawmidi: Drop register_mutex in snd_rawmidi_free()
	ALSA: usb-audio: Fix potential memory leaks
	ALSA: usb-audio: Fix NULL dererence at error path
	ALSA: hda/realtek: remove ALC289_FIXUP_DUAL_SPK for Dell 5530
	ALSA: hda/realtek: Correct pin configs for ASUS G533Z
	ALSA: hda/realtek: Add quirk for ASUS GV601R laptop
	ALSA: hda/realtek: Add Intel Reference SSID to support headset keys
	mtd: rawnand: atmel: Unmap streaming DMA mappings
	io_uring/net: don't update msg_name if not provided
	hv_netvsc: Fix race between VF offering and VF association message from host
	cifs: destage dirty pages before re-reading them for cache=none
	cifs: Fix the error length of VALIDATE_NEGOTIATE_INFO message
	iio: dac: ad5593r: Fix i2c read protocol requirements
	iio: ltc2497: Fix reading conversion results
	iio: adc: ad7923: fix channel readings for some variants
	iio: pressure: dps310: Refactor startup procedure
	iio: pressure: dps310: Reset chip after timeout
	xhci: dbc: Fix memory leak in xhci_alloc_dbc()
	usb: add quirks for Lenovo OneLink+ Dock
	can: kvaser_usb: Fix use of uninitialized completion
	can: kvaser_usb_leaf: Fix overread with an invalid command
	can: kvaser_usb_leaf: Fix TX queue out of sync after restart
	can: kvaser_usb_leaf: Fix CAN state after restart
	mmc: sdhci-sprd: Fix minimum clock limit
	i2c: designware: Fix handling of real but unexpected device interrupts
	fs: dlm: fix race between test_bit() and queue_work()
	fs: dlm: handle -EBUSY first in lock arg validation
	HID: multitouch: Add memory barriers
	quota: Check next/prev free block number after reading from quota file
	platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure
	ASoC: wcd9335: fix order of Slimbus unprepare/disable
	ASoC: wcd934x: fix order of Slimbus unprepare/disable
	hwmon: (gsc-hwmon) Call of_node_get() before of_find_xxx API
	net: thunderbolt: Enable DMA paths only after rings are enabled
	regulator: qcom_rpm: Fix circular deferral regression
	arm64: topology: move store_cpu_topology() to shared code
	riscv: topology: fix default topology reporting
	RISC-V: Make port I/O string accessors actually work
	parisc: fbdev/stifb: Align graphics memory size to 4MB
	riscv: Allow PROT_WRITE-only mmap()
	riscv: Make VM_WRITE imply VM_READ
	riscv: always honor the CONFIG_CMDLINE_FORCE when parsing dtb
	riscv: Pass -mno-relax only on lld < 15.0.0
	UM: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
	nvmem: core: Fix memleak in nvmem_register()
	nvme-multipath: fix possible hang in live ns resize with ANA access
	nvme-pci: set min_align_mask before calculating max_hw_sectors
	Revert "drm/amdgpu: use dirty framebuffer helper"
	dmaengine: mxs: use platform_driver_register
	drm/virtio: Check whether transferred 2D BO is shmem
	drm/virtio: Unlock reservations on virtio_gpu_object_shmem_init() error
	drm/virtio: Use appropriate atomic state in virtio_gpu_plane_cleanup_fb()
	drm/udl: Restore display mode on resume
	arm64: errata: Add Cortex-A55 to the repeat tlbi list
	mm/damon: validate if the pmd entry is present before accessing
	mm/mmap: undo ->mmap() when arch_validate_flags() fails
	xen/gntdev: Prevent leaking grants
	xen/gntdev: Accommodate VMA splitting
	PCI: Sanitise firmware BAR assignments behind a PCI-PCI bridge
	serial: 8250: Let drivers request full 16550A feature probing
	serial: 8250: Request full 16550A feature probing for OxSemi PCIe devices
	NFSD: Protect against send buffer overflow in NFSv3 READDIR
	NFSD: Protect against send buffer overflow in NFSv2 READ
	NFSD: Protect against send buffer overflow in NFSv3 READ
	powercap: intel_rapl: Use standard Energy Unit for SPR Dram RAPL domain
	powerpc/boot: Explicitly disable usage of SPE instructions
	slimbus: qcom-ngd: use correct error in message of pdr_add_lookup() failure
	slimbus: qcom-ngd: cleanup in probe error path
	scsi: qedf: Populate sysfs attributes for vport
	gpio: rockchip: request GPIO mux to pinctrl when setting direction
	pinctrl: rockchip: add pinmux_ops.gpio_set_direction callback
	fbdev: smscufx: Fix use-after-free in ufx_ops_open()
	ksmbd: fix endless loop when encryption for response fails
	ksmbd: Fix wrong return value and message length check in smb2_ioctl()
	ksmbd: Fix user namespace mapping
	fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
	btrfs: fix race between quota enable and quota rescan ioctl
	btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer
	f2fs: complete checkpoints during remount
	f2fs: flush pending checkpoints when freezing super
	f2fs: increase the limit for reserve_root
	f2fs: fix to do sanity check on destination blkaddr during recovery
	f2fs: fix to do sanity check on summary info
	hardening: Avoid harmless Clang option under CONFIG_INIT_STACK_ALL_ZERO
	hardening: Remove Clang's enable flag for -ftrivial-auto-var-init=zero
	jbd2: wake up journal waiters in FIFO order, not LIFO
	jbd2: fix potential buffer head reference count leak
	jbd2: fix potential use-after-free in jbd2_fc_wait_bufs
	jbd2: add miss release buffer head in fc_do_one_pass()
	ext4: avoid crash when inline data creation follows DIO write
	ext4: fix null-ptr-deref in ext4_write_info
	ext4: make ext4_lazyinit_thread freezable
	ext4: fix check for block being out of directory size
	ext4: don't increase iversion counter for ea_inodes
	ext4: ext4_read_bh_lock() should submit IO if the buffer isn't uptodate
	ext4: place buffer head allocation before handle start
	ext4: fix dir corruption when ext4_dx_add_entry() fails
	ext4: fix miss release buffer head in ext4_fc_write_inode
	ext4: fix potential memory leak in ext4_fc_record_modified_inode()
	ext4: fix potential memory leak in ext4_fc_record_regions()
	ext4: update 'state->fc_regions_size' after successful memory allocation
	livepatch: fix race between fork and KLP transition
	ftrace: Properly unset FTRACE_HASH_FL_MOD
	ring-buffer: Allow splice to read previous partially read pages
	ring-buffer: Have the shortest_full queue be the shortest not longest
	ring-buffer: Check pending waiters when doing wake ups as well
	ring-buffer: Add ring_buffer_wake_waiters()
	ring-buffer: Fix race between reset page and reading page
	tracing: Disable interrupt or preemption before acquiring arch_spinlock_t
	tracing: Wake up ring buffer waiters on closing of the file
	tracing: Wake up waiters when tracing is disabled
	tracing: Add ioctl() to force ring buffer waiters to wake up
	tracing: Move duplicate code of trace_kprobe/eprobe.c into header
	tracing: Add "(fault)" name injection to kernel probes
	tracing: Fix reading strings from synthetic events
	thunderbolt: Explicitly enable lane adapter hotplug events at startup
	efi: libstub: drop pointless get_memory_map() call
	media: cedrus: Set the platform driver data earlier
	media: cedrus: Fix endless loop in cedrus_h265_skip_bits()
	blk-wbt: call rq_qos_add() after wb_normal is initialized
	KVM: x86/emulator: Fix handing of POP SS to correctly set interruptibility
	KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"
	KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings to vmcs02
	KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS
	staging: greybus: audio_helper: remove unused and wrong debugfs usage
	drm/nouveau/kms/nv140-: Disable interlacing
	drm/nouveau: fix a use-after-free in nouveau_gem_prime_import_sg_table()
	drm/i915: Fix watermark calculations for gen12+ RC CCS modifier
	drm/i915: Fix watermark calculations for gen12+ MC CCS modifier
	drm/i915: Fix watermark calculations for gen12+ CCS+CC modifier
	drm/amd/display: Fix vblank refcount in vrr transition
	smb3: must initialize two ACL struct fields to zero
	selinux: use "grep -E" instead of "egrep"
	ima: fix blocking of security.ima xattrs of unsupported algorithms
	userfaultfd: open userfaultfds with O_RDONLY
	ntfs3: rework xattr handlers and switch to POSIX ACL VFS helpers
	thermal: cpufreq_cooling: Check the policy first in cpufreq_cooling_register()
	sh: machvec: Use char[] for section boundaries
	MIPS: SGI-IP27: Free some unused memory
	MIPS: SGI-IP27: Fix platform-device leak in bridge_platform_create()
	ARM: 9244/1: dump: Fix wrong pg_level in walk_pmd()
	ARM: 9247/1: mm: set readonly for MT_MEMORY_RO with ARM_LPAE
	objtool: Preserve special st_shndx indexes in elf_update_symbol
	nfsd: Fix a memory leak in an error handling path
	SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation
	SUNRPC: Fix svcxdr_init_encode's buflen calculation
	NFSD: Protect against send buffer overflow in NFSv2 READDIR
	NFSD: Fix handling of oversized NFSv4 COMPOUND requests
	wifi: rtlwifi: 8192de: correct checking of IQK reload
	wifi: ath10k: add peer map clean up for peer delete in ath10k_sta_state()
	leds: lm3601x: Don't use mutex after it was destroyed
	bpf: Fix reference state management for synchronous callbacks
	wifi: mac80211: allow bw change during channel switch in mesh
	bpftool: Fix a wrong type cast in btf_dumper_int
	spi: mt7621: Fix an error message in mt7621_spi_probe()
	x86/resctrl: Fix to restore to original value when re-enabling hardware prefetch register
	xsk: Fix backpressure mechanism on Tx
	bpf: Disable preemption when increasing per-cpu map_locked
	bpf: Propagate error from htab_lock_bucket() to userspace
	bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy
	Bluetooth: btusb: mediatek: fix WMT failure during runtime suspend
	wifi: rtl8xxxu: tighten bounds checking in rtl8xxxu_read_efuse()
	wifi: rtw88: add missing destroy_workqueue() on error path in rtw_core_init()
	selftests/xsk: Avoid use-after-free on ctx
	spi: qup: add missing clk_disable_unprepare on error in spi_qup_resume()
	spi: qup: add missing clk_disable_unprepare on error in spi_qup_pm_resume_runtime()
	wifi: rtl8xxxu: Fix skb misuse in TX queue selection
	spi: meson-spicc: do not rely on busy flag in pow2 clk ops
	bpf: btf: fix truncated last_member_type_id in btf_struct_resolve
	wifi: rtl8xxxu: gen2: Fix mistake in path B IQ calibration
	wifi: rtl8xxxu: Remove copy-paste leftover in gen2_update_rate_mask
	wifi: mt76: sdio: fix transmitting packet hangs
	wifi: mt76: mt7615: add mt7615_mutex_acquire/release in mt7615_sta_set_decap_offload
	wifi: mt76: mt7915: do not check state before configuring implicit beamform
	Bluetooth: RFCOMM: Fix possible deadlock on socket shutdown/release
	net: fs_enet: Fix wrong check in do_pd_setup
	bpf: Ensure correct locking around vulnerable function find_vpid()
	Bluetooth: hci_{ldisc,serdev}: check percpu_init_rwsem() failure
	netfilter: conntrack: fix the gc rescheduling delay
	netfilter: conntrack: revisit the gc initial rescheduling bias
	wifi: ath11k: fix number of VHT beamformee spatial streams
	x86/microcode/AMD: Track patch allocation size explicitly
	x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype
	spi: dw: Fix PM disable depth imbalance in dw_spi_bt1_probe
	spi/omap100k:Fix PM disable depth imbalance in omap1_spi100k_probe
	skmsg: Schedule psock work if the cached skb exists on the psock
	i2c: mlxbf: support lock mechanism
	Bluetooth: hci_core: Fix not handling link timeouts propertly
	xfrm: Reinject transport-mode packets through workqueue
	netfilter: nft_fib: Fix for rpath check with VRF devices
	spi: s3c64xx: Fix large transfers with DMA
	wifi: rtl8xxxu: Fix AIFS written to REG_EDCA_*_PARAM
	vhost/vsock: Use kvmalloc/kvfree for larger packets.
	eth: alx: take rtnl_lock on resume
	mISDN: fix use-after-free bugs in l1oip timer handlers
	sctp: handle the error returned from sctp_auth_asoc_init_active_key
	tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited
	spi: Ensure that sg_table won't be used after being freed
	hwmon: (pmbus/mp2888) Fix sensors readouts for MPS Multi-phase mp2888 controller
	net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks()
	bnx2x: fix potential memory leak in bnx2x_tpa_stop()
	net: wwan: iosm: Call mutex_init before locking it
	net/ieee802154: reject zero-sized raw_sendmsg()
	once: add DO_ONCE_SLOW() for sleepable contexts
	net: mvpp2: fix mvpp2 debugfs leak
	drm: bridge: adv7511: fix CEC power down control register offset
	drm: bridge: adv7511: unregister cec i2c device after cec adapter
	drm/bridge: Avoid uninitialized variable warning
	drm/mipi-dsi: Detach devices when removing the host
	drm/virtio: Correct drm_gem_shmem_get_sg_table() error handling
	drm/bridge: parade-ps8640: Fix regulator supply order
	drm/dp_mst: fix drm_dp_dpcd_read return value checks
	drm:pl111: Add of_node_put() when breaking out of for_each_available_child_of_node()
	ASoC: mt6359: fix tests for platform_get_irq() failure
	platform/chrome: fix double-free in chromeos_laptop_prepare()
	platform/chrome: fix memory corruption in ioctl
	ASoC: tas2764: Allow mono streams
	ASoC: tas2764: Drop conflicting set_bias_level power setting
	ASoC: tas2764: Fix mute/unmute
	platform/x86: msi-laptop: Fix old-ec check for backlight registering
	platform/x86: msi-laptop: Fix resource cleanup
	platform/chrome: cros_ec_typec: Correct alt mode index
	drm/amdgpu: add missing pci_disable_device() in amdgpu_pmops_runtime_resume()
	drm/bridge: megachips: Fix a null pointer dereference bug
	ASoC: rsnd: Add check for rsnd_mod_power_on
	ALSA: hda: beep: Simplify keep-power-at-enable behavior
	drm/bochs: fix blanking
	drm/omap: dss: Fix refcount leak bugs
	drm/amdgpu: Fix memory leak in hpd_rx_irq_create_workqueue()
	mmc: au1xmmc: Fix an error handling path in au1xmmc_probe()
	ASoC: eureka-tlv320: Hold reference returned from of_find_xxx API
	drm/msm/dpu: index dpu_kms->hw_vbif using vbif_idx
	drm/msm/dp: correct 1.62G link rate at dp_catalog_ctrl_config_msa()
	drm/vmwgfx: Fix memory leak in vmw_mksstat_add_ioctl()
	ASoC: codecs: tx-macro: fix kcontrol put
	ASoC: da7219: Fix an error handling path in da7219_register_dai_clks()
	ALSA: dmaengine: increment buffer pointer atomically
	mmc: wmt-sdmmc: Fix an error handling path in wmt_mci_probe()
	ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe
	ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe
	ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe
	ASoC: mt6660: Fix PM disable depth imbalance in mt6660_i2c_probe
	ALSA: hda/hdmi: Don't skip notification handling during PM operation
	memory: pl353-smc: Fix refcount leak bug in pl353_smc_probe()
	memory: of: Fix refcount leak bug in of_get_ddr_timings()
	memory: of: Fix refcount leak bug in of_lpddr3_get_ddr_timings()
	locks: fix TOCTOU race when granting write lease
	soc: qcom: smsm: Fix refcount leak bugs in qcom_smsm_probe()
	soc: qcom: smem_state: Add refcounting for the 'state->of_node'
	ARM: dts: imx6qdl-kontron-samx6i: hook up DDC i2c bus
	ARM: dts: turris-omnia: Fix mpp26 pin name and comment
	ARM: dts: kirkwood: lsxl: fix serial line
	ARM: dts: kirkwood: lsxl: remove first ethernet port
	ia64: export memory_add_physaddr_to_nid to fix cxl build error
	soc/tegra: fuse: Drop Kconfig dependency on TEGRA20_APB_DMA
	arm64: dts: ti: k3-j7200: fix main pinmux range
	ARM: dts: exynos: correct s5k6a3 reset polarity on Midas family
	ARM: Drop CMDLINE_* dependency on ATAGS
	ext4: don't run ext4lazyinit for read-only filesystems
	arm64: ftrace: fix module PLTs with mcount
	ARM: dts: exynos: fix polarity of VBUS GPIO of Origen
	iio: adc: at91-sama5d2_adc: fix AT91_SAMA5D2_MR_TRACKTIM_MAX
	iio: adc: at91-sama5d2_adc: check return status for pressure and touch
	iio: adc: at91-sama5d2_adc: lock around oversampling and sample freq
	iio: adc: at91-sama5d2_adc: disable/prepare buffer on suspend/resume
	iio: inkern: only release the device node when done with it
	iio: inkern: fix return value in devm_of_iio_channel_get_by_name()
	iio: ABI: Fix wrong format of differential capacitance channel ABI.
	iio: magnetometer: yas530: Change data type of hard_offsets to signed
	RDMA/mlx5: Don't compare mkey tags in DEVX indirect mkey
	usb: common: debug: Check non-standard control requests
	clk: meson: Hold reference returned by of_get_parent()
	clk: oxnas: Hold reference returned by of_get_parent()
	clk: qoriq: Hold reference returned by of_get_parent()
	clk: berlin: Add of_node_put() for of_get_parent()
	clk: sprd: Hold reference returned by of_get_parent()
	clk: tegra: Fix refcount leak in tegra210_clock_init
	clk: tegra: Fix refcount leak in tegra114_clock_init
	clk: tegra20: Fix refcount leak in tegra20_clock_init
	HSI: omap_ssi: Fix refcount leak in ssi_probe
	HSI: omap_ssi_port: Fix dma_map_sg error check
	media: exynos4-is: fimc-is: Add of_node_put() when breaking out of loop
	tty: xilinx_uartps: Fix the ignore_status
	media: meson: vdec: add missing clk_disable_unprepare on error in vdec_hevc_start()
	media: uvcvideo: Fix memory leak in uvc_gpio_parse
	media: uvcvideo: Use entity get_cur in uvc_ctrl_set
	media: xilinx: vipp: Fix refcount leak in xvip_graph_dma_init
	RDMA/rxe: Fix "kernel NULL pointer dereference" error
	RDMA/rxe: Fix the error caused by qp->sk
	misc: ocxl: fix possible refcount leak in afu_ioctl()
	fpga: prevent integer overflow in dfl_feature_ioctl_set_irq()
	dmaengine: hisilicon: Disable channels when unregister hisi_dma
	dmaengine: hisilicon: Fix CQ head update
	dmaengine: hisilicon: Add multi-thread support for a DMA channel
	dyndbg: fix static_branch manipulation
	dyndbg: fix module.dyndbg handling
	dyndbg: let query-modname override actual module name
	dyndbg: drop EXPORTed dynamic_debug_exec_queries
	clk: qcom: sm6115: Select QCOM_GDSC
	mtd: devices: docg3: check the return value of devm_ioremap() in the probe
	phy: amlogic: phy-meson-axg-mipi-pcie-analog: Hold reference returned by of_get_parent()
	phy: phy-mtk-tphy: fix the phy type setting issue
	mtd: rawnand: intel: Read the chip-select line from the correct OF node
	mtd: rawnand: intel: Remove undocumented compatible string
	mtd: rawnand: fsl_elbc: Fix none ECC mode
	RDMA/irdma: Align AE id codes to correct flush code and event
	RDMA/srp: Fix srp_abort()
	RDMA/siw: Always consume all skbuf data in sk_data_ready() upcall.
	RDMA/siw: Fix QP destroy to wait for all references dropped.
	ata: fix ata_id_sense_reporting_enabled() and ata_id_has_sense_reporting()
	ata: fix ata_id_has_devslp()
	ata: fix ata_id_has_ncq_autosense()
	ata: fix ata_id_has_dipm()
	mtd: rawnand: meson: fix bit map use in meson_nfc_ecc_correct()
	md: Replace snprintf with scnprintf
	md/raid5: Ensure stripe_fill happens on non-read IO with journal
	md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk()
	RDMA/cm: Use SLID in the work completion as the DLID in responder side
	IB: Set IOVA/LENGTH on IB_MR in core/uverbs layers
	xhci: Don't show warning for reinit on known broken suspend
	usb: gadget: function: fix dangling pnp_string in f_printer.c
	drivers: serial: jsm: fix some leaks in probe
	serial: 8250: Toggle IER bits on only after irq has been set up
	tty: serial: fsl_lpuart: disable dma rx/tx use flags in lpuart_dma_shutdown
	phy: qualcomm: call clk_disable_unprepare in the error handling
	staging: vt6655: fix some erroneous memory clean-up loops
	slimbus: qcom-ngd-ctrl: allow compile testing without QCOM_RPROC_COMMON
	firmware: google: Test spinlock on panic path to avoid lockups
	serial: 8250: Fix restoring termios speed after suspend
	scsi: libsas: Fix use-after-free bug in smp_execute_task_sg()
	scsi: iscsi: Rename iscsi_conn_queue_work()
	scsi: iscsi: Add recv workqueue helpers
	scsi: iscsi: Run recv path from workqueue
	scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()
	clk: qcom: apss-ipq6018: mark apcs_alias0_core_clk as critical
	clk: qcom: gcc-sm6115: Override default Alpha PLL regs
	RDMA/rxe: Fix resize_finish() in rxe_queue.c
	fsi: core: Check error number after calling ida_simple_get
	mfd: intel_soc_pmic: Fix an error handling path in intel_soc_pmic_i2c_probe()
	mfd: fsl-imx25: Fix an error handling path in mx25_tsadc_setup_irq()
	mfd: lp8788: Fix an error handling path in lp8788_probe()
	mfd: lp8788: Fix an error handling path in lp8788_irq_init() and lp8788_irq_init()
	mfd: fsl-imx25: Fix check for platform_get_irq() errors
	mfd: sm501: Add check for platform_driver_register()
	clk: mediatek: mt8183: mfgcfg: Propagate rate changes to parent
	dmaengine: ioat: stop mod_timer from resurrecting deleted timer in __cleanup()
	usb: mtu3: fix failed runtime suspend in host only mode
	spmi: pmic-arb: correct duplicate APID to PPID mapping logic
	clk: vc5: Fix 5P49V6901 outputs disabling when enabling FOD
	clk: baikal-t1: Fix invalid xGMAC PTP clock divider
	clk: baikal-t1: Add shared xGMAC ref/ptp clocks internal parent
	clk: baikal-t1: Add SATA internal ref clock buffer
	clk: bcm2835: fix bcm2835_clock_rate_from_divisor declaration
	clk: imx: scu: fix memleak on platform_device_add() fails
	clk: ti: dra7-atl: Fix reference leak in of_dra7_atl_clk_probe
	clk: ast2600: BCLK comes from EPLL
	mailbox: mpfs: fix handling of the reg property
	mailbox: mpfs: account for mbox offsets while sending
	mailbox: bcm-ferxrm-mailbox: Fix error check for dma_map_sg
	powerpc/configs: Properly enable PAPR_SCM in pseries_defconfig
	powerpc/math_emu/efp: Include module.h
	powerpc/sysdev/fsl_msi: Add missing of_node_put()
	powerpc/pci_dn: Add missing of_node_put()
	powerpc/powernv: add missing of_node_put() in opal_export_attrs()
	powerpc: Fix fallocate and fadvise64_64 compat parameter combination
	x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
	powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5
	powerpc: Fix SPE Power ISA properties for e500v1 platforms
	powerpc/kprobes: Fix null pointer reference in arch_prepare_kprobe()
	powerpc/pseries/vas: Pass hw_cpu_id to node associativity HCALL
	crypto: sahara - don't sleep when in softirq
	crypto: hisilicon/zip - fix mismatch in get/set sgl_sge_nr
	hwrng: arm-smccc-trng - fix NO_ENTROPY handling
	cgroup: Honor caller's cgroup NS when resolving path
	hwrng: imx-rngc - Moving IRQ handler registering after imx_rngc_irq_mask_clear()
	crypto: qat - fix default value of WDT timer
	crypto: hisilicon/qm - fix missing put dfx access
	cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset
	iommu/omap: Fix buffer overflow in debugfs
	crypto: akcipher - default implementation for setting a private key
	crypto: ccp - Release dma channels before dmaengine unrgister
	crypto: inside-secure - Change swab to swab32
	crypto: qat - fix DMA transfer direction
	cifs: return correct error in ->calc_signature()
	iommu/iova: Fix module config properly
	tracing: kprobe: Fix kprobe event gen test module on exit
	tracing: kprobe: Make gen test module work in arm and riscv
	tracing/osnoise: Fix possible recursive locking in stop_per_cpu_kthreads
	kbuild: remove the target in signal traps when interrupted
	kbuild: rpm-pkg: fix breakage when V=1 is used
	crypto: marvell/octeontx - prevent integer overflows
	crypto: cavium - prevent integer overflow loading firmware
	thermal/drivers/qcom/tsens-v0_1: Fix MSM8939 fourth sensor hw_id
	ACPI: APEI: do not add task_work to kernel thread to avoid memory leak
	f2fs: fix race condition on setting FI_NO_EXTENT flag
	f2fs: fix to account FS_CP_DATA_IO correctly
	selftest: tpm2: Add Client.__del__() to close /dev/tpm* handle
	fs: dlm: fix race in lowcomms
	rcu: Avoid triggering strict-GP irq-work when RCU is idle
	rcu: Back off upon fill_page_cache_func() allocation failure
	rcu-tasks: Convert RCU_LOCKDEP_WARN() to WARN_ONCE()
	ACPI: video: Add Toshiba Satellite/Portege Z830 quirk
	ACPI: tables: FPDT: Don't call acpi_os_map_memory() on invalid phys address
	cpufreq: intel_pstate: Add Tigerlake support in no-HWP mode
	MIPS: BCM47XX: Cast memcmp() of function to (void *)
	powercap: intel_rapl: fix UBSAN shift-out-of-bounds issue
	thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash
	ARM: decompressor: Include .data.rel.ro.local
	ACPI: x86: Add a quirk for Dell Inspiron 14 2-in-1 for StorageD3Enable
	x86/entry: Work around Clang __bdos() bug
	NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
	NFSD: fix use-after-free on source server when doing inter-server copy
	wifi: brcmfmac: fix invalid address access when enabling SCAN log level
	bpftool: Clear errno after libcap's checks
	ice: set tx_tstamps when creating new Tx rings via ethtool
	net: ethernet: ti: davinci_mdio: Add workaround for errata i2329
	openvswitch: Fix double reporting of drops in dropwatch
	openvswitch: Fix overreporting of drops in dropwatch
	tcp: annotate data-race around tcp_md5sig_pool_populated
	x86/mce: Retrieve poison range from hardware
	wifi: ath9k: avoid uninit memory read in ath9k_htc_rx_msg()
	thunderbolt: Add back Intel Falcon Ridge end-to-end flow control workaround
	xfrm: Update ipcomp_scratches with NULL when freed
	iavf: Fix race between iavf_close and iavf_reset_task
	wifi: brcmfmac: fix use-after-free bug in brcmf_netdev_start_xmit()
	Bluetooth: btintel: Mark Intel controller to support LE_STATES quirk
	regulator: core: Prevent integer underflow
	wifi: mt76: mt7921: reset msta->airtime_ac while clearing up hw value
	Bluetooth: L2CAP: initialize delayed works at l2cap_chan_create()
	Bluetooth: hci_sysfs: Fix attempting to call device_add multiple times
	can: bcm: check the result of can_send() in bcm_can_tx()
	wifi: rt2x00: don't run Rt5592 IQ calibration on MT7620
	wifi: rt2x00: set correct TX_SW_CFG1 MAC register for MT7620
	wifi: rt2x00: set VGC gain for both chains of MT7620
	wifi: rt2x00: set SoC wmac clock register
	wifi: rt2x00: correctly set BBP register 86 for MT7620
	hwmon: (sht4x) do not overflow clamping operation on 32-bit platforms
	net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memory
	Bluetooth: L2CAP: Fix user-after-free
	r8152: Rate limit overflow messages
	drm/nouveau/nouveau_bo: fix potential memory leak in nouveau_bo_alloc()
	drm: Use size_t type for len variable in drm_copy_field()
	drm: Prevent drm_copy_field() to attempt copying a NULL pointer
	drm/komeda: Fix handling of atomic commits in the atomic_commit_tail hook
	gpu: lontium-lt9611: Fix NULL pointer dereference in lt9611_connector_init()
	drm/amd/display: fix overflow on MIN_I64 definition
	udmabuf: Set ubuf->sg = NULL if the creation of sg table fails
	drm: bridge: dw_hdmi: only trigger hotplug event on link change
	ALSA: usb-audio: Register card at the last interface
	drm/vc4: vec: Fix timings for VEC modes
	drm: panel-orientation-quirks: Add quirk for Anbernic Win600
	platform/chrome: cros_ec: Notify the PM of wake events during resume
	platform/x86: msi-laptop: Change DMI match / alias strings to fix module autoloading
	ASoC: SOF: pci: Change DMI match info to support all Chrome platforms
	drm/amdgpu: fix initial connector audio value
	drm/meson: reorder driver deinit sequence to fix use-after-free bug
	drm/meson: explicitly remove aggregate driver at module unload time
	mmc: sdhci-msm: add compatible string check for sdm670
	drm/dp: Don't rewrite link config when setting phy test pattern
	drm/amd/display: Remove interface for periodic interrupt 1
	ARM: dts: imx7d-sdb: config the max pressure for tsc2046
	ARM: dts: imx6q: add missing properties for sram
	ARM: dts: imx6dl: add missing properties for sram
	ARM: dts: imx6qp: add missing properties for sram
	ARM: dts: imx6sl: add missing properties for sram
	ARM: dts: imx6sll: add missing properties for sram
	ARM: dts: imx6sx: add missing properties for sram
	kselftest/arm64: Fix validatation termination record after EXTRA_CONTEXT
	arm64: dts: imx8mq-librem5: Add bq25895 as max17055's power supply
	btrfs: dump extra info if one free space cache has more bitmaps than it should
	btrfs: scrub: try to fix super block errors
	btrfs: don't print information about space cache or tree every remount
	ARM: 9242/1: kasan: Only map modules if CONFIG_KASAN_VMALLOC=n
	clk: zynqmp: Fix stack-out-of-bounds in strncpy`
	media: cx88: Fix a null-ptr-deref bug in buffer_prepare()
	media: platform: fix some double free in meson-ge2d and mtk-jpeg and s5p-mfc
	clk: zynqmp: pll: rectify rate rounding in zynqmp_pll_round_rate
	usb: host: xhci-plat: suspend and resume clocks
	usb: host: xhci-plat: suspend/resume clks for brcm
	dmaengine: ti: k3-udma: Reset UDMA_CHAN_RT byte counters to prevent overflow
	scsi: 3w-9xxx: Avoid disabling device if failing to enable it
	nbd: Fix hung when signal interrupts nbd_start_device_ioctl()
	iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity
	power: supply: adp5061: fix out-of-bounds read in adp5061_get_chg_type()
	staging: vt6655: fix potential memory leak
	blk-throttle: prevent overflow while calculating wait time
	ata: libahci_platform: Sanity check the DT child nodes number
	bcache: fix set_at_max_writeback_rate() for multiple attached devices
	soundwire: cadence: Don't overwrite msg->buf during write commands
	soundwire: intel: fix error handling on dai registration issues
	HID: roccat: Fix use-after-free in roccat_read()
	eventfd: guard wake_up in eventfd fs calls as well
	md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d
	usb: host: xhci: Fix potential memory leak in xhci_alloc_stream_info()
	usb: musb: Fix musb_gadget.c rxstate overflow bug
	arm64: dts: imx8mp: Add snps,gfladj-refclk-lpm-sel quirk to USB nodes
	usb: dwc3: core: Enable GUCTL1 bit 10 for fixing termination error after resume bug
	Revert "usb: storage: Add quirk for Samsung Fit flash"
	staging: rtl8723bs: fix potential memory leak in rtw_init_drv_sw()
	staging: rtl8723bs: fix a potential memory leak in rtw_init_cmd_priv()
	scsi: tracing: Fix compile error in trace_array calls when TRACING is disabled
	ext2: Use kvmalloc() for group descriptor array
	nvme: copy firmware_rev on each init
	nvmet-tcp: add bounds check on Transfer Tag
	usb: idmouse: fix an uninit-value in idmouse_open
	clk: bcm2835: Make peripheral PLLC critical
	clk: bcm2835: Round UART input clock up
	perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
	io_uring/af_unix: defer registered files gc to io_uring release
	io_uring: correct pinned_vm accounting
	io_uring/rw: fix short rw error handling
	io_uring/rw: fix error'ed retry return values
	io_uring/rw: fix unexpected link breakage
	mm: hugetlb: fix UAF in hugetlb_handle_userfault
	net: ieee802154: return -EINVAL for unknown addr type
	ALSA: usb-audio: Fix last interface check for registration
	blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init()
	net: ethernet: ti: davinci_mdio: fix build for mdio bitbang uses
	Revert "net/ieee802154: reject zero-sized raw_sendmsg()"
	net/ieee802154: don't warn zero-sized raw_sendmsg()
	drm/amd/display: Fix build breakage with CONFIG_DEBUG_FS=n
	Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
	Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
	lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
	ext4: continue to expand file system when the target size doesn't reach
	thermal: intel_powerclamp: Use first online CPU as control_cpu
	gcov: support GCC 12.1 and newer compilers
	io-wq: Fix memory leak in worker creation
	Linux 5.15.75

Change-Id: I5a3ef9688fb31003940d7e1828f863b9d50f1da9
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2022-11-16 16:28:57 +00:00
Neal Cardwell
49d429760d tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited
[ Upstream commit f4ce91ce12a7c6ead19b128ffa8cff6e3ded2a14 ]

This commit fixes a bug in the tracking of max_packets_out and
is_cwnd_limited. This bug can cause the connection to fail to remember
that is_cwnd_limited is true, causing the connection to fail to grow
cwnd when it should, causing throughput to be lower than it should be.

The following event sequence is an example that triggers the bug:

 (a) The connection is cwnd_limited, but packets_out is not at its
     peak due to TSO deferral deciding not to send another skb yet.
     In such cases the connection can advance max_packets_seq and set
     tp->is_cwnd_limited to true and max_packets_out to a small
     number.

(b) Then later in the round trip the connection is pacing-limited (not
     cwnd-limited), and packets_out is larger. In such cases the
     connection would raise max_packets_out to a bigger number but
     (unexpectedly) flip tp->is_cwnd_limited from true to false.

This commit fixes that bug.

One straightforward fix would be to separately track (a) the next
window after max_packets_out reaches a maximum, and (b) the next
window after tp->is_cwnd_limited is set to true. But this would
require consuming an extra u32 sequence number.

Instead, to save space we track only the most important
information. Specifically, we track the strongest available signal of
the degree to which the cwnd is fully utilized:

(1) If the connection is cwnd-limited then we remember that fact for
the current window.

(2) If the connection not cwnd-limited then we track the maximum
number of outstanding packets in the current window.

In particular, note that the new logic cannot trigger the buggy
(a)/(b) sequence above because with the new logic a condition where
tp->packets_out > tp->max_packets_out can only trigger an update of
tp->is_cwnd_limited if tp->is_cwnd_limited is false.

This first showed up in a testing of a BBRv2 dev branch, but this
buggy behavior highlighted a general issue with the
tcp_cwnd_validate() logic that can cause cwnd to fail to increase at
the proper rate for any TCP congestion control, including Reno or
CUBIC.

Fixes: ca8a226343 ("tcp: make cwnd-limited checks measurement-based, and gentler")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26 12:34:48 +02:00
Greg Kroah-Hartman
521f2e62a3 ANDROID: add kabi padding for structures for the android13 release
There are a lot of different structures that need to have a "frozen" abi
for the next 5+ years.  Add padding to a lot of them in order to be able
to handle any future changes that might be needed due to LTS and
security fixes that might come up.

It's a best guess, based on what has happened in the past from the
5.10.0..5.10.110 release (1 1/2 years).  Yes, past changes do not mean
that future changes will also be needed in the same area, but that is a
hint that those areas are both well maintained and looked after, and
there have been previous problems found in them.

Also the list of structures that are being required based on OEM usage
in the android/ symbol lists were consulted as that's a larger list than
what has been changed in the past.

Hopefully we caught everything we need to worry about, only time will
tell...

Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I880bbcda0628a7459988eeb49d18655522697664
2022-05-04 13:39:14 -07:00
Yousuk Seung
e7ed11ee94 tcp: add TTL to SCM_TIMESTAMPING_OPT_STATS
This patch adds TCP_NLA_TTL to SCM_TIMESTAMPING_OPT_STATS that exports
the time-to-live or hop limit of the latest incoming packet with
SCM_TSTAMP_ACK. The value exported may not be from the packet that acks
the sequence when incoming packets are aggregated. Exporting the
time-to-live or hop limit value of incoming packets helps to estimate
the hop count of the path of the flow that may change over time.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20210120204155.552275-1-ysseung@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22 18:20:52 -08:00
Wei Wang
e9b12edc13 tcp: record received TOS value in the request socket
A new field is added to the request sock to record the TOS value
received on the listening socket during 3WHS:
When not under syn flood, it is recording the TOS value sent in SYN.
When under syn flood, it is recording the TOS value sent in the ACK.
This is a preparation patch in order to do TOS reflection in the later
commit.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10 13:15:40 -07:00
Martin KaFai Lau
267cf9fa43 tcp: bpf: Optionally store mac header in TCP_SAVE_SYN
This patch is adapted from Eric's patch in an earlier discussion [1].

The TCP_SAVE_SYN currently only stores the network header and
tcp header.  This patch allows it to optionally store
the mac header also if the setsockopt's optval is 2.

It requires one more bit for the "save_syn" bit field in tcp_sock.
This patch achieves this by moving the syn_smc bit next to the is_mptcp.
The syn_smc is currently used with the TCP experimental option.  Since
syn_smc is only used when CONFIG_SMC is enabled, this patch also puts
the "IS_ENABLED(CONFIG_SMC)" around it like the is_mptcp did
with "IS_ENABLED(CONFIG_MPTCP)".

The mac_hdrlen is also stored in the "struct saved_syn"
to allow a quick offset from the bpf prog if it chooses to start
getting from the network header or the tcp header.

[1]: https://lore.kernel.org/netdev/CANn89iLJNWh6bkH7DNhy_kmcAexuUCccqERqe7z2QsvPhGrYPQ@mail.gmail.com/

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20200820190123.2886935-1-kafai@fb.com
2020-08-24 14:35:00 -07:00
Martin KaFai Lau
7656d68455 tcp: Add saw_unknown to struct tcp_options_received
In a later patch, the bpf prog only wants to be called to handle
a header option if that particular header option cannot be handled by
the kernel.  This unknown option could be written by the peer's bpf-prog.
It could also be a new standard option that the running kernel does not
support it while a bpf-prog can handle it.

This patch adds a "saw_unknown" bit to "struct tcp_options_received"
and it uses an existing one byte hole to do that.  "saw_unknown" will
be set in tcp_parse_options() if it sees an option that the kernel
cannot handle.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200820190033.2884430-1-kafai@fb.com
2020-08-24 14:35:00 -07:00
Martin KaFai Lau
70a217f197 tcp: Use a struct to represent a saved_syn
The TCP_SAVE_SYN has both the network header and tcp header.
The total length of the saved syn packet is currently stored in
the first 4 bytes (u32) of an array and the actual packet data is
stored after that.

A later patch will add a bpf helper that allows to get the tcp header
alone from the saved syn without the network header.  It will be more
convenient to have a direct offset to a specific header instead of
re-parsing it.  This requires to separately store the network hdrlen.
The total header length (i.e. network + tcp) is still needed for the
current usage in getsockopt.  Although this total length can be obtained
by looking into the tcphdr and then get the (th->doff << 2), this patch
chooses to directly store the tcp hdrlen in the second four bytes of
this newly created "struct saved_syn".  By using a new struct, it can
give a readable name to each individual header length.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200820190014.2883694-1-kafai@fb.com
2020-08-24 14:34:59 -07:00
Yousuk Seung
48040793fa tcp: add earliest departure time to SCM_TIMESTAMPING_OPT_STATS
This change adds TCP_NLA_EDT to SCM_TIMESTAMPING_OPT_STATS that reports
the earliest departure time(EDT) of the timestamped skb. By tracking EDT
values of the skb from different timestamps, we can observe when and how
much the value changed. This allows to measure the precise delay
injected on the sender host e.g. by a bpf-base throttler.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-31 17:00:44 -07:00
David S. Miller
a57066b1a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The UDP reuseport conflict was a little bit tricky.

The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.

At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.

This requires moving the reuseport_has_conns() logic into the callers.

While we are here, get rid of inline directives as they do not belong
in foo.c files.

The other changes were cases of more straightforward overlapping
modifications.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25 17:49:04 -07:00
Yuchung Cheng
76be93fc07 tcp: allow at most one TLP probe per flight
Previously TLP may send multiple probes of new data in one
flight. This happens when the sender is cwnd limited. After the
initial TLP containing new data is sent, the sender receives another
ACK that acks partial inflight.  It may re-arm another TLP timer
to send more, if no further ACK returns before the next TLP timeout
(PTO) expires. The sender may send in theory a large amount of TLP
until send queue is depleted. This only happens if the sender sees
such irregular uncommon ACK pattern. But it is generally undesirable
behavior during congestion especially.

The original TLP design restrict only one TLP probe per inflight as
published in "Reducing Web Latency: the Virtue of Gentle Aggression",
SIGCOMM 2013. This patch changes TLP to send at most one probe
per inflight.

Note that if the sender is app-limited, TLP retransmits old data
and did not have this issue.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23 12:23:32 -07:00
Dmitry Yakunin
aad4a0a951 tcp: Expose tcp_sock_set_keepidle_locked
This is preparation for usage in bpf_setsockopt.

v2:
  - remove redundant EXPORT_SYMBOL (Alexei Starovoitov)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200620153052.9439-2-zeil@yandex-team.ru
2020-06-24 11:21:03 -07:00
Christoph Hellwig
480aeb9639 tcp: add tcp_sock_set_keepcnt
Add a helper to directly set the TCP_KEEPCNT sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:45 -07:00
Christoph Hellwig
d41ecaac90 tcp: add tcp_sock_set_keepintvl
Add a helper to directly set the TCP_KEEPINTVL sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:45 -07:00
Christoph Hellwig
71c48eb81c tcp: add tcp_sock_set_keepidle
Add a helper to directly set the TCP_KEEP_IDLE sockopt from kernel
space without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:45 -07:00
Christoph Hellwig
c488aeadcb tcp: add tcp_sock_set_user_timeout
Add a helper to directly set the TCP_USER_TIMEOUT sockopt from kernel
space without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:45 -07:00
Christoph Hellwig
557eadfcc5 tcp: add tcp_sock_set_syncnt
Add a helper to directly set the TCP_SYNCNT sockopt from kernel space
without going through a fake uaccess.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:45 -07:00
Christoph Hellwig
ddd061b8da tcp: add tcp_sock_set_quickack
Add a helper to directly set the TCP_QUICKACK sockopt from kernel space
without going through a fake uaccess.  Cleanup the callers to avoid
pointless wrappers now that this is a simple function call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:45 -07:00
Christoph Hellwig
12abc5ee78 tcp: add tcp_sock_set_nodelay
Add a helper to directly set the TCP_NODELAY sockopt from kernel space
without going through a fake uaccess.  Cleanup the callers to avoid
pointless wrappers now that this is a simple function call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:45 -07:00
Christoph Hellwig
db10538a4b tcp: add tcp_sock_set_cork
Add a helper to directly set the TCP_CORK sockopt from kernel space
without going through a fake uaccess.  Cleanup the callers to avoid
pointless wrappers now that this is a simple function call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-28 11:11:45 -07:00
Paolo Abeni
90bf45134d mptcp: add new sock flag to deal with join subflows
MP_JOIN subflows must not land into the accept queue.
Currently tcp_check_req() calls an mptcp specific helper
to detect such scenario.

Such helper leverages the subflow context to check for
MP_JOIN subflows. We need to deal also with MP JOIN
failures, even when the subflow context is not available
due allocation failure.

A possible solution would be changing the syn_recv_sock()
signature to allow returning a more descriptive action/
error code and deal with that in tcp_check_req().

Since the above need is MPTCP specific, this patch instead
uses a TCP request socket hole to add a MPTCP specific flag.
Such flag is used by the MPTCP syn_recv_sock() to tell
tcp_check_req() how to deal with the request socket.

This change is a no-op for !MPTCP build, and makes the
MPTCP code simpler. It allows also the next patch to deal
correctly with MP JOIN failure.

v1 -> v2:
 - be more conservative on drop_req initialization (Mat)

RFC -> v1:
 - move the drop_req bit inside tcp_request_sock (Eric)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 12:30:13 -07:00
David S. Miller
3793faad7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts were all overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 22:10:13 -07:00
Eric Dumazet
2b19585012 tcp: add tp->dup_ack_counter
In commit 86de5921a3 ("tcp: defer SACK compression after DupThresh")
I added a TCP_FASTRETRANS_THRESH bias to tp->compressed_ack in order
to enable sack compression only after 3 dupacks.

Since we plan to relax this rule for flows that involve
stacks not requiring this old rule, this patch adds
a distinct tp->dup_ack_counter.

This means the TCP_FASTRETRANS_THRESH value is now used
in a single location that a future patch can adjust:

	if (tp->dup_ack_counter < TCP_FASTRETRANS_THRESH) {
		tp->dup_ack_counter++;
		goto send_now;
	}

This patch also introduces tcp_sack_compress_send_ack()
helper to ease following patch comprehension.

This patch refines LINUX_MIB_TCPACKCOMPRESSED to not
count the acks that we had to send if the timer expires
or tcp_sack_compress_send_ack() is sending an ack.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 13:24:01 -07:00
Paolo Abeni
cfde141ea3 mptcp: move option parsing into mptcp_incoming_options()
The mptcp_options_received structure carries several per
packet flags (mp_capable, mp_join, etc.). Such fields must
be cleared on each packet, even on dropped ones or packet
not carrying any MPTCP options, but the current mptcp
code clears them only on TCP option reset.

On several races/corner cases we end-up with stray bits in
incoming options, leading to WARN_ON splats. e.g.:

[  171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713
[  171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
[  171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata
[  171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95
[  171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[  171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
[  171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c
[  171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282
[  171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e
[  171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955
[  171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca
[  171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020
[  171.228460] FS:  00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000
[  171.230065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0
[  171.232586] Call Trace:
[  171.233109]  <IRQ>
[  171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691)
[  171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832)
[  171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1))
[  171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217)
[  171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822)
[  171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730)
[  171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009)
[  171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1))
[  171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232)
[  171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252)
[  171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539)
[  171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135)
[  171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640)
[  171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293)
[  171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083)
[  171.282358]  </IRQ>

We could address the issue clearing explicitly the relevant fields
in several places - tcp_parse_option, tcp_fast_parse_options,
possibly others.

Instead we move the MPTCP option parsing into the already existing
mptcp ingress hook, so that we need to clear the fields in a single
place.

This allows us dropping an MPTCP hook from the TCP code and
removing the quite large mptcp_options_received from the tcp_sock
struct. On the flip side, the MPTCP sockets will traverse the
option space twice (in tcp_parse_option() and in
mptcp_incoming_options(). That looks acceptable: we already
do that for syn and 3rd ack packets, plain TCP socket will
benefit from it, and even MPTCP sockets will experience better
code locality, reducing the jumps between TCP and MPTCP code.

v1 -> v2:
 - rebased on current '-net' tree

Fixes: 648ef4b886 ("mptcp: Implement MPTCP receive path")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-30 12:23:22 -07:00
Peter Krystad
f296234c98 mptcp: Add handling of incoming MP_JOIN requests
Process the MP_JOIN option in a SYN packet with the same flow
as MP_CAPABLE but when the third ACK is received add the
subflow to the MPTCP socket subflow list instead of adding it to
the TCP socket accept queue.

The subflow is added at the end of the subflow list so it will not
interfere with the existing subflows operation and no data is
expected to be transmitted on it.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Peter Krystad
3df523ab58 mptcp: Add ADD_ADDR handling
Add handling for sending and receiving the ADD_ADDR, ADD_ADDR6,
and RM_ADDR suboptions.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:14:48 -07:00
Florian Westphal
ae2dd71649 mptcp: handle tcp fallback when using syn cookies
We can't deal with syncookie mode yet, the syncookie rx path will create
tcp reqsk, i.e. we get OOB access because we treat tcp reqsk as mptcp reqsk one:

TCP: SYN flooding on port 20002. Sending cookies.
BUG: KASAN: slab-out-of-bounds in subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191
Read of size 1 at addr ffff8881167bc148 by task syz-executor099/2120
 subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191
 tcp_get_cookie_sock+0xcf/0x520 net/ipv4/syncookies.c:209
 cookie_v6_check+0x15a5/0x1e90 net/ipv6/syncookies.c:252
 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1123 [inline]
 [..]

Bug can be reproduced via "sysctl net.ipv4.tcp_syncookies=2".

Note that MPTCP should work with syncookies (4th ack would carry needed
state), but it appears better to sort that out in -next so do tcp
fallback for now.

I removed the MPTCP ifdef for tcp_rsk "is_mptcp" member because
if (IS_ENABLED()) is easier to read than "#ifdef IS_ENABLED()/#endif" pair.

Cc: Eric Dumazet <edumazet@google.com>
Fixes: cec37a6e41 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29 17:45:20 +01:00
Abdul Kabbani
32efcc06d2 tcp: export count for rehash attempts
Using IPv6 flow-label to swiftly route around avoid congested or
disconnected network path can greatly improve TCP reliability.

This patch adds SNMP counters and a OPT_STATS counter to track both
host-level and connection-level statistics. Network administrators
can use these counters to evaluate the impact of this new ability better.

Export count for rehash attempts to
1) two SNMP counters: TcpTimeoutRehash (rehash due to timeouts),
   and TcpDuplicateDataRehash (rehash due to receiving duplicate
   packets)
2) Timestamping API SOF_TIMESTAMPING_OPT_STATS.

Signed-off-by: Abdul Kabbani <akabbani@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-26 15:28:47 +01:00
Christoph Paasch
cc7972ea19 mptcp: parse and emit MP_CAPABLE option according to v1 spec
This implements MP_CAPABLE options parsing and writing according
to RFC 6824 bis / RFC 8684: MPTCP v1.

Local key is sent on syn/ack, and both keys are sent on 3rd ack.
MP_CAPABLE messages len are updated accordingly. We need the skbuff to
correctly emit the above, so we push the skbuff struct as an argument
all the way from tcp code to the relevant mptcp callbacks.

When processing incoming MP_CAPABLE + data, build a full blown DSS-like
map info, to simplify later processing.  On child socket creation, we
need to record the remote key, if available.

Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24 13:44:08 +01:00
Mat Martineau
648ef4b886 mptcp: Implement MPTCP receive path
Parses incoming DSS options and populates outgoing MPTCP ACK
fields. MPTCP fields are parsed from the TCP option header and placed in
an skb extension, allowing the upper MPTCP layer to access MPTCP
options after the skb has gone through the TCP stack.

The subflow implements its own data_ready() ops, which ensures that the
pending data is in sequence - according to MPTCP seq number - dropping
out-of-seq skbs. The DATA_READY bit flag is set if this is the case.
This allows the MPTCP socket layer to determine if more data is
available without having to consult the individual subflows.

It additionally validates the current mapping and propagates EoF events
to the connection socket.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Co-developed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24 13:44:07 +01:00
Peter Krystad
cec37a6e41 mptcp: Handle MP_CAPABLE options for outgoing connections
Add hooks to tcp_output.c to add MP_CAPABLE to an outgoing SYN request,
to capture the MP_CAPABLE in the received SYN-ACK, to add MP_CAPABLE to
the final ACK of the three-way handshake.

Use the .sk_rx_dst_set() handler in the subflow proto to capture when the
responding SYN-ACK is received and notify the MPTCP connection layer.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24 13:44:07 +01:00
Peter Krystad
2303f994b3 mptcp: Associate MPTCP context with TCP socket
Use ULP to associate a subflow_context structure with each TCP subflow
socket. Creating these sockets requires new bind and connect functions
to make sure ULP is set up immediately when the subflow sockets are
created.

Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24 13:44:07 +01:00
Peter Krystad
eda7acddf8 mptcp: Handle MPTCP TCP options
Add hooks to parse and format the MP_CAPABLE option.

This option is handled according to MPTCP version 0 (RFC6824).
MPTCP version 1 MP_CAPABLE (RFC6824bis/RFC8684) will be added later in
coordination with related code changes.

Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Co-developed-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Peter Krystad <peter.krystad@linux.intel.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24 13:44:07 +01:00
Jason Baron
480274787d tcp: add TCP_INFO status for failed client TFO
The TCPI_OPT_SYN_DATA bit as part of tcpi_options currently reports whether
or not data-in-SYN was ack'd on both the client and server side. We'd like
to gather more information on the client-side in the failure case in order
to indicate the reason for the failure. This can be useful for not only
debugging TFO, but also for creating TFO socket policies. For example, if
a middle box removes the TFO option or drops a data-in-SYN, we can
can detect this case, and turn off TFO for these connections saving the
extra retransmits.

The newly added tcpi_fastopen_client_fail status is 2 bits and has the
following 4 states:

1) TFO_STATUS_UNSPEC

Catch-all state which includes when TFO is disabled via black hole
detection, which is indicated via LINUX_MIB_TCPFASTOPENBLACKHOLE.

2) TFO_COOKIE_UNAVAILABLE

If TFO_CLIENT_NO_COOKIE mode is off, this state indicates that no cookie
is available in the cache.

3) TFO_DATA_NOT_ACKED

Data was sent with SYN, we received a SYN/ACK but it did not cover the data
portion. Cookie is not accepted by server because the cookie may be invalid
or the server may be overloaded.

4) TFO_SYN_RETRANSMITTED

Data was sent with SYN, we received a SYN/ACK which did not cover the data
after at least 1 additional SYN was sent (without data). It may be the case
that a middle-box is dropping data-in-SYN packets. Thus, it would be more
efficient to not use TFO on this connection to avoid extra retransmits
during connection establishment.

These new fields do not cover all the cases where TFO may fail, but other
failures, such as SYN/ACK + data being dropped, will result in the
connection not becoming established. And a connection blackhole after
session establishment shows up as a stalled connection.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Christoph Paasch <cpaasch@apple.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-25 19:25:37 -07:00
Eric Dumazet
d983ea6f16 tcp: add rcu protection around tp->fastopen_rsk
Both tcp_v4_err() and tcp_v6_err() do the following operations
while they do not own the socket lock :

	fastopen = tp->fastopen_rsk;
 	snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;

The problem is that without appropriate barrier, the compiler
might reload tp->fastopen_rsk and trigger a NULL deref.

request sockets are protected by RCU, we can simply add
the missing annotations and barriers to solve the issue.

Fixes: 168a8f5805 ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-13 10:13:08 -07:00
Thomas Higdon
f9af2dbbfe tcp: Add TCP_INFO counter for packets received out-of-order
For receive-heavy cases on the server-side, we want to track the
connection quality for individual client IPs. This counter, similar to
the existing system-wide TCPOFOQueue counter in /proc/net/netstat,
tracks out-of-order packet reception. By providing this counter in
TCP_INFO, it will allow understanding to what degree receive-heavy
sockets are experiencing out-of-order delivery and packet drops
indicating congestion.

Please note that this is similar to the counter in NetBSD TCP_INFO, and
has the same name.

Also note that we avoid increasing the size of the tcp_sock struct by
taking advantage of a hole.

Signed-off-by: Thomas Higdon <tph@fb.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16 16:26:11 +02:00
Ard Biesheuvel
438ac88009 net: fastopen: robustness and endianness fixes for SipHash
Some changes to the TCP fastopen code to make it more robust
against future changes in the choice of key/cookie size, etc.

- Instead of keeping the SipHash key in an untyped u8[] buffer
  and casting it to the right type upon use, use the correct
  type directly. This ensures that the key will appear at the
  correct alignment if we ever change the way these data
  structures are allocated. (Currently, they are only allocated
  via kmalloc so they always appear at the correct alignment)

- Use DIV_ROUND_UP when sizing the u64[] array to hold the
  cookie, so it is always of sufficient size, even if
  TCP_FASTOPEN_COOKIE_MAX is no longer a multiple of 8.

- Drop the 'len' parameter from the tcp_fastopen_reset_cipher()
  function, which is no longer used.

- Add endian swabbing when setting the keys and calculating the hash,
  to ensure that cookie values are the same for a given key and
  source/destination address pair regardless of the endianness of
  the server.

Note that none of these are functional changes wrt the current
state of the code, with the exception of the swabbing, which only
affects big endian systems.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-22 16:30:37 -07:00
David S. Miller
13091aa305 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Honestly all the conflicts were simple overlapping changes,
nothing really interesting to report.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17 20:20:36 -07:00
Ard Biesheuvel
c681edae33 net: ipv4: move tcp_fastopen server side code to SipHash library
Using a bare block cipher in non-crypto code is almost always a bad idea,
not only for security reasons (and we've seen some examples of this in
the kernel in the past), but also for performance reasons.

In the TCP fastopen case, we call into the bare AES block cipher one or
two times (depending on whether the connection is IPv4 or IPv6). On most
systems, this results in a call chain such as

  crypto_cipher_encrypt_one(ctx, dst, src)
    crypto_cipher_crt(tfm)->cit_encrypt_one(crypto_cipher_tfm(tfm), ...);
      aesni_encrypt
        kernel_fpu_begin();
        aesni_enc(ctx, dst, src); // asm routine
        kernel_fpu_end();

It is highly unlikely that the use of special AES instructions has a
benefit in this case, especially since we are doing the above twice
for IPv6 connections, instead of using a transform which can process
the entire input in one go.

We could switch to the cbcmac(aes) shash, which would at least get
rid of the duplicated overhead in *some* cases (i.e., today, only
arm64 has an accelerated implementation of cbcmac(aes), while x86 will
end up using the generic cbcmac template wrapping the AES-NI cipher,
which basically ends up doing exactly the above). However, in the given
context, it makes more sense to use a light-weight MAC algorithm that
is more suitable for the purpose at hand, such as SipHash.

Since the output size of SipHash already matches our chosen value for
TCP_FASTOPEN_COOKIE_SIZE, and given that it accepts arbitrary input
sizes, this greatly simplifies the code as well.

NOTE: Server farms backing a single server IP for load balancing purposes
      and sharing a single fastopen key will be adversely affected by
      this change unless all systems in the pool receive their kernel
      upgrades at the same time.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-17 13:56:26 -07:00
Eric Dumazet
3b4929f65b tcp: limit payload size of sacked skbs
Jonathan Looney reported that TCP can trigger the following crash
in tcp_shifted_skb() :

	BUG_ON(tcp_skb_pcount(skb) < pcount);

This can happen if the remote peer has advertized the smallest
MSS that linux TCP accepts : 48

An skb can hold 17 fragments, and each fragment can hold 32KB
on x86, or 64KB on PowerPC.

This means that the 16bit witdh of TCP_SKB_CB(skb)->tcp_gso_segs
can overflow.

Note that tcp_sendmsg() builds skbs with less than 64KB
of payload, so this problem needs SACK to be enabled.
SACK blocks allow TCP to coalesce multiple skbs in the retransmit
queue, thus filling the 17 fragments to maximal capacity.

CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs

Fixes: 832d11c5cd ("tcp: Try to restore large SKBs while SACK processing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Looney <jtl@netflix.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Bruce Curtis <brucec@netflix.com>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-15 18:47:31 -07:00
Eric Dumazet
a842fe1425 tcp: add optional per socket transmit delay
Adding delays to TCP flows is crucial for studying behavior
of TCP stacks, including congestion control modules.

Linux offers netem module, but it has unpractical constraints :
- Need root access to change qdisc
- Hard to setup on egress if combined with non trivial qdisc like FQ
- Single delay for all flows.

EDT (Earliest Departure Time) adoption in TCP stack allows us
to enable a per socket delay at a very small cost.

Networking tools can now establish thousands of flows, each of them
with a different delay, simulating real world conditions.

This requires FQ packet scheduler or a EDT-enabled NIC.

This patchs adds TCP_TX_DELAY socket option, to set a delay in
usec units.

  unsigned int tx_delay = 10000; /* 10 msec */

  setsockopt(fd, SOL_TCP, TCP_TX_DELAY, &tx_delay, sizeof(tx_delay));

Note that FQ packet scheduler limits might need some tweaking :

man tc-fq

PARAMETERS
   limit
       Hard  limit  on  the  real  queue  size. When this limit is
       reached, new packets are dropped. If the value is  lowered,
       packets  are  dropped so that the new limit is met. Default
       is 10000 packets.

   flow_limit
       Hard limit on the maximum  number  of  packets  queued  per
       flow.  Default value is 100.

Use of TCP_TX_DELAY option will increase number of skbs in FQ qdisc,
so packets would be dropped if any of the previous limit is hit.

Use of a jump label makes this support runtime-free, for hosts
never using the option.

Also note that TSQ (TCP Small Queues) limits are slightly changed
with this patch : we need to account that skbs artificially delayed
wont stop us providind more skbs to feed the pipe (netem uses
skb_orphan_partial() for this purpose, but FQ can not use this trick)

Because of that, using big delays might very well trigger
old bugs in TSO auto defer logic and/or sndbuf limited detection.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12 13:05:43 -07:00
Thomas Gleixner
2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
Eric Dumazet
86de5921a3 tcp: defer SACK compression after DupThresh
Jean-Louis reported a TCP regression and bisected to recent SACK
compression.

After a loss episode (receiver not able to keep up and dropping
packets because its backlog is full), linux TCP stack is sending
a single SACK (DUPACK).

Sender waits a full RTO timer before recovering losses.

While RFC 6675 says in section 5, "Algorithm Details",

   (2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true --
       indicating at least three segments have arrived above the current
       cumulative acknowledgment point, which is taken to indicate loss
       -- go to step (4).
...
   (4) Invoke fast retransmit and enter loss recovery as follows:

there are old TCP stacks not implementing this strategy, and
still counting the dupacks before starting fast retransmit.

While these stacks probably perform poorly when receivers implement
LRO/GRO, we should be a little more gentle to them.

This patch makes sure we do not enable SACK compression unless
3 dupacks have been sent since last rcv_nxt update.

Ideally we should even rearm the timer to send one or two
more DUPACK if no more packets are coming, but that will
be work aiming for linux-4.21.

Many thanks to Jean-Louis for bisecting the issue, providing
packet captures and testing this patch.

Fixes: 5d9f4262b7 ("tcp: add SACK compression")
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21 15:49:52 -08:00
Eric Dumazet
5f6188a800 tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh
In EDT design, I made the mistake of using tcp_wstamp_ns
to store the last tcp_clock_ns() sample and to store the
pacing virtual timer.

This causes major regressions at high speed flows.

Introduce tcp_clock_cache to store last tcp_clock_ns().
This is needed because some arches have slow high-resolution
kernel time service.

tcp_wstamp_ns is only updated when a packet is sent.

Note that we can remove tcp_mstamp in the future since
tcp_mstamp is essentially tcp_clock_cache/1000, so the
apparent socket size increase is temporary.

Fixes: 9799ccb0e9 ("tcp: add tcp_wstamp_ns socket field")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:56:41 -07:00
Eric Dumazet
9799ccb0e9 tcp: add tcp_wstamp_ns socket field
TCP will soon provide earliest departure time on TX skbs.
It needs to track this in a new variable.

tcp_mstamp_refresh() needs to update this variable, and
became too big to stay an inline.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-21 19:37:59 -07:00
Wei Wang
7ec65372ca tcp: add stat of data packet reordering events
Introduce a new TCP stats to record the number of reordering events seen
and expose it in both tcp_info (TCP_INFO) and opt_stats
(SOF_TIMESTAMPING_OPT_STATS).
Application can use this stats to track the frequency of the reordering
events in addition to the existing reordering stats which tracks the
magnitude of the latest reordering event.

Note: this new stats tracks reordering events triggered by ACKs, which
could often be fewer than the actual number of packets being delivered
out-of-order.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01 09:56:10 -07:00
Wei Wang
7e10b6554f tcp: add dsack blocks received stats
Introduce a new TCP stat to record the number of DSACK blocks received
(RFC4989 tcpEStatsStackDSACKDups) and expose it in both tcp_info
(TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS).

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01 09:56:10 -07:00
Wei Wang
fb31c9b9f6 tcp: add data bytes retransmitted stats
Introduce a new TCP stat to record the number of bytes retransmitted
(RFC4898 tcpEStatsPerfOctetsRetrans) and expose it in both tcp_info
(TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS).

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01 09:56:10 -07:00
Wei Wang
ba113c3aa7 tcp: add data bytes sent stats
Introduce a new TCP stat to record the number of bytes sent
(RFC4898 tcpEStatsPerfHCDataOctetsOut) and expose it in both tcp_info
(TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS).

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01 09:56:10 -07:00