94de3b405c8dee0ffc8de5c06b32fbf00fc4e8f9
331 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
7622c50ba6 |
Merge 5.15.91 into android13-5.15-lts
Changes in 5.15.91 memory: tegra: Remove clients SID override programming memory: atmel-sdramc: Fix missing clk_disable_unprepare in atmel_ramc_probe() memory: mvebu-devbus: Fix missing clk_disable_unprepare in mvebu_devbus_probe() dmaengine: ti: k3-udma: Do conditional decrement of UDMA_CHAN_RT_PEER_BCNT_REG arm64: dts: imx8mp-phycore-som: Remove invalid PMIC property ARM: dts: imx6ul-pico-dwarf: Use 'clock-frequency' ARM: dts: imx7d-pico: Use 'clock-frequency' ARM: dts: imx6qdl-gw560x: Remove incorrect 'uart-has-rtscts' arm64: dts: imx8mm-beacon: Fix ecspi2 pinmux ARM: imx: add missing of_node_put() HID: intel_ish-hid: Add check for ishtp_dma_tx_map arm64: dts: imx8mm-venice-gw7901: fix USB2 controller OC polarity soc: imx8m: Fix incorrect check for of_clk_get_by_name() reset: uniphier-glue: Use reset_control_bulk API reset: uniphier-glue: Fix possible null-ptr-deref EDAC/highbank: Fix memory leak in highbank_mc_probe() firmware: arm_scmi: Harden shared memory access in fetch_response firmware: arm_scmi: Harden shared memory access in fetch_notification tomoyo: fix broken dependency on *.conf.default RDMA/core: Fix ib block iterator counter overflow IB/hfi1: Reject a zero-length user expected buffer IB/hfi1: Reserve user expected TIDs IB/hfi1: Fix expected receive setup error exit issues IB/hfi1: Immediately remove invalid memory from hardware IB/hfi1: Remove user expected buffer invalidate race affs: initialize fsdata in affs_truncate() PM: AVS: qcom-cpr: Fix an error handling path in cpr_probe() arm64: dts: qcom: msm8992: Don't use sfpb mutex arm64: dts: qcom: msm8992-libra: Add CPU regulators arm64: dts: qcom: msm8992-libra: Fix the memory map phy: ti: fix Kconfig warning and operator precedence NFSD: fix use-after-free in nfsd4_ssc_setup_dul() ARM: dts: at91: sam9x60: fix the ddr clock for sam9x60 amd-xgbe: TX Flow Ctrl Registers are h/w ver dependent amd-xgbe: Delay AN timeout during KR training bpf: Fix pointer-leak due to insufficient speculative store bypass mitigation phy: rockchip-inno-usb2: Fix missing clk_disable_unprepare() in rockchip_usb2phy_power_on() net: nfc: Fix use-after-free in local_cleanup() net: wan: Add checks for NULL for utdm in undo_uhdlc_init and unmap_si_regs net: enetc: avoid deadlock in enetc_tx_onestep_tstamp() sch_htb: Avoid grafting on htb_destroy_class_offload when destroying htb gpio: use raw spinlock for gpio chip shadowed data gpio: mxc: Protect GPIO irqchip RMW with bgpio spinlock gpio: mxc: Always set GPIOs used as interrupt source to INPUT mode wifi: rndis_wlan: Prevent buffer overflow in rndis_query_oid pinctrl/rockchip: Use temporary variable for struct device pinctrl/rockchip: add error handling for pull/drive register getters pinctrl: rockchip: fix reading pull type on rk3568 net: stmmac: Fix queue statistics reading net/sched: sch_taprio: fix possible use-after-free l2tp: Serialize access to sk_user_data with sk_callback_lock l2tp: Don't sleep and disable BH under writer-side sk_callback_lock l2tp: convert l2tp_tunnel_list to idr l2tp: close all race conditions in l2tp_tunnel_register() octeontx2-pf: Avoid use of GFP_KERNEL in atomic context net: usb: sr9700: Handle negative len net: mdio: validate parameter addr in mdiobus_get_phy() HID: check empty report_list in hid_validate_values() HID: check empty report_list in bigben_probe() net: stmmac: fix invalid call to mdiobus_get_phy() pinctrl: rockchip: fix mux route data for rk3568 HID: revert CHERRY_MOUSE_000C quirk usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait usb: gadget: f_fs: Ensure ep0req is dequeued before free_request Bluetooth: Fix possible deadlock in rfcomm_sk_state_change net: ipa: disable ipa interrupt during suspend net/mlx5: E-switch, Fix setting of reserved fields on MODIFY_SCHEDULING_ELEMENT net: mlx5: eliminate anonymous module_init & module_exit drm/panfrost: fix GENERIC_ATOMIC64 dependency dmaengine: Fix double increment of client_count in dma_chan_get() net: macb: fix PTP TX timestamp failure due to packet padding virtio-net: correctly enable callback during start_xmit l2tp: prevent lockdep issue in l2tp_tunnel_register() HID: betop: check shape of output reports cifs: fix potential deadlock in cache_refresh_path() dmaengine: xilinx_dma: call of_node_put() when breaking out of for_each_child_of_node() phy: phy-can-transceiver: Skip warning if no "max-bitrate" drm/amd/display: fix issues with driver unload nvme-pci: fix timeout request state check tcp: avoid the lookup process failing to get sk in ehash table octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt ptdma: pt_core_execute_cmd() should use spinlock device property: fix of node refcount leak in fwnode_graph_get_next_endpoint() w1: fix deadloop in __w1_remove_master_device() w1: fix WARNING after calling w1_process() driver core: Fix test_async_probe_init saves device in wrong array selftests/net: toeplitz: fix race on tpacket_v3 block close net: dsa: microchip: ksz9477: port map correction in ALU table entry register thermal/core: Remove duplicate information when an error occurs thermal/core: Rename 'trips' to 'num_trips' thermal: Validate new state in cur_state_store() thermal/core: fix error code in __thermal_cooling_device_register() thermal: core: call put_device() only after device_register() fails net: stmmac: enable all safety features by default tcp: fix rate_app_limited to default to 1 scsi: iscsi: Fix multiple iSCSI session unbind events sent to userspace cpufreq: Add Tegra234 to cpufreq-dt-platdev blocklist kcsan: test: don't put the expect array on the stack cpufreq: Add SM6375 to cpufreq-dt-platdev blocklist ASoC: fsl_micfil: Correct the number of steps on SX controls net: usb: cdc_ether: add support for Thales Cinterion PLS62-W modem drm: Add orientation quirk for Lenovo ideapad D330-10IGL s390/debug: add _ASM_S390_ prefix to header guard s390: expicitly align _edata and _end symbols on page boundary perf/x86/msr: Add Emerald Rapids perf/x86/intel/uncore: Add Emerald Rapids cpufreq: armada-37xx: stop using 0 as NULL pointer ASoC: fsl_ssi: Rename AC'97 streams to avoid collisions with AC'97 CODEC ASoC: fsl-asoc-card: Fix naming of AC'97 CODEC widgets spi: spidev: remove debug messages that access spidev->spi without locking KVM: s390: interrupt: use READ_ONCE() before cmpxchg() scsi: hisi_sas: Set a port invalid only if there are no devices attached when refreshing port id r8152: add vendor/device ID pair for Microsoft Devkit platform/x86: touchscreen_dmi: Add info for the CSL Panther Tab HD platform/x86: asus-nb-wmi: Add alternate mapping for KEY_SCREENLOCK lockref: stop doing cpu_relax in the cmpxchg loop firmware: coreboot: Check size of table entry and use flex-array drm/i915: Allow switching away via vga-switcheroo if uninitialized Revert "selftests/bpf: check null propagation only neither reg is PTR_TO_BTF_ID" drm/i915: Remove unused variable x86: ACPI: cstate: Optimize C3 entry on AMD CPUs fs: reiserfs: remove useless new_opts in reiserfs_remount sysctl: add a new register_sysctl_init() interface kernel/panic: move panic sysctls to its own file panic: unset panic_on_warn inside panic() ubsan: no need to unset panic_on_warn in ubsan_epilogue() kasan: no need to unset panic_on_warn in end_report() exit: Add and use make_task_dead. objtool: Add a missing comma to avoid string concatenation hexagon: Fix function name in die() h8300: Fix build errors from do_exit() to make_task_dead() transition csky: Fix function name in csky_alignment() and die() ia64: make IA64_MCA_RECOVERY bool instead of tristate panic: Separate sysctl logic from CONFIG_SMP exit: Put an upper limit on how often we can oops exit: Expose "oops_count" to sysfs exit: Allow oops_limit to be disabled panic: Consolidate open-coded panic_on_warn checks panic: Introduce warn_limit panic: Expose "warn_count" to sysfs docs: Fix path paste-o for /sys/kernel/warn_count exit: Use READ_ONCE() for all oops/warn limit reads Bluetooth: hci_sync: cancel cmd_timer if hci_open failed drm/amdgpu: complete gfxoff allow signal during suspend without delay scsi: hpsa: Fix allocation size for scsi_host_alloc() KVM: SVM: fix tsc scaling cache logic module: Don't wait for GOING modules tracing: Make sure trace_printk() can output as soon as it can be used trace_events_hist: add check for return value of 'create_hist_field' ftrace/scripts: Update the instructions for ftrace-bisect.sh cifs: Fix oops due to uncleared server->smbd_conn in reconnect i2c: mv64xxx: Remove shutdown method from driver i2c: mv64xxx: Add atomic_xfer method to driver ksmbd: add smbd max io size parameter ksmbd: add max connections parameter ksmbd: do not sign response to session request for guest login ksmbd: downgrade ndr version error message to debug ksmbd: limit pdu length size according to connection status ovl: fail on invalid uid/gid mapping at copy up KVM: x86/vmx: Do not skip segment attributes if unusable bit is set KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation thermal: intel: int340x: Protect trip temperature from concurrent updates ipv6: fix reachability confirmation with proxy_ndp ARM: 9280/1: mm: fix warning on phys_addr_t to void pointer assignment EDAC/device: Respect any driver-supplied workqueue polling value EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_info net: mana: Fix IRQ name - add PCI and queue number scsi: ufs: core: Fix devfreq deadlocks i2c: designware: use casting of u64 in clock multiplication to avoid overflow netlink: prevent potential spectre v1 gadgets net: fix UaF in netns ops registration error path drm/i915/selftest: fix intel_selftest_modify_policy argument types netfilter: nft_set_rbtree: Switch to node list walk for overlap detection netfilter: nft_set_rbtree: skip elements in transaction from garbage collection netlink: annotate data races around nlk->portid netlink: annotate data races around dst_portid and dst_group netlink: annotate data races around sk_state ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() ipv4: prevent potential spectre v1 gadget in fib_metrics_match() netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE netrom: Fix use-after-free of a listening socket. net/sched: sch_taprio: do not schedule in taprio_reset() sctp: fail if no bound addresses can be used for a given scope riscv/kprobe: Fix instruction simulation of JALR nvme: fix passthrough csi check gpio: mxc: Unlock on error path in mxc_flip_edge() ravb: Rename "no_ptp_cfg_active" and "ptp_cfg_active" variables net: ravb: Fix lack of register setting after system resumed for Gen3 net: ravb: Fix possible hang if RIS2_QFF1 happen net: mctp: mark socks as dead on unhash, prevent re-add thermal: intel: int340x: Add locking to int340x_thermal_get_trip_type() net/tg3: resolve deadlock in tg3_reset_task() during EEH net: mdio-mux-meson-g12a: force internal PHY off on mux switch treewide: fix up files incorrectly marked executable tools: gpio: fix -c option of gpio-event-mon Revert "Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode" cpufreq: Move to_gov_attr_set() to cpufreq.h cpufreq: governor: Use kobject release() method to free dbs_data kbuild: Allow kernel installation packaging to override pkg-config block: fix and cleanup bio_check_ro x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL netfilter: conntrack: unify established states for SCTP paths perf/x86/amd: fix potential integer overflow on shift of a int Linux 5.15.91 Change-Id: I1a0a227ff3f034e4a07501ebbd97458fb1fec818 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
c53acbf2fa |
net/sched: sch_taprio: fix possible use-after-free
[ Upstream commit 3a415d59c1dbec9d772dbfab2d2520d98360caae ]
syzbot reported a nasty crash [1] in net_tx_action() which
made little sense until we got a repro.
This repro installs a taprio qdisc, but providing an
invalid TCA_RATE attribute.
qdisc_create() has to destroy the just initialized
taprio qdisc, and taprio_destroy() is called.
However, the hrtimer used by taprio had already fired,
therefore advance_sched() called __netif_schedule().
Then net_tx_action was trying to use a destroyed qdisc.
We can not undo the __netif_schedule(), so we must wait
until one cpu serviced the qdisc before we can proceed.
Many thanks to Alexander Potapenko for his help.
[1]
BUG: KMSAN: uninit-value in queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
BUG: KMSAN: uninit-value in do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
BUG: KMSAN: uninit-value in __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
BUG: KMSAN: uninit-value in _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
__raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
_raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
spin_trylock include/linux/spinlock.h:359 [inline]
qdisc_run_begin include/net/sch_generic.h:187 [inline]
qdisc_run+0xee/0x540 include/net/pkt_sched.h:125
net_tx_action+0x77c/0x9a0 net/core/dev.c:5086
__do_softirq+0x1cc/0x7fb kernel/softirq.c:571
run_ksoftirqd+0x2c/0x50 kernel/softirq.c:934
smpboot_thread_fn+0x554/0x9f0 kernel/smpboot.c:164
kthread+0x31b/0x430 kernel/kthread.c:376
ret_from_fork+0x1f/0x30
Uninit was created at:
slab_post_alloc_hook mm/slab.h:732 [inline]
slab_alloc_node mm/slub.c:3258 [inline]
__kmalloc_node_track_caller+0x814/0x1250 mm/slub.c:4970
kmalloc_reserve net/core/skbuff.c:358 [inline]
__alloc_skb+0x346/0xcf0 net/core/skbuff.c:430
alloc_skb include/linux/skbuff.h:1257 [inline]
nlmsg_new include/net/netlink.h:953 [inline]
netlink_ack+0x5f3/0x12b0 net/netlink/af_netlink.c:2436
netlink_rcv_skb+0x55d/0x6c0 net/netlink/af_netlink.c:2507
rtnetlink_rcv+0x30/0x40 net/core/rtnetlink.c:6108
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0xf3b/0x1270 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x1288/0x1440 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
____sys_sendmsg+0xabc/0xe90 net/socket.c:2482
___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2536
__sys_sendmsg net/socket.c:2565 [inline]
__do_sys_sendmsg net/socket.c:2574 [inline]
__se_sys_sendmsg net/socket.c:2572 [inline]
__x64_sys_sendmsg+0x367/0x540 net/socket.c:2572
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 6.0.0-rc2-syzkaller-47461-gac3859c02d7f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
Fixes:
|
||
|
|
eaa46dd972 |
Merge 5.15.76 into android13-5.15-lts
Changes in 5.15.76
r8152: add PID for the Lenovo OneLink+ Dock
arm64/mm: Consolidate TCR_EL1 fields
usb: gadget: uvc: consistently use define for headerlen
usb: gadget: uvc: use on returned header len in video_encode_isoc_sg
usb: gadget: uvc: rework uvcg_queue_next_buffer to uvcg_complete_buffer
usb: gadget: uvc: giveback vb2 buffer on req complete
usb: gadget: uvc: improve sg exit condition
arm64: errata: Remove AES hwcap for COMPAT tasks
perf/x86/intel/pt: Relax address filter validation
btrfs: enhance unsupported compat RO flags handling
ocfs2: clear dinode links count in case of error
ocfs2: fix BUG when iput after ocfs2_mknod fails
selinux: enable use of both GFP_KERNEL and GFP_ATOMIC in convert_context()
cpufreq: qcom: fix writes in read-only memory region
i2c: qcom-cci: Fix ordering of pm_runtime_xx and i2c_add_adapter
x86/microcode/AMD: Apply the patch early on every logical thread
hwmon/coretemp: Handle large core ID value
ata: ahci-imx: Fix MODULE_ALIAS
ata: ahci: Match EM_MAX_SLOTS with SATA_PMP_MAX_PORTS
x86/resctrl: Fix min_cbm_bits for AMD
cpufreq: qcom: fix memory leak in error path
drm/amdgpu: fix sdma doorbell init ordering on APUs
mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
kvm: Add support for arch compat vm ioctls
KVM: arm64: vgic: Fix exit condition in scan_its_table()
media: ipu3-imgu: Fix NULL pointer dereference in active selection access
media: mceusb: set timeout to at least timeout provided
media: venus: dec: Handle the case where find_format fails
x86/topology: Fix multiple packages shown on a single-package system
x86/topology: Fix duplicated core ID within a package
btrfs: fix processing of delayed data refs during backref walking
btrfs: fix processing of delayed tree block refs during backref walking
drm/vc4: Add module dependency on hdmi-codec
ACPI: extlog: Handle multiple records
tipc: Fix recognition of trial period
tipc: fix an information leak in tipc_topsrv_kern_subscr
i40e: Fix DMA mappings leak
HID: magicmouse: Do not set BTN_MOUSE on double report
sfc: Change VF mac via PF as first preference if available.
net/atm: fix proc_mpc_write incorrect return value
net: phy: dp83867: Extend RX strap quirk for SGMII mode
net: phylink: add mac_managed_pm in phylink_config structure
scsi: lpfc: Fix memory leak in lpfc_create_port()
udp: Update reuse->has_conns under reuseport_lock.
cifs: Fix xid leak in cifs_create()
cifs: Fix xid leak in cifs_copy_file_range()
cifs: Fix xid leak in cifs_flock()
cifs: Fix xid leak in cifs_ses_add_channel()
dm: remove unnecessary assignment statement in alloc_dev()
net: hsr: avoid possible NULL deref in skb_clone()
ionic: catch NULL pointer issue on reconfig
netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements
nvme-hwmon: consistently ignore errors from nvme_hwmon_init
nvme-hwmon: kmalloc the NVME SMART log buffer
nvmet: fix workqueue MEM_RECLAIM flushing dependency
net: sched: cake: fix null pointer access issue when cake_init() fails
net: sched: delete duplicate cleanup of backlog and qlen
net: sched: sfb: fix null pointer access issue when sfb_init() fails
sfc: include vport_id in filter spec hash and equal()
wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new()
net: hns: fix possible memory leak in hnae_ae_register()
net: sched: fix race condition in qdisc_graft()
net: phy: dp83822: disable MDI crossover status change interrupt
iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check()
iommu/vt-d: Clean up si_domain in the init_dmars() error path
fs: dlm: fix invalid derefence of sb_lvbptr
arm64: mte: move register initialization to C
ksmbd: handle smb2 query dir request for OutputBufferLength that is too small
ksmbd: fix incorrect handling of iterate_dir
tracing: Simplify conditional compilation code in tracing_set_tracer()
tracing: Do not free snapshot if tracer is on cmdline
mmc: sdhci-tegra: Use actual clock rate for SW tuning correction
perf: Skip and warn on unknown format 'configN' attrs
ACPI: video: Force backlight native for more TongFang devices
x86/Kconfig: Drop check for -mabi=ms for CONFIG_EFI_STUB
Makefile.debug: re-enable debug info for .S files
mmc: core: Add SD card quirk for broken discard
mm: /proc/pid/smaps_rollup: fix no vma's null-deref
Linux 5.15.76
Change-Id: I7015bfc94dfd69b9ab2e83d4b20860f13a6c4be6
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
|
||
|
|
34f2a4eedc |
net: sched: delete duplicate cleanup of backlog and qlen
[ Upstream commit c19d893fbf3f2f8fa864ae39652c7fee939edde2 ] qdisc_reset() is clearing qdisc->q.qlen and qdisc->qstats.backlog _after_ calling qdisc->ops->reset. There is no need to clear them again in the specific reset function. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220824005231.345727-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: 2a3fc78210b9 ("net: sched: sfb: fix null pointer access issue when sfb_init() fails") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
f3a0b5d245 |
Merge 5.15.47 into android13-5.15-lts
Changes in 5.15.47
pcmcia: db1xxx_ss: restrict to MIPS_DB1XXX boards
staging: greybus: codecs: fix type confusion of list iterator variable
iio: adc: ad7124: Remove shift from scan_type
lkdtm/bugs: Check for the NULL pointer after calling kmalloc
lkdtm/bugs: Don't expect thread termination without CONFIG_UBSAN_TRAP
tty: goldfish: Use tty_port_destroy() to destroy port
tty: serial: owl: Fix missing clk_disable_unprepare() in owl_uart_probe
tty: n_tty: Restore EOF push handling behavior
serial: 8250_aspeed_vuart: Fix potential NULL dereference in aspeed_vuart_probe
tty: serial: fsl_lpuart: fix potential bug when using both of_alias_get_id and ida_simple_get
remoteproc: imx_rproc: Ignore create mem entry for resource table
usb: usbip: fix a refcount leak in stub_probe()
usb: usbip: add missing device lock on tweak configuration cmd
USB: storage: karma: fix rio_karma_init return
usb: musb: Fix missing of_node_put() in omap2430_probe
staging: fieldbus: Fix the error handling path in anybuss_host_common_probe()
pwm: lp3943: Fix duty calculation in case period was clamped
pwm: raspberrypi-poe: Fix endianness in firmware struct
rpmsg: qcom_smd: Fix irq_of_parse_and_map() return value
usb: dwc3: gadget: Replace list_for_each_entry_safe() if using giveback
usb: dwc3: pci: Fix pm_runtime_get_sync() error checking
misc: fastrpc: fix an incorrect NULL check on list iterator
firmware: stratix10-svc: fix a missing check on list iterator
usb: typec: mux: Check dev_set_name() return value
rpmsg: virtio: Fix possible double free in rpmsg_probe()
rpmsg: virtio: Fix possible double free in rpmsg_virtio_add_ctrl_dev()
rpmsg: virtio: Fix the unregistration of the device rpmsg_ctrl
iio: adc: stmpe-adc: Fix wait_for_completion_timeout return value check
iio: proximity: vl53l0x: Fix return value check of wait_for_completion_timeout
iio: adc: sc27xx: fix read big scale voltage not right
iio: adc: sc27xx: Fine tune the scale calibration values
rpmsg: qcom_smd: Fix returning 0 if irq_of_parse_and_map() fails
pvpanic: Fix typos in the comments
misc/pvpanic: Convert regular spinlock into trylock on panic path
phy: qcom-qmp: fix pipe-clock imbalance on power-on failure
power: supply: axp288_fuel_gauge: Drop BIOS version check from "T3 MRD" DMI quirk
serial: sifive: Report actual baud base rather than fixed 115200
export: fix string handling of namespace in EXPORT_SYMBOL_NS
soundwire: intel: prevent pm_runtime resume prior to system suspend
coresight: cpu-debug: Replace mutex with mutex_trylock on panic notifier
ksmbd: fix reference count leak in smb_check_perm_dacl()
extcon: ptn5150: Add queue work sync before driver release
soc: rockchip: Fix refcount leak in rockchip_grf_init
clocksource/drivers/riscv: Events are stopped during CPU suspend
ARM: dts: aspeed: ast2600-evb: Enable RX delay for MAC0/MAC1
rtc: mt6397: check return value after calling platform_get_resource()
rtc: ftrtc010: Use platform_get_irq() to get the interrupt
rtc: ftrtc010: Fix error handling in ftrtc010_rtc_probe
staging: r8188eu: add check for kzalloc
tty: n_gsm: Don't ignore write return value in gsmld_output()
tty: n_gsm: Fix packet data hex dump output
serial: meson: acquire port->lock in startup()
serial: 8250_fintek: Check SER_RS485_RTS_* only with RS485
serial: cpm_uart: Fix build error without CONFIG_SERIAL_CPM_CONSOLE
serial: digicolor-usart: Don't allow CS5-6
serial: rda-uart: Don't allow CS5-6
serial: txx9: Don't allow CS5-6
serial: sh-sci: Don't allow CS5-6
serial: sifive: Sanitize CSIZE and c_iflag
serial: st-asc: Sanitize CSIZE and correct PARENB for CS7
serial: stm32-usart: Correct CSIZE, bits, and parity
firmware: dmi-sysfs: Fix memory leak in dmi_sysfs_register_handle
bus: ti-sysc: Fix warnings for unbind for serial
driver: base: fix UAF when driver_attach failed
driver core: fix deadlock in __device_attach
watchdog: rti-wdt: Fix pm_runtime_get_sync() error checking
watchdog: ts4800_wdt: Fix refcount leak in ts4800_wdt_probe
blk-mq: don't touch ->tagset in blk_mq_get_sq_hctx
ASoC: fsl_sai: Fix FSL_SAI_xDR/xFR definition
clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
s390/crypto: fix scatterwalk_unmap() callers in AES-GCM
net: sched: fixed barrier to prevent skbuff sticking in qdisc backlog
net: ethernet: mtk_eth_soc: out of bounds read in mtk_hwlro_get_fdir_entry()
net: ethernet: ti: am65-cpsw-nuss: Fix some refcount leaks
net: dsa: mv88e6xxx: Fix refcount leak in mv88e6xxx_mdios_register
modpost: fix removing numeric suffixes
jffs2: fix memory leak in jffs2_do_fill_super
ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty
ubi: ubi_create_volume: Fix use-after-free when volume creation failed
selftests/bpf: fix selftest after random: Urandom_read tracepoint removal
selftests/bpf: fix stacktrace_build_id with missing kprobe/urandom_read
bpf: Fix probe read error in ___bpf_prog_run()
block: take destination bvec offsets into account in bio_copy_data_iter
riscv: read-only pages should not be writable
net/smc: fixes for converting from "struct smc_cdc_tx_pend **" to "struct smc_wr_tx_pend_priv *"
tcp: add accessors to read/set tp->snd_cwnd
nfp: only report pause frame configuration for physical device
sfc: fix considering that all channels have TX queues
sfc: fix wrong tx channel offset with efx_separate_tx_channels
block: make bioset_exit() fully resilient against being called twice
vdpa: Fix error logic in vdpa_nl_cmd_dev_get_doit
virtio: pci: Fix an error handling path in vp_modern_probe()
net/mlx5: Don't use already freed action pointer
net/mlx5e: TC NIC mode, fix tc chains miss table
net/mlx5: CT: Fix header-rewrite re-use for tupels
net/mlx5: correct ECE offset in query qp output
net/mlx5e: Update netdev features after changing XDP state
net: sched: add barrier to fix packet stuck problem for lockless qdisc
tcp: tcp_rtx_synack() can be called from process context
vdpa: ifcvf: set pci driver data in probe
octeontx2-af: fix error code in is_valid_offset()
s390/mcck: isolate SIE instruction when setting CIF_MCCK_GUEST flag
regulator: mt6315-regulator: fix invalid allowed mode
gpio: pca953x: use the correct register address to do regcache sync
afs: Fix infinite loop found by xfstest generic/676
scsi: sd: Fix potential NULL pointer dereference
tipc: check attribute length for bearer name
driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction
perf c2c: Fix sorting in percent_rmt_hitm_cmp()
dmaengine: idxd: set DMA_INTERRUPT cap bit
mips: cpc: Fix refcount leak in mips_cpc_default_phys_base
bootconfig: Make the bootconfig.o as a normal object file
tracing: Make tp_printk work on syscall tracepoints
tracing: Fix sleeping function called from invalid context on RT kernel
tracing: Avoid adding tracer option before update_tracer_options
iommu/arm-smmu: fix possible null-ptr-deref in arm_smmu_device_probe()
iommu/arm-smmu-v3: check return value after calling platform_get_resource()
f2fs: remove WARN_ON in f2fs_is_valid_blkaddr
i2c: cadence: Increase timeout per message if necessary
m68knommu: set ZERO_PAGE() to the allocated zeroed page
m68knommu: fix undefined reference to `_init_sp'
dmaengine: zynqmp_dma: In struct zynqmp_dma_chan fix desc_size data type
NFSv4: Don't hold the layoutget locks across multiple RPC calls
video: fbdev: hyperv_fb: Allow resolutions with size > 64 MB for Gen1
video: fbdev: pxa3xx-gcu: release the resources correctly in pxa3xx_gcu_probe/remove()
RISC-V: use memcpy for kexec_file mode
m68knommu: fix undefined reference to `mach_get_rtc_pll'
f2fs: fix to tag gcing flag on page during file defragment
xprtrdma: treat all calls not a bcall when bc_serv is NULL
drm/bridge: sn65dsi83: Fix an error handling path in sn65dsi83_probe()
drm/bridge: ti-sn65dsi83: Handle dsi_lanes == 0 as invalid
netfilter: nat: really support inet nat without l3 address
netfilter: nf_tables: use kfree_rcu(ptr, rcu) to release hooks in clean_net path
netfilter: nf_tables: delete flowtable hooks via transaction list
powerpc/kasan: Force thread size increase with KASAN
SUNRPC: Trap RDMA segment overflows
netfilter: nf_tables: always initialize flowtable hook list in transaction
ata: pata_octeon_cf: Fix refcount leak in octeon_cf_probe
netfilter: nf_tables: release new hooks on unsupported flowtable flags
netfilter: nf_tables: memleak flow rule from commit path
netfilter: nf_tables: bail out early if hardware offload is not supported
xen: unexport __init-annotated xen_xlate_map_ballooned_pages()
stmmac: intel: Fix an error handling path in intel_eth_pci_probe()
af_unix: Fix a data-race in unix_dgram_peer_wake_me().
bpf, arm64: Clear prog->jited_len along prog->jited
net: dsa: lantiq_gswip: Fix refcount leak in gswip_gphy_fw_list
net/mlx4_en: Fix wrong return value on ioctl EEPROM query failure
i40e: xsk: Move tmp desc array from driver to pool
xsk: Fix handling of invalid descriptors in XSK TX batching API
SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()
net: mdio: unexport __init-annotated mdio_bus_init()
net: xfrm: unexport __init-annotated xfrm4_protocol_init()
net: ipv6: unexport __init-annotated seg6_hmac_init()
net/mlx5: Lag, filter non compatible devices
net/mlx5: Fix mlx5_get_next_dev() peer device matching
net/mlx5: Rearm the FW tracer after each tracer event
net/mlx5: fs, fail conflicting actions
ip_gre: test csum_start instead of transport header
net: altera: Fix refcount leak in altera_tse_mdio_create
net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete
tcp: use alloc_large_system_hash() to allocate table_perturb
drm: imx: fix compiler warning with gcc-12
nfp: flower: restructure flow-key for gre+vlan combination
iov_iter: Fix iter_xarray_get_pages{,_alloc}()
iio: dummy: iio_simple_dummy: check the return value of kstrdup()
staging: rtl8712: fix a potential memory leak in r871xu_drv_init()
iio: st_sensors: Add a local lock for protecting odr
lkdtm/usercopy: Expand size of "out of frame" object
drivers: staging: rtl8723bs: Fix deadlock in rtw_surveydone_event_callback()
drivers: staging: rtl8192bs: Fix deadlock in rtw_joinbss_event_prehandle()
tty: synclink_gt: Fix null-pointer-dereference in slgt_clean()
tty: Fix a possible resource leak in icom_probe
thunderbolt: Use different lane for second DisplayPort tunnel
drivers: staging: rtl8192u: Fix deadlock in ieee80211_beacons_stop()
drivers: staging: rtl8192e: Fix deadlock in rtllib_beacons_stop()
USB: host: isp116x: check return value after calling platform_get_resource()
drivers: tty: serial: Fix deadlock in sa1100_set_termios()
drivers: usb: host: Fix deadlock in oxu_bus_suspend()
USB: hcd-pci: Fully suspend across freeze/thaw cycle
char: xillybus: fix a refcount leak in cleanup_dev()
sysrq: do not omit current cpu when showing backtrace of all active CPUs
usb: dwc2: gadget: don't reset gadget's driver->bus
soundwire: qcom: adjust autoenumeration timeout
misc: rtsx: set NULL intfdata when probe fails
extcon: Fix extcon_get_extcon_dev() error handling
extcon: Modify extcon device to be created after driver data is set
clocksource/drivers/sp804: Avoid error on multiple instances
staging: rtl8712: fix uninit-value in usb_read8() and friends
staging: rtl8712: fix uninit-value in r871xu_drv_init()
serial: msm_serial: disable interrupts in __msm_console_write()
kernfs: Separate kernfs_pr_cont_buf and rename_lock.
watchdog: wdat_wdt: Stop watchdog when rebooting the system
md: protect md_unregister_thread from reentrancy
scsi: myrb: Fix up null pointer access on myrb_cleanup()
Revert "net: af_key: add check for pfkey_broadcast in function pfkey_process"
ceph: allow ceph.dir.rctime xattr to be updatable
ceph: flush the mdlog for filesystem sync
drm/amd/display: Check if modulo is 0 before dividing.
drm/radeon: fix a possible null pointer dereference
drm/amd/pm: Fix missing thermal throttler status
um: line: Use separate IRQs per line
modpost: fix undefined behavior of is_arm_mapping_symbol()
x86/cpu: Elide KCSAN for cpu_has() and friends
jump_label,noinstr: Avoid instrumentation for JUMP_LABEL=n builds
nbd: call genl_unregister_family() first in nbd_cleanup()
nbd: fix race between nbd_alloc_config() and module removal
nbd: fix io hung while disconnecting device
s390/gmap: voluntarily schedule during key setting
cifs: version operations for smb20 unneeded when legacy support disabled
drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate
nodemask: Fix return values to be unsigned
vringh: Fix loop descriptors check in the indirect cases
scripts/gdb: change kernel config dumping method
ALSA: usb-audio: Skip generic sync EP parse for secondary EP
ALSA: usb-audio: Set up (implicit) sync for Saffire 6
ALSA: hda/conexant - Fix loopback issue with CX20632
ALSA: hda/realtek: Fix for quirk to enable speaker output on the Lenovo Yoga DuetITL 2021
ALSA: hda/realtek: Add quirk for HP Dev One
cifs: return errors during session setup during reconnects
cifs: fix reconnect on smb3 mount types
KEYS: trusted: tpm2: Fix migratable logic
ata: libata-transport: fix {dma|pio|xfer}_mode sysfs files
mmc: block: Fix CQE recovery reset success
net: phy: dp83867: retrigger SGMII AN when link change
net: openvswitch: fix misuse of the cached connection on tuple changes
writeback: Fix inode->i_io_list not be protected by inode->i_lock error
nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION
nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION
ixgbe: fix bcast packets Rx on VF after promisc removal
ixgbe: fix unexpected VLAN Rx in promisc mode on VF
Input: bcm5974 - set missing URB_NO_TRANSFER_DMA_MAP urb flag
vduse: Fix NULL pointer dereference on sysfs access
powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK
drm/bridge: analogix_dp: Support PSR-exit to disable transition
drm/atomic: Force bridge self-refresh-exit on CRTC switch
drm/amdgpu: update VCN codec support for Yellow Carp
powerpc/32: Fix overread/overwrite of thread_struct via ptrace
powerpc/mm: Switch obsolete dssall to .long
drm/ast: Create threshold values for AST2600
random: avoid checking crng_ready() twice in random_init()
random: mark bootloader randomness code as __init
random: account for arch randomness in bits
md/raid0: Ignore RAID0 layout if the second zone has only one device
net/sched: act_police: more accurate MTU policing
PCI: qcom: Fix pipe clock imbalance
zonefs: fix handling of explicit_open option on mount
iov_iter: fix build issue due to possible type mis-match
dmaengine: idxd: add missing callback function to support DMA_INTERRUPT
tcp: fix tcp_mtup_probe_success vs wrong snd_cwnd
xsk: Fix possible crash when multiple sockets are created
Linux 5.15.47
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4f53567cd8a0a13927a16f41a2be8bc0db21ce5b
|
||
|
|
f7ca1989fd |
net: sched: add barrier to fix packet stuck problem for lockless qdisc
[ Upstream commit 2e8728c955ce0624b958eee6e030a37aca3a5d86 ]
In qdisc_run_end(), the spin_unlock() only has store-release semantic,
which guarantees all earlier memory access are visible before it. But
the subsequent test_bit() has no barrier semantics so may be reordered
ahead of the spin_unlock(). The store-load reordering may cause a packet
stuck problem.
The concurrent operations can be described as below,
CPU 0 | CPU 1
qdisc_run_end() | qdisc_run_begin()
. | .
----> /* may be reorderd here */ | .
| . | .
| spin_unlock() | set_bit()
| . | smp_mb__after_atomic()
---- test_bit() | spin_trylock()
. | .
Consider the following sequence of events:
CPU 0 reorder test_bit() ahead and see MISSED = 0
CPU 1 calls set_bit()
CPU 1 calls spin_trylock() and return fail
CPU 0 executes spin_unlock()
At the end of the sequence, CPU 0 calls spin_unlock() and does nothing
because it see MISSED = 0. The skb on CPU 1 has beed enqueued but no one
take it, until the next cpu pushing to the qdisc (if ever ...) will
notice and dequeue it.
This patch fix this by adding one explicit barrier. As spin_unlock() and
test_bit() ordering is a store-load ordering, a full memory barrier
smp_mb() is needed here.
Fixes:
|
||
|
|
1e853f235a |
net: sched: fixed barrier to prevent skbuff sticking in qdisc backlog
[ Upstream commit a54ce3703613e41fe1d98060b62ec09a3984dc28 ]
In qdisc_run_begin(), smp_mb__before_atomic() used before test_bit()
does not provide any ordering guarantee as test_bit() is not an atomic
operation. This, added to the fact that the spin_trylock() call at
the beginning of qdisc_run_begin() does not guarantee acquire
semantics if it does not grab the lock, makes it possible for the
following statement :
if (test_bit(__QDISC_STATE_MISSED, &qdisc->state))
to be executed before an enqueue operation called before
qdisc_run_begin().
As a result the following race can happen :
CPU 1 CPU 2
qdisc_run_begin() qdisc_run_begin() /* true */
set(MISSED) .
/* returns false */ .
. /* sees MISSED = 1 */
. /* so qdisc not empty */
. __qdisc_run()
. .
. pfifo_fast_dequeue()
----> /* may be done here */ .
| . clear(MISSED)
| . .
| . smp_mb __after_atomic();
| . .
| . /* recheck the queue */
| . /* nothing => exit */
| enqueue(skb1)
| .
| qdisc_run_begin()
| .
| spin_trylock() /* fail */
| .
| smp_mb__before_atomic() /* not enough */
| .
---- if (test_bit(MISSED))
return false; /* exit */
In the above scenario, CPU 1 and CPU 2 both try to grab the
qdisc->seqlock at the same time. Only CPU 2 succeeds and enters the
bypass code path, where it emits its skb then calls __qdisc_run().
CPU1 fails, sets MISSED and goes down the traditionnal enqueue() +
dequeue() code path. But when executing qdisc_run_begin() for the
second time, after enqueuing its skbuff, it sees the MISSED bit still
set (by itself) and consequently chooses to exit early without setting
it again nor trying to grab the spinlock again.
Meanwhile CPU2 has seen MISSED = 1, cleared it, checked the queue
and found it empty, so it returned.
At the end of the sequence, we end up with skb1 enqueued in the
backlog, both CPUs out of __dev_xmit_skb(), the MISSED bit not set,
and no __netif_schedule() called made. skb1 will now linger in the
qdisc until somebody later performs a full __qdisc_run(). Associated
to the bypass capacity of the qdisc, and the ability of the TCP layer
to avoid resending packets which it knows are still in the qdisc, this
can lead to serious traffic "holes" in a TCP connection.
We fix this by replacing the smp_mb__before_atomic() / test_bit() /
set_bit() / smp_mb__after_atomic() sequence inside qdisc_run_begin()
by a single test_and_set_bit() call, which is more concise and
enforces the needed memory barriers.
Fixes:
|
||
|
|
521f2e62a3 |
ANDROID: add kabi padding for structures for the android13 release
There are a lot of different structures that need to have a "frozen" abi for the next 5+ years. Add padding to a lot of them in order to be able to handle any future changes that might be needed due to LTS and security fixes that might come up. It's a best guess, based on what has happened in the past from the 5.10.0..5.10.110 release (1 1/2 years). Yes, past changes do not mean that future changes will also be needed in the same area, but that is a hint that those areas are both well maintained and looked after, and there have been previous problems found in them. Also the list of structures that are being required based on OEM usage in the android/ symbol lists were consulted as that's a larger list than what has been changed in the past. Hopefully we caught everything we need to worry about, only time will tell... Bug: 151154716 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I880bbcda0628a7459988eeb49d18655522697664 |
||
|
|
b4e6455d7f |
net_sched: restore "mpu xxx" handling
commit fb80445c438c78b40b547d12b8d56596ce4ccfeb upstream. commit |
||
|
|
0d76daf201 |
net/sched: Extend qdisc control block with tc control block
[ Upstream commit ec624fe740b416fb68d536b37fb8eef46f90b5c2 ] BPF layer extends the qdisc control block via struct bpf_skb_data_end and because of that there is no more room to add variables to the qdisc layer control block without going over the skb->cb size. Extend the qdisc control block with a tc control block, and move all tc related variables to there as a pre-step for extending the tc control block with additional members. Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
ae11b215ae |
net: sched: update default qdisc visibility after Tx queue cnt changes
[ Upstream commit 1e080f17750d1083e8a32f7b350584ae1cd7ff20 ]
mq / mqprio make the default child qdiscs visible. They only do
so for the qdiscs which are within real_num_tx_queues when the
device is registered. Depending on order of calls in the driver,
or if user space changes config via ethtool -L the number of
qdiscs visible under tc qdisc show will differ from the number
of queues. This is confusing to users and potentially to system
configuration scripts which try to make sure qdiscs have the
right parameters.
Add a new Qdisc_ops callback and make relevant qdiscs TTRT.
Note that this uncovers the "shortcut" created by
commit
|
||
|
|
695176bfe5 |
net_sched: refactor TC action init API
TC action ->init() API has 10 parameters, it becomes harder to read. Some of them are just boolean and can be replaced by flags. Similarly for the internal API tcf_action_init() and tcf_exts_validate(). This patch converts them to flags and fold them into the upper 16 bits of "flags", whose lower 16 bits are still reserved for user-space. More specifically, the following kernel flags are introduced: TCA_ACT_FLAGS_POLICE replace 'name' in a few contexts, to distinguish whether it is compatible with policer. TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether this action is bound to a filter. TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts, means we are replacing an existing action. TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the opposite meaning, because we still hold RTNL in most cases. The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is untouched and still stored as before. I have tested this patch with tdc and I do not see any failure related to this patch. Tested-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Jamal Hadi Salim<jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
b6df00789e |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflict in net/netfilter/nf_tables_api.c. Duplicate fix in tools/testing/selftests/net/devlink_port_split.py - take the net-next version. skmsg, and L4 bpf - keep the bpf code but remove the flags and err params. Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
d3e0f57501 |
net: sched: remove qdisc->empty for lockless qdisc
As MISSED and DRAINING state are used to indicate a non-empty qdisc, qdisc->empty is not longer needed, so remove it. Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
c4fef01ba4 |
net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc
Currently pfifo_fast has both TCQ_F_CAN_BYPASS and TCQ_F_NOLOCK
flag set, but queue discipline by-pass does not work for lockless
qdisc because skb is always enqueued to qdisc even when the qdisc
is empty, see __dev_xmit_skb().
This patch calls sch_direct_xmit() to transmit the skb directly
to the driver for empty lockless qdisc, which aviod enqueuing
and dequeuing operation.
As qdisc->empty is not reliable to indicate a empty qdisc because
there is a time window between enqueuing and setting qdisc->empty.
So we use the MISSED state added in commit
|
||
|
|
dd25296afa |
net: sched: avoid unnecessary seqcount operation for lockless qdisc
qdisc->running seqcount operation is mainly used to do heuristic locking on q->busylock for locked qdisc, see qdisc_is_running() and __dev_xmit_skb(). So avoid doing seqcount operation for qdisc with TCQ_F_NOLOCK flag. Acked-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # flexcan Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
89837eb4b2 |
net: sched: add barrier to ensure correct ordering for lockless qdisc
The spin_trylock() was assumed to contain the implicit barrier needed to ensure the correct ordering between STATE_MISSED setting/clearing and STATE_MISSED checking in commit |
||
|
|
a90c57f2ce |
net: sched: fix packet stuck problem for lockless qdisc
Lockless qdisc has below concurrent problem:
cpu0 cpu1
. .
q->enqueue .
. .
qdisc_run_begin() .
. .
dequeue_skb() .
. .
sch_direct_xmit() .
. .
. q->enqueue
. qdisc_run_begin()
. return and do nothing
. .
qdisc_run_end() .
cpu1 enqueue a skb without calling __qdisc_run() because cpu0
has not released the lock yet and spin_trylock() return false
for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
enqueued by cpu1 when calling dequeue_skb() because cpu1 may
enqueue the skb after cpu0 calling dequeue_skb() and before
cpu0 calling qdisc_run_end().
Lockless qdisc has below another concurrent problem when
tx_action is involved:
cpu0(serving tx_action) cpu1 cpu2
. . .
. q->enqueue .
. qdisc_run_begin() .
. dequeue_skb() .
. . q->enqueue
. . .
. sch_direct_xmit() .
. . qdisc_run_begin()
. . return and do nothing
. . .
clear __QDISC_STATE_SCHED . .
qdisc_run_begin() . .
return and do nothing . .
. . .
. qdisc_run_end() .
This patch fixes the above data race by:
1. If the first spin_trylock() return false and STATE_MISSED is
not set, set STATE_MISSED and retry another spin_trylock() in
case other CPU may not see STATE_MISSED after it releases the
lock.
2. reschedule if STATE_MISSED is set after the lock is released
at the end of qdisc_run_end().
For tx_action case, STATE_MISSED is also set when cpu1 is at the
end if qdisc_run_end(), so tx_action will be rescheduled again
to dequeue the skb enqueued by cpu2.
Clear STATE_MISSED before retrying a dequeuing when dequeuing
returns NULL in order to reduce the overhead of the second
spin_trylock() and __netif_schedule() calling.
Also clear the STATE_MISSED before calling __netif_schedule()
at the end of qdisc_run_end() to avoid doing another round of
dequeuing in the pfifo_fast_dequeue().
The performance impact of this patch, tested using pktgen and
dummy netdev with pfifo_fast qdisc attached:
threads without+this_patch with+this_patch delta
1 2.61Mpps 2.60Mpps -0.3%
2 3.97Mpps 3.82Mpps -3.7%
4 5.62Mpps 5.59Mpps -0.5%
8 2.78Mpps 2.77Mpps -0.3%
16 2.22Mpps 2.22Mpps -0.0%
Fixes:
|
||
|
|
2ffe039528 |
net/sched: act_police: add support for packet-per-second policing
Allow a policer action to enforce a rate-limit based on packets-per-second,
configurable using a packet-per-second rate and burst parameters.
e.g.
tc filter add dev tap1 parent ffff: u32 match \
u32 0 0 police pkts_rate 3000 pkts_burst 1000
Testing was unable to uncover a performance impact of this change on
existing features.
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
||
|
|
d1e1355aef |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
938e0fcd32 |
net: sched: replaced invalid qdisc tree flush helper in qdisc_replace
Commit |
||
|
|
4dd78a7373 |
net: sched: Add extack to Qdisc_class_ops.delete
In a following commit, sch_htb will start using extack in the delete class operation to pass hardware errors in offload mode. This commit prepares for that by adding the extack parameter to this callback and converting usage of the existing qdiscs. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
ca1e4ab199 |
net: sched: Add multi-queue support to sch_tree_lock
The existing qdiscs that set TCQ_F_MQROOT don't use sch_tree_lock. However, hardware-offloaded HTB will start setting this flag while also using sch_tree_lock. The current implementation of sch_tree_lock basically locks on qdisc->dev_queue->qdisc, and it works fine when the tree is attached to some queue. However, it's not the case for MQROOT qdiscs: such a qdisc is the root itself, and its dev_queue just points to queue 0, while not actually being used, because there are real per-queue qdiscs. This patch changes the logic of sch_tree_lock and sch_tree_unlock to lock the qdisc itself if it's the MQROOT. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
7baf2429a1 |
net/sched: cls_flower add CT_FLAGS_INVALID flag support
This patch add the TCA_FLOWER_KEY_CT_FLAGS_INVALID flag to match the ct_state with invalid for conntrack. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/1611045110-682-1-git-send-email-wenxu@ucloud.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
d635a69dd4 |
Merge tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- support "prefer busy polling" NAPI operation mode, where we defer
softirq for some time expecting applications to periodically busy
poll
- AF_XDP: improve efficiency by more batching and hindering the
adjacency cache prefetcher
- af_packet: make packet_fanout.arr size configurable up to 64K
- tcp: optimize TCP zero copy receive in presence of partial or
unaligned reads making zero copy a performance win for much smaller
messages
- XDP: add bulk APIs for returning / freeing frames
- sched: support fragmenting IP packets as they come out of conntrack
- net: allow virtual netdevs to forward UDP L4 and fraglist GSO skbs
BPF:
- BPF switch from crude rlimit-based to memcg-based memory accounting
- BPF type format information for kernel modules and related tracing
enhancements
- BPF implement task local storage for BPF LSM
- allow the FENTRY/FEXIT/RAW_TP tracing programs to use
bpf_sk_storage
Protocols:
- mptcp: improve multiple xmit streams support, memory accounting and
many smaller improvements
- TLS: support CHACHA20-POLY1305 cipher
- seg6: add support for SRv6 End.DT4/DT6 behavior
- sctp: Implement RFC 6951: UDP Encapsulation of SCTP
- ppp_generic: add ability to bridge channels directly
- bridge: Connectivity Fault Management (CFM) support as is defined
in IEEE 802.1Q section 12.14.
Drivers:
- mlx5: make use of the new auxiliary bus to organize the driver
internals
- mlx5: more accurate port TX timestamping support
- mlxsw:
- improve the efficiency of offloaded next hop updates by using
the new nexthop object API
- support blackhole nexthops
- support IEEE 802.1ad (Q-in-Q) bridging
- rtw88: major bluetooth co-existance improvements
- iwlwifi: support new 6 GHz frequency band
- ath11k: Fast Initial Link Setup (FILS)
- mt7915: dual band concurrent (DBDC) support
- net: ipa: add basic support for IPA v4.5
Refactor:
- a few pieces of in_interrupt() cleanup work from Sebastian Andrzej
Siewior
- phy: add support for shared interrupts; get rid of multiple driver
APIs and have the drivers write a full IRQ handler, slight growth
of driver code should be compensated by the simpler API which also
allows shared IRQs
- add common code for handling netdev per-cpu counters
- move TX packet re-allocation from Ethernet switch tag drivers to a
central place
- improve efficiency and rename nla_strlcpy
- number of W=1 warning cleanups as we now catch those in a patchwork
build bot
Old code removal:
- wan: delete the DLCI / SDLA drivers
- wimax: move to staging
- wifi: remove old WDS wifi bridging support"
* tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1922 commits)
net: hns3: fix expression that is currently always true
net: fix proc_fs init handling in af_packet and tls
nfc: pn533: convert comma to semicolon
af_vsock: Assign the vsock transport considering the vsock address flags
af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
vsock_addr: Check for supported flag values
vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
vm_sockets: Add flags field in the vsock address data structure
net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled
tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit
net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context
nfc: s3fwrn5: Release the nfc firmware
net: vxget: clean up sparse warnings
mlxsw: spectrum_router: Use eXtended mezzanine to offload IPv4 router
mlxsw: spectrum: Set KVH XLT cache mode for Spectrum2/3
mlxsw: spectrum_router_xm: Introduce basic XM cache flushing
mlxsw: reg: Add Router LPM Cache Enable Register
mlxsw: reg: Add Router LPM Cache ML Delete Register
mlxsw: spectrum_router_xm: Implement L-value tracking for M-index
mlxsw: reg: Add XM Router M Table Register
...
|
||
|
|
c129412f74 |
net/sched: sch_frag: add generic packet fragment support.
Currently kernel tc subsystem can do conntrack in cat_ct. But when several fragment packets go through the act_ct, function tcf_ct_handle_fragments will defrag the packets to a big one. But the last action will redirect mirred to a device which maybe lead the reassembly big packet over the mtu of target device. This patch add support for a xmit hook to mirred, that gets executed before xmiting the packet. Then, when act_ct gets loaded, it configs that hook. The frag xmit hook maybe reused by other modules. Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
fa6d639930 |
net/sched: act_mirred: refactor the handle of xmit
This one is prepare for the next patch. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
a72e9d5472 |
net: sched: Remove broken definitions and un-hide for !LOCKDEP
Currently, variables used only within lockdep expressions are flagged as unused, requiring that these variables' declarations be decorated with either #ifdef or __maybe_unused. This results in ugly code. This commit therefore causes the full definitions of the lockdep_tcf_chain_is_locked() and lockdep_tcf_proto_is_locked() functions to be visible even when lockdep is not enabled, thus removing the need for the previous empty functions that were provided in non-lockdep kernels. This approach further relies on dead-code elimination to remove any references to functions or variables that are not available in non-lockdep kernels. Signed-off-by: Jakub Kicinski <kuba@kernel.org> -- CC: jhs@mojatatu.com CC: xiyou.wangcong@gmail.com CC: jiri@resnulli.us Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
|
|
846e463a70 |
net/sched: get rid of qdisc->padded
kmalloc() of sufficiently big portion of memory is cache-aligned in regular conditions. If some debugging options are used, there is no reason qdisc structures would need 64-byte alignment if most other kernel structures are not aligned. This get rid of QDISC_ALIGN and QDISC_ALIGNTO. Addition of privdata field will help implementing the reverse of qdisc_priv() and documents where the private data is. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Allen Pais <allen.lkml@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
|
2b7ea122a0 |
net/sched: Remove unused function qdisc_queue_drop_head()
It is not used since commit
|
||
|
|
038ebb1a71 |
net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct
When openvswitch conntrack offload with act_ct action. Fragment packets
defrag in the ingress tc act_ct action and miss the next chain. Then the
packet pass to the openvswitch datapath without the mru. The over
mtu packet will be dropped in output action in openvswitch for over mtu.
"kernel: net2: dropped over-mtu packet: 1528 > 1500"
This patch add mru in the tc_skb_ext for adefrag and miss next chain
situation. And also add mru in the qdisc_skb_cb. The act_ct set the mru
to the qdisc_skb_cb when the packet defrag. And When the chain miss,
The mru is set to tc_skb_ext which can be got by ovs datapath.
Fixes:
|
||
|
|
ac5c66f261 |
Revert "net: sched: Pass root lock to Qdisc_ops.enqueue"
This reverts commit
|
||
|
|
aebe4426cc |
net: sched: Pass root lock to Qdisc_ops.enqueue
A following patch introduces qevents, points in qdisc algorithm where packet can be processed by user-defined filters. Should this processing lead to a situation where a new packet is to be enqueued on the same port, holding the root lock would lead to deadlocks. To solve the issue, qevent handler needs to unlock and relock the root lock when necessary. To that end, add the root lock argument to the qdisc op enqueue, and propagate throughout. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
f8ab1807a9 |
net: sched: introduce terse dump flag
Add new TCA_DUMP_FLAGS attribute and use it in cls API to request terse filter output from classifiers with TCA_DUMP_FLAGS_TERSE flag. This option is intended to be used to improve performance of TC filter dump when userland only needs to obtain stats and not the whole classifier/action data. Extend struct tcf_proto_ops with new terse_dump() callback that must be defined by supporting classifier implementations. Support of the options in specific classifiers and actions is implemented in following patches in the series. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
3793faad7b |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts were all overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
a7df4870d7 |
net_sched: fix tcm_parent in tc filter dump
When we tell kernel to dump filters from root (ffff:ffff), those filters on ingress (ffff:0000) are matched, but their true parents must be dumped as they are. However, kernel dumps just whatever we tell it, that is either ffff:ffff or ffff:0000: $ nl-cls-list --dev=dummy0 --parent=root cls basic dev dummy0 id none parent root prio 49152 protocol ip match-all cls basic dev dummy0 id :1 parent root prio 49152 protocol ip match-all $ nl-cls-list --dev=dummy0 --parent=ffff: cls basic dev dummy0 id none parent ffff: prio 49152 protocol ip match-all cls basic dev dummy0 id :1 parent ffff: prio 49152 protocol ip match-all This is confusing and misleading, more importantly this is a regression since 4.15, so the old behavior must be restored. And, when tc filters are installed on a tc class, the parent should be the classid, rather than the qdisc handle. Commit |
||
|
|
7f023ec91c |
net: sched: remove unused inline function qdisc_reset_all_tx
There's no callers in-tree anymore. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
9fb16955fb |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Overlapping header include additions in macsec.c A bug fix in 'net' overlapping with the removal of 'version' string in ena_netdev.c Overlapping test additions in selftests Makefile Overlapping PCI ID table adjustments in iwlwifi driver. Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
2c64605b59 |
net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build
net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’:
net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’
pkt->skb->tc_redirected = 1;
^~
net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’
pkt->skb->tc_from_ingress = 1;
^~
To avoid a direct dependency with tc actions from netfilter, wrap the
redirect bits around CONFIG_NET_REDIRECT and move helpers to
include/linux/skbuff.h. Turn on this toggle from the ifb driver, the
only existing client of these bits in the tree.
This patch adds skb_set_redirected() that sets on the redirected bit
on the skbuff, it specifies if the packet was redirect from ingress
and resets the timestamp (timestamp reset was originally missing in the
netfilter bugfix).
Fixes:
|
||
|
|
7d17c544cd |
net: sched: Pass ingress block to tcf_classify_ingress
On ingress and cls_act qdiscs init, save the block on ingress mini_Qdisc and and pass it on to ingress classification, so it can be used for the looking up a specified chain index. Co-developed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> |
||
|
|
2e24cd7555 |
net_sched: fix ops->bind_class() implementations
The current implementations of ops->bind_class() are merely
searching for classid and updating class in the struct tcf_result,
without invoking either of cl_ops->bind_tcf() or
cl_ops->unbind_tcf(). This breaks the design of them as qdisc's
like cbq use them to count filters too. This is why syzbot triggered
the warning in cbq_destroy_class().
In order to fix this, we have to call cl_ops->bind_tcf() and
cl_ops->unbind_tcf() like the filter binding path. This patch does
so by refactoring out two helper functions __tcf_bind_filter()
and __tcf_unbind_filter(), which are lockless and accept a Qdisc
pointer, then teaching each implementation to call them correctly.
Note, we merely pass the Qdisc pointer as an opaque pointer to
each filter, they only need to pass it down to the helper
functions without understanding it at all.
Fixes:
|
||
|
|
a5b72a083d |
net/sched: add delete_empty() to filters and use it in cls_flower
Revert "net/sched: cls_u32: fix refcount leak in the error path of u32_change()", and fix the u32 refcount leak in a more generic way that preserves the semantic of rule dumping. On tc filters that don't support lockless insertion/removal, there is no need to guard against concurrent insertion when a removal is in progress. Therefore, for most of them we can avoid a full walk() when deleting, and just decrease the refcount, like it was done on older Linux kernels. This fixes situations where walk() was wrongly detecting a non-empty filter, like it happened with cls_u32 in the error path of change(), thus leading to failures in the following tdc selftests: 6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev 6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle 74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id On cls_flower, and on (future) lockless filters, this check is necessary: move all the check_empty() logic in a callback so that each filter can have its own implementation. For cls_flower, it's sufficient to check if no IDRs have been allocated. This reverts commit |
||
|
|
14684b9301 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
One conflict in the BPF samples Makefile, some fixes in 'net' whilst we were converting over to Makefile.target rules in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
90b2be27bb |
net/sched: annotate lockless accesses to qdisc->empty
KCSAN reported the following race [1]
BUG: KCSAN: data-race in __dev_queue_xmit / net_tx_action
read to 0xffff8880ba403508 of 1 bytes by task 21814 on cpu 1:
__dev_xmit_skb net/core/dev.c:3389 [inline]
__dev_queue_xmit+0x9db/0x1b40 net/core/dev.c:3761
dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
neigh_hh_output include/net/neighbour.h:500 [inline]
neigh_output include/net/neighbour.h:509 [inline]
ip6_finish_output2+0x873/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
__sys_sendmmsg+0x123/0x350 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
__se_sys_sendmmsg net/socket.c:2439 [inline]
__x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9
write to 0xffff8880ba403508 of 1 bytes by interrupt on cpu 0:
qdisc_run_begin include/net/sch_generic.h:160 [inline]
qdisc_run include/net/pkt_sched.h:120 [inline]
net_tx_action+0x2b1/0x6c0 net/core/dev.c:4551
__do_softirq+0x115/0x33f kernel/softirq.c:292
do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
do_softirq kernel/softirq.c:329 [inline]
__local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189
local_bh_enable include/linux/bottom_half.h:32 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:688 [inline]
ip6_finish_output2+0x7bb/0xec0 net/ipv6/ip6_output.c:117
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0x9f/0xc0 net/socket.c:657
___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
__sys_sendmmsg+0x123/0x350 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
__se_sys_sendmmsg net/socket.c:2439 [inline]
__x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 21817 Comm: syz-executor.2 Not tainted 5.4.0-rc6+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes:
|
||
|
|
59eb87cb52 |
net: sched: prevent duplicate flower rules from tcf_proto destroy race
When a new filter is added to cls_api, the function
tcf_chain_tp_insert_unique() looks up the protocol/priority/chain to
determine if the tcf_proto is duplicated in the chain's hashtable. It then
creates a new entry or continues with an existing one. In cls_flower, this
allows the function fl_ht_insert_unque to determine if a filter is a
duplicate and reject appropriately, meaning that the duplicate will not be
passed to drivers via the offload hooks. However, when a tcf_proto is
destroyed it is removed from its chain before a hardware remove hook is
hit. This can lead to a race whereby the driver has not received the
remove message but duplicate flows can be accepted. This, in turn, can
lead to the offload driver receiving incorrect duplicate flows and out of
order add/delete messages.
Prevent duplicates by utilising an approach suggested by Vlad Buslov. A
hash table per block stores each unique chain/protocol/prio being
destroyed. This entry is only removed when the full destroy (and hardware
offload) has completed. If a new flow is being added with the same
identiers as a tc_proto being detroyed, then the add request is replayed
until the destroy is complete.
Fixes:
|
||
|
|
ef816f3c49 |
net: sched: don't expose action qstats to skb_tc_reinsert()
Previous commit introduced helper function for updating qstats and refactored set of actions to use the helpers, instead of modifying qstats directly. However, one of the affected action exposes its qstats to skb_tc_reinsert(), which then modifies it. Refactor skb_tc_reinsert() to return integer error code and don't increment overlimit qstats in case of error, and use the returned error code in tcf_mirred_act() to manually increment the overlimit counter with new helper function. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
159d2c7d81 |
sch_netem: fix rcu splat in netem_enqueue()
qdisc_root() use from netem_enqueue() triggers a lockdep warning. __dev_queue_xmit() uses rcu_read_lock_bh() which is not equivalent to rcu_read_lock() + local_bh_disable_bh as far as lockdep is concerned. WARNING: suspicious RCU usage 5.3.0-rc7+ #0 Not tainted ----------------------------- include/net/sch_generic.h:492 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by syz-executor427/8855: #0: 00000000b5525c01 (rcu_read_lock_bh){....}, at: lwtunnel_xmit_redirect include/net/lwtunnel.h:92 [inline] #0: 00000000b5525c01 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x2dc/0x2570 net/ipv4/ip_output.c:214 #1: 00000000b5525c01 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x20a/0x3650 net/core/dev.c:3804 #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: spin_lock include/linux/spinlock.h:338 [inline] #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: __dev_xmit_skb net/core/dev.c:3502 [inline] #2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: __dev_queue_xmit+0x14b8/0x3650 net/core/dev.c:3838 stack backtrace: CPU: 0 PID: 8855 Comm: syz-executor427 Not tainted 5.3.0-rc7+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5357 qdisc_root include/net/sch_generic.h:492 [inline] netem_enqueue+0x1cfb/0x2d80 net/sched/sch_netem.c:479 __dev_xmit_skb net/core/dev.c:3527 [inline] __dev_queue_xmit+0x15d2/0x3650 net/core/dev.c:3838 dev_queue_xmit+0x18/0x20 net/core/dev.c:3902 neigh_hh_output include/net/neighbour.h:500 [inline] neigh_output include/net/neighbour.h:509 [inline] ip_finish_output2+0x1726/0x2570 net/ipv4/ip_output.c:228 __ip_finish_output net/ipv4/ip_output.c:308 [inline] __ip_finish_output+0x5fc/0xb90 net/ipv4/ip_output.c:290 ip_finish_output+0x38/0x1f0 net/ipv4/ip_output.c:318 NF_HOOK_COND include/linux/netfilter.h:294 [inline] ip_mc_output+0x292/0xf40 net/ipv4/ip_output.c:417 dst_output include/net/dst.h:436 [inline] ip_local_out+0xbb/0x190 net/ipv4/ip_output.c:125 ip_send_skb+0x42/0xf0 net/ipv4/ip_output.c:1555 udp_send_skb.isra.0+0x6b2/0x1160 net/ipv4/udp.c:887 udp_sendmsg+0x1e96/0x2820 net/ipv4/udp.c:1174 inet_sendmsg+0x9e/0xe0 net/ipv4/af_inet.c:807 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:657 ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311 __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] __se_sys_sendmmsg net/socket.c:2439 [inline] __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439 do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
c9f14470d0 |
net: sched: add API for registering unlocked offload block callbacks
Extend struct flow_block_offload with "unlocked_driver_cb" flag to allow registering and unregistering block hardware offload callbacks that do not require caller to hold rtnl lock. Extend tcf_block with additional lockeddevcnt counter that is incremented for each non-unlocked driver callback attached to device. This counter is necessary to conditionally obtain rtnl lock before calling hardware callbacks in following patches. Register mlx5 tc block offload callbacks as "unlocked". Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
a449a3e77a |
net: sched: notify classifier on successful offload add/delete
To remove dependency on rtnl lock, extend classifier ops with new ops->hw_add() and ops->hw_del() callbacks. Call them from cls API while holding cb_lock every time filter if successfully added to or deleted from hardware. Implement the new API in flower classifier. Use it to manage hw_filters list under cb_lock protection, instead of relying on rtnl lock to synchronize with concurrent fl_reoffload() call. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
4011921137 |
net: sched: refactor block offloads counter usage
Without rtnl lock protection filters can no longer safely manage block
offloads counter themselves. Refactor cls API to protect block offloadcnt
with tcf_block->cb_lock that is already used to protect driver callback
list and nooffloaddevcnt counter. The counter can be modified by concurrent
tasks by new functions that execute block callbacks (which is safe with
previous patch that changed its type to atomic_t), however, block
bind/unbind code that checks the counter value takes cb_lock in write mode
to exclude any concurrent modifications. This approach prevents race
conditions between bind/unbind and callback execution code but allows for
concurrency for tc rule update path.
Move block offload counter, filter in hardware counter and filter flags
management from classifiers into cls hardware offloads API. Make functions
tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() to be cls API
private. Implement following new cls API to be used instead:
tc_setup_cb_add() - non-destructive filter add. If filter that wasn't
already in hardware is successfully offloaded, increment block offloads
counter, set filter in hardware counter and flag. On failure, previously
offloaded filter is considered to be intact and offloads counter is not
decremented.
tc_setup_cb_replace() - destructive filter replace. Release existing
filter block offload counter and reset its in hardware counter and flag.
Set new filter in hardware counter and flag. On failure, previously
offloaded filter is considered to be destroyed and offload counter is
decremented.
tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block
offloads counter.
tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and
call tc_cls_offload_cnt_update() if cb() didn't return an error.
Refactor all offload-capable classifiers to atomically offload filters to
hardware, change block offload counter, and set filter in hardware counter
and flag by means of the new cls API functions.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|