Lorenz recently reported:
In our TC classifier cls_redirect [0], we use the following sequence of
helper calls to decapsulate a GUE (basically IP + UDP + custom header)
encapsulated packet:
bpf_skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO)
bpf_redirect(skb->ifindex, BPF_F_INGRESS)
It seems like some checksums of the inner headers are not validated in
this case. For example, a TCP SYN packet with invalid TCP checksum is
still accepted by the network stack and elicits a SYN ACK. [...]
That is, we receive the following packet from the driver:
| ETH | IP | UDP | GUE | IP | TCP |
skb->ip_summed == CHECKSUM_UNNECESSARY
ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx checksum offloading.
On this packet we run skb_adjust_room_mac(-encap_len), and get the following:
| ETH | IP | TCP |
skb->ip_summed == CHECKSUM_UNNECESSARY
Note that ip_summed is still CHECKSUM_UNNECESSARY. After bpf_redirect()'ing
into the ingress, we end up in tcp_v4_rcv(). There, skb_checksum_init() is
turned into a no-op due to CHECKSUM_UNNECESSARY.
The bpf_skb_adjust_room() helper is not aware of protocol specifics. Internally,
it handles the CHECKSUM_COMPLETE case via skb_postpull_rcsum(), but that does
not cover CHECKSUM_UNNECESSARY. In this case skb->csum_level of the original
skb prior to bpf_skb_adjust_room() call was 0, that is, covering UDP. Right now
there is no way to adjust the skb->csum_level. NICs that have checksum offload
disabled (CHECKSUM_NONE) or that support CHECKSUM_COMPLETE are not affected.
Use a safe default for CHECKSUM_UNNECESSARY by resetting to CHECKSUM_NONE and
add a flag to the helper called BPF_F_ADJ_ROOM_NO_CSUM_RESET that allows users
from opting out. Opting out is useful for the case where we don't remove/add
full protocol headers, or for the case where a user wants to adjust the csum
level manually e.g. through bpf_csum_level() helper that is added in subsequent
patch.
The bpf_skb_proto_{4_to_6,6_to_4}() for NAT64/46 translation from the BPF
bpf_skb_change_proto() helper uses bpf_skb_net_hdr_{push,pop}() pair internally
as well but doesn't change layers, only transitions between v4 to v6 and vice
versa, therefore no adoption is required there.
[0] https://lore.kernel.org/bpf/20200424185556.7358-1-lmb@cloudflare.com/
Fixes: 2be7e212d5 ("bpf: add bpf_skb_adjust_room helper")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Reported-by: Alan Maguire <alan.maguire@oracle.com>
Change-Id: I11dcdb6780894d2adcb85a6c8f725dc1aadcdc80
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/CACAyw9-uU_52esMd1JjuA80fRPHJv5vsSg8GnfW3t_qDU4aVKQ@mail.gmail.com/
Link: https://lore.kernel.org/bpf/11a90472e7cce83e76ddbfce81fdfce7bfc68808.1591108731.git.daniel@iogearbox.net
Add new key meta that contains ingress ifindex value and add a function
to dissect this from skb. The key and function is prepared to cover
other potential skb metadata values dissection.
Change-Id: Ifedf5248b9498f59185348e8eb0601cd7c92695e
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to:
(1) attach more than one BPF program type to netns, or
(2) support attaching BPF programs to netns with bpf_link, or
(3) support multi-prog attach points for netns
we will need to keep more state per netns than a single pointer like we
have now for BPF flow dissector program.
Prepare for the above by extracting netns_bpf that is part of struct net,
for storing all state related to BPF programs attached to netns.
Turn flow dissector callbacks for querying/attaching/detaching a program
into generic ones that operate on netns_bpf. Next patch will move the
generic callbacks into their own module.
This is similar to how it is organized for cgroup with cgroup_bpf.
Change-Id: I099b7db1ae38e3c30de6224510c23cff9cfa47ef
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20200531082846.2117903-3-jakub@cloudflare.com
C flow dissector supports input flags that tell it to customize parsing
by either stopping early or trying to parse as deep as possible. Pass
those flags to the BPF flow dissector so it can make the same
decisions. In the next commits I'll add support for those flags to
our reference bpf_flow.c
v3:
* Export copy of flow dissector flags instead of moving (Alexei Starovoitov)
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Change-Id: I46a68f8b2249915fff5d97a1394ea662d9a0ac46
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When called without skb, gather all required data from the
__skb_flow_dissect's arguments and use recently introduces
no-skb mode of bpf flow dissector.
Note: WARN_ON_ONCE(!net) will now trigger for eth_get_headlen users.
Change-Id: Ice7d27b1adaeb9adb02ea52fda999450a6d765af
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This new argument will be used in the next patches for the
eth_get_headlen use case. eth_get_headlen calls flow dissector
with only data (without skb) so there is currently no way to
pull attached BPF flow dissector program. With this new argument,
we can amend the callers to explicitly pass network namespace
so we can use attached BPF program.
Change-Id: I07dbe25c77a13f8cc33b037e63c532979d290ad0
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
struct bpf_flow_dissector has a small subset of sk_buff fields that
flow dissector BPF program is allowed to access and an optional
pointer to real skb. Real skb is used only in bpf_skb_load_bytes
helper to read non-linear data.
The real motivation for this is to be able to call flow dissector
from eth_get_headlen context where we don't have an skb and need
to dissect raw bytes.
Change-Id: I5aa984bb223fd2cd05dd0a502f8b4519358124d1
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This way, we can reuse it for flow dissector in BPF_PROG_TEST_RUN.
No functional changes.
Change-Id: I0c832987c51402747a3f985641fee4f9e2981933
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Switch internal TCP skb->skb_mstamp to skb->skb_mstamp_ns,
from usec units to nsec units.
Do not clear skb->tstamp before entering IP stacks in TX,
so that qdisc or devices can implement pacing based on the
earliest departure time instead of socket sk->sk_pacing_rate
Packets are fed with tcp_wstamp_ns, and following patch
will update tcp_wstamp_ns when both TCP and sch_fq switch to
the earliest departure time mechanism.
Change-Id: I543f92dd4fd498cbae41c597b86e8b5b4540deec
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
target_fd is target namespace. If there is a flow dissector BPF program
attached to that namespace, its (single) id is returned.
v5:
* drop net ref right after rcu unlock (Daniel Borkmann)
v4:
* add missing put_net (Jann Horn)
v3:
* add missing inline to skb_flow_dissector_prog_query static def
(kbuild test robot <lkp@intel.com>)
v2:
* don't sleep in rcu critical section (Jakub Kicinski)
* check input prog_cnt (exit early)
Cc: Jann Horn <jannh@google.com>
Change-Id: I0ee56753ab9d515f2369c00b4470051ab2e12322
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
If boolean CONFIG_BPF_SYSCALL is enabled, kernel/bpf/syscall.c will
call flow_dissector functions from net/core/flow_dissector.c.
This causes this build failure if CONFIG_NET is disabled:
kernel/bpf/syscall.o: In function `__x64_sys_bpf':
syscall.c:(.text+0x3278): undefined reference to
`skb_flow_dissector_bpf_prog_attach'
syscall.c:(.text+0x3310): undefined reference to
`skb_flow_dissector_bpf_prog_detach'
kernel/bpf/syscall.o:(.rodata+0x3f0): undefined reference to
`flow_dissector_prog_ops'
kernel/bpf/verifier.o:(.rodata+0x250): undefined reference to
`flow_dissector_verifier_ops'
Analogous to other optional BPF program types in syscall.c, add stubs
if the relevant functions are not compiled and move the BPF_PROG_TYPE
definition in the #ifdef CONFIG_NET block.
Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Change-Id: I3c06753658e2d06c1cef5158106ba2fe5ee41b34
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The function build_skb() also have the responsibility to allocate and clear
the SKB structure. Introduce a new function build_skb_around(), that moves
the responsibility of allocation and clearing to the caller. This allows
caller to use kmem_cache (slab/slub) bulk allocation API.
Next patch use this function combined with kmem_cache_alloc_bulk.
Change-Id: I8f0ed067a322551d2877bb42ab1971a49634adcf
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and
attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector
path. The BPF program is per-network namespace.
Change-Id: I08d14153138c67d7affbddf241171dc8c036279d
Signed-off-by: Petar Penkov <ppenkov@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Changes in 4.19.207
ext4: fix race writing to an inline_data file while its xattrs are changing
xtensa: fix kconfig unmet dependency warning for HAVE_FUTEX_CMPXCHG
gpu: ipu-v3: Fix i.MX IPU-v3 offset calculations for (semi)planar U/V formats
qed: Fix the VF msix vectors flow
net: macb: Add a NULL check on desc_ptp
qede: Fix memset corruption
perf/x86/intel/pt: Fix mask of num_address_ranges
perf/x86/amd/ibs: Work around erratum #1197
cryptoloop: add a deprecation warning
ARM: 8918/2: only build return_address() if needed
ALSA: pcm: fix divide error in snd_pcm_lib_ioctl
clk: fix build warning for orphan_list
media: stkwebcam: fix memory leak in stk_camera_probe
ARM: imx: add missing clk_disable_unprepare()
ARM: imx: fix missing 3rd argument in macro imx_mmdc_perf_init
igmp: Add ip_mc_list lock in ip_check_mc_rcu
USB: serial: mos7720: improve OOM-handling in read_mos_reg()
ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2)
SUNRPC/nfs: Fix return value for nfs4_callback_compound()
crypto: talitos - reduce max key size for SEC1
powerpc/module64: Fix comment in R_PPC64_ENTRY handling
powerpc/boot: Delete unneeded .globl _zimage_start
net: ll_temac: Remove left-over debug message
mm/page_alloc: speed up the iteration of max_order
Revert "btrfs: compression: don't try to compress if we don't have enough pages"
ALSA: usb-audio: Add registration quirk for JBL Quantum 800
usb: host: xhci-rcar: Don't reload firmware after the completion
usb: mtu3: use @mult for HS isoc or intr
usb: mtu3: fix the wrong HS mult value
x86/reboot: Limit Dell Optiplex 990 quirk to early BIOS versions
PCI: Call Max Payload Size-related fixup quirks early
locking/mutex: Fix HANDOFF condition
regmap: fix the offset of register error log
crypto: mxs-dcp - Check for DMA mapping errors
sched/deadline: Fix reset_on_fork reporting of DL tasks
power: supply: axp288_fuel_gauge: Report register-address on readb / writeb errors
crypto: omap-sham - clear dma flags only after omap_sham_update_dma_stop()
sched/deadline: Fix missing clock update in migrate_task_rq_dl()
hrtimer: Avoid double reprogramming in __hrtimer_start_range_ns()
udf: Check LVID earlier
isofs: joliet: Fix iocharset=utf8 mount option
bcache: add proper error unwinding in bcache_device_init
nvme-rdma: don't update queue count when failing to set io queues
power: supply: max17042_battery: fix typo in MAx17042_TOFF
s390/cio: add dev_busid sysfs entry for each subchannel
libata: fix ata_host_start()
crypto: qat - do not ignore errors from enable_vf2pf_comms()
crypto: qat - handle both source of interrupt in VF ISR
crypto: qat - fix reuse of completion variable
crypto: qat - fix naming for init/shutdown VF to PF notifications
crypto: qat - do not export adf_iov_putmsg()
fcntl: fix potential deadlock for &fasync_struct.fa_lock
udf_get_extendedattr() had no boundary checks.
m68k: emu: Fix invalid free in nfeth_cleanup()
spi: spi-fsl-dspi: Fix issue with uninitialized dma_slave_config
spi: spi-pic32: Fix issue with uninitialized dma_slave_config
lib/mpi: use kcalloc in mpi_resize
clocksource/drivers/sh_cmt: Fix wrong setting if don't request IRQ for clock source channel
crypto: qat - use proper type for vf_mask
certs: Trigger creation of RSA module signing key if it's not an RSA key
spi: sprd: Fix the wrong WDG_LOAD_VAL
media: TDA1997x: enable EDID support
soc: rockchip: ROCKCHIP_GRF should not default to y, unconditionally
media: dvb-usb: fix uninit-value in dvb_usb_adapter_dvb_init
media: dvb-usb: fix uninit-value in vp702x_read_mac_addr
media: go7007: remove redundant initialization
Bluetooth: sco: prevent information leak in sco_conn_defer_accept()
tcp: seq_file: Avoid skipping sk during tcp_seek_last_pos
net: cipso: fix warnings in netlbl_cipsov4_add_std
i2c: highlander: add IRQ check
media: em28xx-input: fix refcount bug in em28xx_usb_disconnect
media: venus: venc: Fix potential null pointer dereference on pointer fmt
PCI: PM: Avoid forcing PCI_D0 for wakeup reasons inconsistently
PCI: PM: Enable PME if it can be signaled from D3cold
soc: qcom: smsm: Fix missed interrupts if state changes while masked
Bluetooth: increase BTNAMSIZ to 21 chars to fix potential buffer overflow
drm/msm/dpu: make dpu_hw_ctl_clear_all_blendstages clear necessary LMs
arm64: dts: exynos: correct GIC CPU interfaces address range on Exynos7
Bluetooth: fix repeated calls to sco_sock_kill
drm/msm/dsi: Fix some reference counted resource leaks
usb: gadget: udc: at91: add IRQ check
usb: phy: fsl-usb: add IRQ check
usb: phy: twl6030: add IRQ checks
Bluetooth: Move shutdown callback before flushing tx and rx queue
usb: host: ohci-tmio: add IRQ check
usb: phy: tahvo: add IRQ check
mac80211: Fix insufficient headroom issue for AMSDU
usb: gadget: mv_u3d: request_irq() after initializing UDC
Bluetooth: add timeout sanity check to hci_inquiry
i2c: iop3xx: fix deferred probing
i2c: s3c2410: fix IRQ check
mmc: dw_mmc: Fix issue with uninitialized dma_slave_config
mmc: moxart: Fix issue with uninitialized dma_slave_config
CIFS: Fix a potencially linear read overflow
i2c: mt65xx: fix IRQ check
usb: ehci-orion: Handle errors of clk_prepare_enable() in probe
usb: bdc: Fix an error handling path in 'bdc_probe()' when no suitable DMA config is available
tty: serial: fsl_lpuart: fix the wrong mapbase value
ath6kl: wmi: fix an error code in ath6kl_wmi_sync_point()
bcma: Fix memory leak for internally-handled cores
ipv4: make exception cache less predictible
net: sched: Fix qdisc_rate_table refcount leak when get tcf_block failed
net: qualcomm: fix QCA7000 checksum handling
ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
netns: protect netns ID lookups with RCU
fscrypt: add fscrypt_symlink_getattr() for computing st_size
ext4: report correct st_size for encrypted symlinks
f2fs: report correct st_size for encrypted symlinks
ubifs: report correct st_size for encrypted symlinks
tty: Fix data race between tiocsti() and flush_to_ldisc()
x86/resctrl: Fix a maybe-uninitialized build warning treated as error
KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted
IMA: remove -Wmissing-prototypes warning
IMA: remove the dependency on CRYPTO_MD5
fbmem: don't allow too huge resolutions
backlight: pwm_bl: Improve bootloader/kernel device handover
clk: kirkwood: Fix a clocking boot regression
rtc: tps65910: Correct driver module alias
btrfs: reset replace target device to allocation state on close
blk-zoned: allow zone management send operations without CAP_SYS_ADMIN
blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN
PCI/MSI: Skip masking MSI-X on Xen PV
powerpc/perf/hv-gpci: Fix counter value parsing
xen: fix setting of max_pfn in shared_info
include/linux/list.h: add a macro to test if entry is pointing to the head
9p/xen: Fix end of loop tests for list_for_each_entry
bpf/verifier: per-register parent pointers
bpf: correct slot_type marking logic to allow more stack slot sharing
bpf: Support variable offset stack access from helpers
bpf: Reject indirect var_off stack access in raw mode
bpf: Reject indirect var_off stack access in unpriv mode
bpf: Sanity check max value for var_off stack access
selftests/bpf: Test variable offset stack access
bpf: track spill/fill of constants
selftests/bpf: fix tests due to const spill/fill
bpf: Introduce BPF nospec instruction for mitigating Spectre v4
bpf: Fix leakage due to insufficient speculative store bypass mitigation
bpf: verifier: Allocate idmap scratch in verifier env
bpf: Fix pointer arithmetic mask tightening under state pruning
tools/thermal/tmon: Add cross compiling support
soc: aspeed: lpc-ctrl: Fix boundary check for mmap
arm64: head: avoid over-mapping in map_memory
crypto: public_key: fix overflow during implicit conversion
block: bfq: fix bfq_set_next_ioprio_data()
power: supply: max17042: handle fails of reading status register
dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc()
VMCI: fix NULL pointer dereference when unmapping queue pair
media: uvc: don't do DMA on stack
media: rc-loopback: return number of emitters rather than error
libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs
ARM: 9105/1: atags_to_fdt: don't warn about stack size
PCI: Restrict ASMedia ASM1062 SATA Max Payload Size Supported
PCI: Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure
PCI: xilinx-nwl: Enable the clock through CCF
PCI: aardvark: Increase polling delay to 1.5s while waiting for PIO response
PCI: aardvark: Fix masking and unmasking legacy INTx interrupts
HID: input: do not report stylus battery state as "full"
RDMA/iwcm: Release resources if iw_cm module initialization fails
docs: Fix infiniband uverbs minor number
pinctrl: samsung: Fix pinctrl bank pin count
vfio: Use config not menuconfig for VFIO_NOIOMMU
powerpc/stacktrace: Include linux/delay.h
openrisc: don't printk() unconditionally
pinctrl: single: Fix error return code in pcs_parse_bits_in_pinctrl_entry()
scsi: qedi: Fix error codes in qedi_alloc_global_queues()
platform/x86: dell-smbios-wmi: Add missing kfree in error-exit from run_smbios_call
fscache: Fix cookie key hashing
f2fs: fix to account missing .skipped_gc_rwsem
f2fs: fix to unmap pages from userspace process in punch_hole()
MIPS: Malta: fix alignment of the devicetree buffer
userfaultfd: prevent concurrent API initialization
media: dib8000: rewrite the init prbs logic
crypto: mxs-dcp - Use sg_mapping_iter to copy data
PCI: Use pci_update_current_state() in pci_enable_device_flags()
tipc: keep the skb in rcv queue until the whole data is read
iio: dac: ad5624r: Fix incorrect handling of an optional regulator.
ARM: dts: qcom: apq8064: correct clock names
video: fbdev: kyro: fix a DoS bug by restricting user input
netlink: Deal with ESRCH error in nlmsg_notify()
Smack: Fix wrong semantics in smk_access_entry()
usb: host: fotg210: fix the endpoint's transactional opportunities calculation
usb: host: fotg210: fix the actual_length of an iso packet
usb: gadget: u_ether: fix a potential null pointer dereference
usb: gadget: composite: Allow bMaxPower=0 if self-powered
staging: board: Fix uninitialized spinlock when attaching genpd
tty: serial: jsm: hold port lock when reporting modem line changes
drm/amd/amdgpu: Update debugfs link_settings output link_rate field in hex
bpf/tests: Fix copy-and-paste error in double word test
bpf/tests: Do not PASS tests without actually testing the result
video: fbdev: asiliantfb: Error out if 'pixclock' equals zero
video: fbdev: kyro: Error out if 'pixclock' equals zero
video: fbdev: riva: Error out if 'pixclock' equals zero
ipv4: ip_output.c: Fix out-of-bounds warning in ip_copy_addrs()
flow_dissector: Fix out-of-bounds warnings
s390/jump_label: print real address in a case of a jump label bug
serial: 8250: Define RX trigger levels for OxSemi 950 devices
xtensa: ISS: don't panic in rs_init
hvsi: don't panic on tty_register_driver failure
serial: 8250_pci: make setup_port() parameters explicitly unsigned
staging: ks7010: Fix the initialization of the 'sleep_status' structure
samples: bpf: Fix tracex7 error raised on the missing argument
ata: sata_dwc_460ex: No need to call phy_exit() befre phy_init()
Bluetooth: skip invalid hci_sync_conn_complete_evt
bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler()
ASoC: Intel: bytcr_rt5640: Move "Platform Clock" routes to the maps for the matching in-/output
media: imx258: Rectify mismatch of VTS value
media: imx258: Limit the max analogue gain to 480
media: v4l2-dv-timings.c: fix wrong condition in two for-loops
media: TDA1997x: fix tda1997x_query_dv_timings() return value
media: tegra-cec: Handle errors of clk_prepare_enable()
ARM: dts: imx53-ppd: Fix ACHC entry
arm64: dts: qcom: sdm660: use reg value for memory node
net: ethernet: stmmac: Do not use unreachable() in ipq806x_gmac_probe()
Bluetooth: schedule SCO timeouts with delayed_work
Bluetooth: avoid circular locks in sco_sock_connect
gpu: drm: amd: amdgpu: amdgpu_i2c: fix possible uninitialized-variable access in amdgpu_i2c_router_select_ddc_port()
ARM: tegra: tamonten: Fix UART pad setting
Bluetooth: Fix handling of LE Enhanced Connection Complete
serial: sh-sci: fix break handling for sysrq
tcp: enable data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
rpc: fix gss_svc_init cleanup on failure
staging: rts5208: Fix get_ms_information() heap buffer size
gfs2: Don't call dlm after protocol is unmounted
of: Don't allow __of_attached_node_sysfs() without CONFIG_SYSFS
mmc: sdhci-of-arasan: Check return value of non-void funtions
mmc: rtsx_pci: Fix long reads when clock is prescaled
selftests/bpf: Enlarge select() timeout for test_maps
mmc: core: Return correct emmc response in case of ioctl error
cifs: fix wrong release in sess_alloc_buffer() failed path
Revert "USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set"
usb: musb: musb_dsps: request_irq() after initializing musb
usbip: give back URBs for unsent unlink requests during cleanup
usbip:vhci_hcd USB port can get stuck in the disabled state
ASoC: rockchip: i2s: Fix regmap_ops hang
ASoC: rockchip: i2s: Fixup config for DAIFMT_DSP_A/B
parport: remove non-zero check on count
ath9k: fix OOB read ar9300_eeprom_restore_internal
ath9k: fix sleeping in atomic context
net: fix NULL pointer reference in cipso_v4_doi_free
net: w5100: check return value after calling platform_get_resource()
parisc: fix crash with signals and alloca
ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
scsi: BusLogic: Fix missing pr_cont() use
scsi: qla2xxx: Sync queue idx with queue_pair_map idx
cpufreq: powernv: Fix init_chip_info initialization in numa=off
mm/hugetlb: initialize hugetlb_usage in mm_init
memcg: enable accounting for pids in nested pid namespaces
platform/chrome: cros_ec_proto: Send command again when timeout occurs
drm/amdgpu: Fix BUG_ON assert
dm thin metadata: Fix use-after-free in dm_bm_set_read_only
xen: reset legacy rtc flag for PV domU
bnx2x: Fix enabling network interfaces without VFs
arm64/sve: Use correct size when reinitialising SVE state
PM: base: power: don't try to use non-existing RTC for storing data
PCI: Add AMD GPU multi-function power dependencies
x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
tipc: fix an use-after-free issue in tipc_recvmsg
net-caif: avoid user-triggerable WARN_ON(1)
ptp: dp83640: don't define PAGE0
dccp: don't duplicate ccid when cloning dccp sock
net/l2tp: Fix reference count leak in l2tp_udp_recv_core
r6040: Restore MDIO clock frequency after MAC reset
tipc: increase timeout in tipc_sk_enqueue()
perf machine: Initialize srcline string member in add_location struct
net/mlx5: Fix potential sleeping in atomic context
events: Reuse value read using READ_ONCE instead of re-reading it
net/af_unix: fix a data-race in unix_dgram_poll
net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()
qed: Handle management FW error
ibmvnic: check failover_pending in login response
net: hns3: pad the short tunnel frame before sending to hardware
mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()
KVM: s390: index kvm->arch.idle_mask by vcpu_idx
dt-bindings: mtd: gpmc: Fix the ECC bytes vs. OOB bytes equation
mfd: Don't use irq_create_mapping() to resolve a mapping
PCI: Add ACS quirks for Cavium multi-function devices
net: usb: cdc_mbim: avoid altsetting toggling for Telit LN920
block, bfq: honor already-setup queue merges
ethtool: Fix an error code in cxgb2.c
NTB: perf: Fix an error code in perf_setup_inbuf()
mfd: axp20x: Update AXP288 volatile ranges
PCI: Fix pci_dev_str_match_path() alloc while atomic bug
KVM: arm64: Handle PSCI resets before userspace touches vCPU state
PCI: Sync __pci_register_driver() stub for CONFIG_PCI=n
mtd: rawnand: cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()'
ARC: export clear_user_page() for modules
net: dsa: b53: Fix calculating number of switch ports
netfilter: socket: icmp6: fix use-after-scope
fq_codel: reject silly quantum parameters
qlcnic: Remove redundant unlock in qlcnic_pinit_from_rom
ip_gre: validate csum_start only on pull
net: renesas: sh_eth: Fix freeing wrong tx descriptor
s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant
Linux 4.19.207
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I18108cb47ba9e95838ebe55aaabe34de345ee846
commit 04f08eb44b5011493d77b602fdec29ff0f5c6cd5 upstream.
syzbot reported another data-race in af_unix [1]
Lets change __skb_insert() to use WRITE_ONCE() when changing
skb head qlen.
Also, change unix_dgram_poll() to use lockless version
of unix_recvq_full()
It is verry possible we can switch all/most unix_recvq_full()
to the lockless version, this will be done in a future kernel version.
[1] HEAD commit: 8596e589b787732c8346f0482919e83cc9362db1
BUG: KCSAN: data-race in skb_queue_tail / unix_dgram_poll
write to 0xffff88814eeb24e0 of 4 bytes by task 25815 on cpu 0:
__skb_insert include/linux/skbuff.h:1938 [inline]
__skb_queue_before include/linux/skbuff.h:2043 [inline]
__skb_queue_tail include/linux/skbuff.h:2076 [inline]
skb_queue_tail+0x80/0xa0 net/core/skbuff.c:3264
unix_dgram_sendmsg+0xff2/0x1600 net/unix/af_unix.c:1850
sock_sendmsg_nosec net/socket.c:703 [inline]
sock_sendmsg net/socket.c:723 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2392
___sys_sendmsg net/socket.c:2446 [inline]
__sys_sendmmsg+0x315/0x4b0 net/socket.c:2532
__do_sys_sendmmsg net/socket.c:2561 [inline]
__se_sys_sendmmsg net/socket.c:2558 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2558
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88814eeb24e0 of 4 bytes by task 25834 on cpu 1:
skb_queue_len include/linux/skbuff.h:1869 [inline]
unix_recvq_full net/unix/af_unix.c:194 [inline]
unix_dgram_poll+0x2bc/0x3e0 net/unix/af_unix.c:2777
sock_poll+0x23e/0x260 net/socket.c:1288
vfs_poll include/linux/poll.h:90 [inline]
ep_item_poll fs/eventpoll.c:846 [inline]
ep_send_events fs/eventpoll.c:1683 [inline]
ep_poll fs/eventpoll.c:1798 [inline]
do_epoll_wait+0x6ad/0xf00 fs/eventpoll.c:2226
__do_sys_epoll_wait fs/eventpoll.c:2238 [inline]
__se_sys_epoll_wait fs/eventpoll.c:2233 [inline]
__x64_sys_epoll_wait+0xf6/0x120 fs/eventpoll.c:2233
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000001b -> 0x00000001
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 25834 Comm: syz-executor.1 Tainted: G W 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 86b18aaa2b5b ("skbuff: fix a data race in skb_queue_len()")
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.19.201
virtio_net: Do not pull payload in skb->head
gro: ensure frag0 meets IP header alignment
x86/asm: Ensure asm/proto.h can be included stand-alone
btrfs: fix rw device counting in __btrfs_free_extra_devids
x86/kvm: fix vcpu-id indexed array sizes
ocfs2: fix zero out valid data
ocfs2: issue zeroout to EOF blocks
can: raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
can: mcba_usb_start(): add missing urb->transfer_dma initialization
can: usb_8dev: fix memory leak
can: ems_usb: fix memory leak
can: esd_usb2: fix memory leak
NIU: fix incorrect error return, missed in previous revert
nfc: nfcsim: fix use after free during module unload
cfg80211: Fix possible memory leak in function cfg80211_bss_update
netfilter: conntrack: adjust stop timestamp to real expiry value
netfilter: nft_nat: allow to specify layer 4 protocol NAT only
i40e: Fix logic of disabling queues
i40e: Fix log TC creation failure when max num of queues is exceeded
tipc: fix sleeping in tipc accept routine
mlx4: Fix missing error code in mlx4_load_one()
net: llc: fix skb_over_panic
net/mlx5: Fix flow table chaining
sctp: fix return value check in __sctp_rcv_asconf_lookup
tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
sis900: Fix missing pci_disable_device() in probe and remove
can: hi311x: fix a signedness bug in hi3110_cmd()
powerpc/pseries: Fix regression while building external modules
Revert "perf map: Fix dso->nsinfo refcounting"
i40e: Add additional info to PHY type error
Linux 4.19.201
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I89f69eba9523274f96ea396a562f340b578d968c
commit 38ec4944b593fd90c5ef42aaaa53e66ae5769d04 upstream.
After commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Guenter Roeck reported one failure in his tests using sh architecture.
After much debugging, we have been able to spot silent unaligned accesses
in inet_gro_receive()
The issue at hand is that upper networking stacks assume their header
is word-aligned. Low level drivers are supposed to reserve NET_IP_ALIGN
bytes before the Ethernet header to make that happen.
This patch hardens skb_gro_reset_offset() to not allow frag0 fast-path
if the fragment is not properly aligned.
Some arches like x86, arm64 and powerpc do not care and define NET_IP_ALIGN
as 0, this extra check will be a NOP for them.
Note that if frag0 is not used, GRO will call pskb_may_pull()
as many times as needed to pull network and transport headers.
Fixes: 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head")
Fixes: 78a478d0ef ("gro: Inline skb_gro_header and cache frag0 virtual address")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5eee7bd7e245914e4e050c413dfe864e31805207 upstream.
This worked before, because we made all callers name their next pointer
"next". But in trying to be more "drop-in" ready, the silliness here is
revealed. This commit fixes the problem by making the macro argument and
the member use different names.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dcfea72e79b0aa7a057c8f6024169d86a1bbc84b upstream.
As part of the continual effort to remove direct usage of skb->next and
skb->prev, this patch adds a helper for iterating through the
singly-linked variant of skb lists, which are used for lists of GSO
packet. The name "skb_list_..." has been chosen to match the existing
function, "kfree_skb_list, which also operates on these singly-linked
lists, and the "..._walk_safe" part is the same idiom as elsewhere in
the kernel.
This patch removes the helper from wireguard and puts it into
linux/skbuff.h, while making it a bit more robust for general usage. In
particular, parenthesis are added around the macro argument usage, and it
now accounts for trying to iterate through an already-null skb pointer,
which will simply run the iteration zero times. This latter enhancement
means it can be used to replace both do { ... } while and while (...)
open-coded idioms.
This should take care of these three possible usages, which match all
current methods of iterations.
skb_list_walk_safe(segs, skb, next) { ... }
skb_list_walk_safe(skb, skb, next) { ... }
skb_list_walk_safe(segs, skb, segs) { ... }
Gcc appears to generate efficient code for each of these.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Just the skbuff.h changes for backporting - gregkh]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This worked before, because we made all callers name their next pointer
"next". But in trying to be more "drop-in" ready, the silliness here is
revealed. This commit fixes the problem by making the macro argument and
the member use different names.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5eee7bd7e245914e4e050c413dfe864e31805207)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4778eb05d0ed284543822bd3ba537379c3c19b85
As part of the continual effort to remove direct usage of skb->next and
skb->prev, this patch adds a helper for iterating through the
singly-linked variant of skb lists, which are used for lists of GSO
packet. The name "skb_list_..." has been chosen to match the existing
function, "kfree_skb_list, which also operates on these singly-linked
lists, and the "..._walk_safe" part is the same idiom as elsewhere in
the kernel.
This patch removes the helper from wireguard and puts it into
linux/skbuff.h, while making it a bit more robust for general usage. In
particular, parenthesis are added around the macro argument usage, and it
now accounts for trying to iterate through an already-null skb pointer,
which will simply run the iteration zero times. This latter enhancement
means it can be used to replace both do { ... } while and while (...)
open-coded idioms.
This should take care of these three possible usages, which match all
current methods of iterations.
skb_list_walk_safe(segs, skb, next) { ... }
skb_list_walk_safe(skb, skb, next) { ... }
skb_list_walk_safe(segs, skb, segs) { ... }
Gcc appears to generate efficient code for each of these.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit dcfea72e79b0aa7a057c8f6024169d86a1bbc84b)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia8fe1237424161e4de23df4515dbb2275e15e180
Changes in 4.19.149
selinux: allow labeling before policy is loaded
media: mc-device.c: fix memleak in media_device_register_entity
dma-fence: Serialise signal enabling (dma_fence_enable_sw_signaling)
ath10k: fix array out-of-bounds access
ath10k: fix memory leak for tpc_stats_final
mm: fix double page fault on arm64 if PTE_AF is cleared
scsi: aacraid: fix illegal IO beyond last LBA
m68k: q40: Fix info-leak in rtc_ioctl
gma/gma500: fix a memory disclosure bug due to uninitialized bytes
ASoC: kirkwood: fix IRQ error handling
media: smiapp: Fix error handling at NVM reading
arch/x86/lib/usercopy_64.c: fix __copy_user_flushcache() cache writeback
x86/ioapic: Unbreak check_timer()
ALSA: usb-audio: Add delay quirk for H570e USB headsets
ALSA: hda/realtek - Couldn't detect Mic if booting with headset plugged
ALSA: hda/realtek: Enable front panel headset LED on Lenovo ThinkStation P520
lib/string.c: implement stpcpy
leds: mlxreg: Fix possible buffer overflow
PM / devfreq: tegra30: Fix integer overflow on CPU's freq max out
scsi: fnic: fix use after free
scsi: lpfc: Fix kernel crash at lpfc_nvme_info_show during remote port bounce
net: silence data-races on sk_backlog.tail
clk/ti/adpll: allocate room for terminating null
drm/amdgpu/powerplay: fix AVFS handling with custom powerplay table
mtd: cfi_cmdset_0002: don't free cfi->cfiq in error path of cfi_amdstd_setup()
mfd: mfd-core: Protect against NULL call-back function pointer
drm/amdgpu/powerplay/smu7: fix AVFS handling with custom powerplay table
tpm_crb: fix fTPM on AMD Zen+ CPUs
tracing: Adding NULL checks for trace_array descriptor pointer
bcache: fix a lost wake-up problem caused by mca_cannibalize_lock
dmaengine: mediatek: hsdma_probe: fixed a memory leak when devm_request_irq fails
RDMA/qedr: Fix potential use after free
RDMA/i40iw: Fix potential use after free
fix dget_parent() fastpath race
xfs: fix attr leaf header freemap.size underflow
RDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()'
ubi: Fix producing anchor PEBs
mmc: core: Fix size overflow for mmc partitions
gfs2: clean up iopen glock mess in gfs2_create_inode
scsi: pm80xx: Cleanup command when a reset times out
debugfs: Fix !DEBUG_FS debugfs_create_automount
CIFS: Properly process SMB3 lease breaks
ASoC: max98090: remove msleep in PLL unlocked workaround
kernel/sys.c: avoid copying possible padding bytes in copy_to_user
KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy()
xfs: fix log reservation overflows when allocating large rt extents
neigh_stat_seq_next() should increase position index
rt_cpu_seq_next should increase position index
ipv6_route_seq_next should increase position index
seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier
media: ti-vpe: cal: Restrict DMA to avoid memory corruption
sctp: move trace_sctp_probe_path into sctp_outq_sack
ACPI: EC: Reference count query handlers under lock
scsi: ufs: Make ufshcd_add_command_trace() easier to read
scsi: ufs: Fix a race condition in the tracing code
dmaengine: zynqmp_dma: fix burst length configuration
s390/cpum_sf: Use kzalloc and minor changes
powerpc/eeh: Only dump stack once if an MMIO loop is detected
Bluetooth: btrtl: Use kvmalloc for FW allocations
tracing: Set kernel_stack's caller size properly
ARM: 8948/1: Prevent OOB access in stacktrace
ar5523: Add USB ID of SMCWUSBT-G2 wireless adapter
ceph: ensure we have a new cap before continuing in fill_inode
selftests/ftrace: fix glob selftest
tools/power/x86/intel_pstate_tracer: changes for python 3 compatibility
Bluetooth: Fix refcount use-after-free issue
mm/swapfile.c: swap_next should increase position index
mm: pagewalk: fix termination condition in walk_pte_range()
Bluetooth: prefetch channel before killing sock
KVM: fix overflow of zero page refcount with ksm running
ALSA: hda: Clear RIRB status before reading WP
skbuff: fix a data race in skb_queue_len()
audit: CONFIG_CHANGE don't log internal bookkeeping as an event
selinux: sel_avc_get_stat_idx should increase position index
scsi: lpfc: Fix RQ buffer leakage when no IOCBs available
scsi: lpfc: Fix coverity errors in fmdi attribute handling
drm/omap: fix possible object reference leak
clk: stratix10: use do_div() for 64-bit calculation
crypto: chelsio - This fixes the kernel panic which occurs during a libkcapi test
mt76: clear skb pointers from rx aggregation reorder buffer during cleanup
ALSA: usb-audio: Don't create a mixer element with bogus volume range
perf test: Fix test trace+probe_vfs_getname.sh on s390
RDMA/rxe: Fix configuration of atomic queue pair attributes
KVM: x86: fix incorrect comparison in trace event
dmaengine: stm32-mdma: use vchan_terminate_vdesc() in .terminate_all
media: staging/imx: Missing assignment in imx_media_capture_device_register()
x86/pkeys: Add check for pkey "overflow"
bpf: Remove recursion prevention from rcu free callback
dmaengine: stm32-dma: use vchan_terminate_vdesc() in .terminate_all
dmaengine: tegra-apb: Prevent race conditions on channel's freeing
drm/amd/display: dal_ddc_i2c_payloads_create can fail causing panic
firmware: arm_sdei: Use cpus_read_lock() to avoid races with cpuhp
random: fix data races at timer_rand_state
bus: hisi_lpc: Fixup IO ports addresses to avoid use-after-free in host removal
media: go7007: Fix URB type for interrupt handling
Bluetooth: guard against controllers sending zero'd events
timekeeping: Prevent 32bit truncation in scale64_check_overflow()
ext4: fix a data race at inode->i_disksize
perf jevents: Fix leak of mapfile memory
mm: avoid data corruption on CoW fault into PFN-mapped VMA
drm/amdgpu: increase atombios cmd timeout
drm/amd/display: Stop if retimer is not available
ath10k: use kzalloc to read for ath10k_sdio_hif_diag_read
scsi: aacraid: Disabling TM path and only processing IOP reset
Bluetooth: L2CAP: handle l2cap config request during open state
media: tda10071: fix unsigned sign extension overflow
xfs: don't ever return a stale pointer from __xfs_dir3_free_read
xfs: mark dir corrupt when lookup-by-hash fails
ext4: mark block bitmap corrupted when found instead of BUGON
tpm: ibmvtpm: Wait for buffer to be set before proceeding
rtc: sa1100: fix possible race condition
rtc: ds1374: fix possible race condition
nfsd: Don't add locks to closed or closing open stateids
RDMA/cm: Remove a race freeing timewait_info
KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones
drm/msm: fix leaks if initialization fails
drm/msm/a5xx: Always set an OPP supported hardware value
tracing: Use address-of operator on section symbols
thermal: rcar_thermal: Handle probe error gracefully
perf parse-events: Fix 3 use after frees found with clang ASAN
serial: 8250_port: Don't service RX FIFO if throttled
serial: 8250_omap: Fix sleeping function called from invalid context during probe
serial: 8250: 8250_omap: Terminate DMA before pushing data on RX timeout
perf cpumap: Fix snprintf overflow check
cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn
tools: gpio-hammer: Avoid potential overflow in main
nvme-multipath: do not reset on unknown status
nvme: Fix controller creation races with teardown flow
RDMA/rxe: Set sys_image_guid to be aligned with HW IB devices
scsi: hpsa: correct race condition in offload enabled
SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()'
svcrdma: Fix leak of transport addresses
PCI: Use ioremap(), not phys_to_virt() for platform ROM
ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len
ALSA: usb-audio: Fix case when USB MIDI interface has more than one extra endpoint descriptor
PCI: pciehp: Fix MSI interrupt race
NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
mm/kmemleak.c: use address-of operator on section symbols
mm/filemap.c: clear page error before actual read
mm/vmscan.c: fix data races using kswapd_classzone_idx
nvmet-rdma: fix double free of rdma queue
mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area
scsi: qedi: Fix termination timeouts in session logout
serial: uartps: Wait for tx_empty in console setup
KVM: Remove CREATE_IRQCHIP/SET_PIT2 race
bdev: Reduce time holding bd_mutex in sync in blkdev_close()
drivers: char: tlclk.c: Avoid data race between init and interrupt handler
KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
net: openvswitch: use u64 for meter bucket
scsi: aacraid: Fix error handling paths in aac_probe_one()
staging:r8188eu: avoid skb_clone for amsdu to msdu conversion
sparc64: vcc: Fix error return code in vcc_probe()
arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]
dt-bindings: sound: wm8994: Correct required supplies based on actual implementaion
atm: fix a memory leak of vcc->user_back
perf mem2node: Avoid double free related to realloc
power: supply: max17040: Correct voltage reading
phy: samsung: s5pv210-usb2: Add delay after reset
Bluetooth: Handle Inquiry Cancel error after Inquiry Complete
USB: EHCI: ehci-mv: fix error handling in mv_ehci_probe()
tipc: fix memory leak in service subscripting
tty: serial: samsung: Correct clock selection logic
ALSA: hda: Fix potential race in unsol event handler
powerpc/traps: Make unrecoverable NMIs die instead of panic
fuse: don't check refcount after stealing page
USB: EHCI: ehci-mv: fix less than zero comparison of an unsigned int
scsi: cxlflash: Fix error return code in cxlflash_probe()
arm64/cpufeature: Drop TraceFilt feature exposure from ID_DFR0 register
e1000: Do not perform reset in reset_task if we are already down
drm/nouveau/debugfs: fix runtime pm imbalance on error
drm/nouveau: fix runtime pm imbalance on error
drm/nouveau/dispnv50: fix runtime pm imbalance on error
printk: handle blank console arguments passed in.
usb: dwc3: Increase timeout for CmdAct cleared by device controller
btrfs: don't force read-only after error in drop snapshot
vfio/pci: fix memory leaks of eventfd ctx
perf evsel: Fix 2 memory leaks
perf trace: Fix the selection for architectures to generate the errno name tables
perf stat: Fix duration_time value for higher intervals
perf util: Fix memory leak of prefix_if_not_in
perf metricgroup: Free metric_events on error
perf kcore_copy: Fix module map when there are no modules loaded
ASoC: img-i2s-out: Fix runtime PM imbalance on error
wlcore: fix runtime pm imbalance in wl1271_tx_work
wlcore: fix runtime pm imbalance in wlcore_regdomain_config
mtd: rawnand: omap_elm: Fix runtime PM imbalance on error
PCI: tegra: Fix runtime PM imbalance on error
ceph: fix potential race in ceph_check_caps
mm/swap_state: fix a data race in swapin_nr_pages
rapidio: avoid data race between file operation callbacks and mport_cdev_add().
mtd: parser: cmdline: Support MTD names containing one or more colons
x86/speculation/mds: Mark mds_user_clear_cpu_buffers() __always_inline
vfio/pci: Clear error and request eventfd ctx after releasing
cifs: Fix double add page to memcg when cifs_readpages
nvme: fix possible deadlock when I/O is blocked
scsi: libfc: Handling of extra kref
scsi: libfc: Skip additional kref updating work event
selftests/x86/syscall_nt: Clear weird flags after each test
vfio/pci: fix racy on error and request eventfd ctx
btrfs: qgroup: fix data leak caused by race between writeback and truncate
ubi: fastmap: Free unused fastmap anchor peb during detach
perf parse-events: Use strcmp() to compare the PMU name
net: openvswitch: use div_u64() for 64-by-32 divisions
nvme: explicitly update mpath disk capacity on revalidation
ASoC: wm8994: Skip setting of the WM8994_MICBIAS register for WM1811
ASoC: wm8994: Ensure the device is resumed in wm89xx_mic_detect functions
ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN Converter9 2-in-1
RISC-V: Take text_mutex in ftrace_init_nop()
s390/init: add missing __init annotations
lockdep: fix order in trace_hardirqs_off_caller()
drm/amdkfd: fix a memory leak issue
i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices()
objtool: Fix noreturn detection for ignored functions
ieee802154: fix one possible memleak in ca8210_dev_com_init
ieee802154/adf7242: check status of adf7242_read_reg
clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init()
mwifiex: Increase AES key storage size to 256 bits
batman-adv: bla: fix type misuse for backbone_gw hash indexing
atm: eni: fix the missed pci_disable_device() for eni_init_one()
batman-adv: mcast/TT: fix wrongly dropped or rerouted packets
mac802154: tx: fix use-after-free
bpf: Fix clobbering of r2 in bpf_gen_ld_abs
drm/vc4/vc4_hdmi: fill ASoC card owner
net: qed: RDMA personality shouldn't fail VF load
drm/sun4i: sun8i-csc: Secondary CSC register correction
batman-adv: Add missing include for in_interrupt()
batman-adv: mcast: fix duplicate mcast packets in BLA backbone from mesh
batman-adv: mcast: fix duplicate mcast packets from BLA backbone to mesh
bpf: Fix a rcu warning for bpffs map pretty-print
ALSA: asihpi: fix iounmap in error handler
regmap: fix page selection for noinc reads
MIPS: Add the missing 'CPU_1074K' into __get_cpu_type()
KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE
KVM: SVM: Add a dedicated INVD intercept routine
tracing: fix double free
s390/dasd: Fix zero write for FBA devices
kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
mm, THP, swap: fix allocating cluster for swapfile by mistake
s390/zcrypt: Fix ZCRYPT_PERDEV_REQCNT ioctl
kprobes: Fix compiler warning for !CONFIG_KPROBES_ON_FTRACE
ata: define AC_ERR_OK
ata: make qc_prep return ata_completion_errors
ata: sata_mv, avoid trigerrable BUG_ON
KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch
Linux 4.19.149
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idfc1b35ec63b4b464aeb6e32709102bee0efc872
[ Upstream commit 86b18aaa2b5b5bb48e609cd591b3d2d0fdbe0442 ]
sk_buff.qlen can be accessed concurrently as noticed by KCSAN,
BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_dgram_sendmsg
read to 0xffff8a1b1d8a81c0 of 4 bytes by task 5371 on cpu 96:
unix_dgram_sendmsg+0x9a9/0xb70 include/linux/skbuff.h:1821
net/unix/af_unix.c:1761
____sys_sendmsg+0x33e/0x370
___sys_sendmsg+0xa6/0xf0
__sys_sendmsg+0x69/0xf0
__x64_sys_sendmsg+0x51/0x70
do_syscall_64+0x91/0xb47
entry_SYSCALL_64_after_hwframe+0x49/0xbe
write to 0xffff8a1b1d8a81c0 of 4 bytes by task 1 on cpu 99:
__skb_try_recv_from_queue+0x327/0x410 include/linux/skbuff.h:2029
__skb_try_recv_datagram+0xbe/0x220
unix_dgram_recvmsg+0xee/0x850
____sys_recvmsg+0x1fb/0x210
___sys_recvmsg+0xa2/0xf0
__sys_recvmsg+0x66/0xf0
__x64_sys_recvmsg+0x51/0x70
do_syscall_64+0x91/0xb47
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Since only the read is operating as lockless, it could introduce a logic
bug in unix_recvq_full() due to the load tearing. Fix it by adding
a lockless variant of skb_queue_len() and unix_recvq_full() where
READ_ONCE() is on the read while WRITE_ONCE() is on the write similar to
the commit d7d16a89350a ("net: add skb_queue_empty_lockless()").
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Changes in 4.19.148
af_key: pfkey_dump needs parameter validation
KVM: fix memory leak in kvm_io_bus_unregister_dev()
kprobes: fix kill kprobe which has been marked as gone
mm/thp: fix __split_huge_pmd_locked() for migration PMD
cxgb4: Fix offset when clearing filter byte counters
geneve: add transport ports in route lookup for geneve
hdlc_ppp: add range checks in ppp_cp_parse_cr()
ip: fix tos reflection in ack and reset packets
ipv6: avoid lockdep issue in fib6_del()
net: DCB: Validate DCB_ATTR_DCB_BUFFER argument
net: dsa: rtl8366: Properly clear member config
net: ipv6: fix kconfig dependency warning for IPV6_SEG6_HMAC
net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
nfp: use correct define to return NONE fec
tipc: Fix memory leak in tipc_group_create_member()
tipc: fix shutdown() of connection oriented socket
tipc: use skb_unshare() instead in tipc_buf_append()
bnxt_en: return proper error codes in bnxt_show_temp
bnxt_en: Protect bnxt_set_eee() and bnxt_set_pauseparam() with mutex.
net: phy: Avoid NPD upon phy_detach() when driver is unbound
net: qrtr: check skb_put_padto() return value
net: add __must_check to skb_put_padto()
ipv4: Update exception handling for multipath routes via same device
MAINTAINERS: add CLANG/LLVM BUILD SUPPORT info
kbuild: add OBJSIZE variable for the size tool
Documentation/llvm: add documentation on building w/ Clang/LLVM
Documentation/llvm: fix the name of llvm-size
net: wan: wanxl: use allow to pass CROSS_COMPILE_M68k for rebuilding firmware
net: wan: wanxl: use $(M68KCC) instead of $(M68KAS) for rebuilding firmware
x86/boot: kbuild: allow readelf executable to be specified
kbuild: remove AS variable
kbuild: replace AS=clang with LLVM_IAS=1
kbuild: support LLVM=1 to switch the default tools to Clang/LLVM
mm: memcg: fix memcg reclaim soft lockup
tcp_bbr: refactor bbr_target_cwnd() for general inflight provisioning
tcp_bbr: adapt cwnd based on ack aggregation estimation
serial: 8250: Avoid error message on reprobe
Linux 4.19.148
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5077093f58ae882547e3b23c5bbd4dfe78116071
[ Upstream commit 4a009cb04aeca0de60b73f37b102573354214b52 ]
skb_put_padto() and __skb_put_padto() callers
must check return values or risk use-after-free.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Try to mitigate potential future driver core api changes by adding a
padding to a lot of different networking structures:
struct ipv6_devconf
struct proto_ops
struct header_ops
struct napi_struct
struct netdev_queue
struct netdev_rx_queue
struct xfrmdev_ops
struct net_device_ops
struct net_device
struct packet_type
struct sk_buff
struct tlsdev_ops
Based on a change made to the RHEL/CENTOS 8 kernel.
Bug: 151154716
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I590f004754dbc8beafa40e71cac70a0938c38b4a
Remove condition for including struct sk_buff members based on
CONFIG_BRIDGE_NETFILTER config.
Bug: 151840548
Test: build
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Iee626843e107e8d64c3c6c4a1cc9c08f4e38f5af
Merged-In: Iee626843e107e8d64c3c6c4a1cc9c08f4e38f5af
[ Upstream commit 55667441c84fa5e0911a0aac44fb059c15ba6da2 ]
UDP IPv6 packets auto flowlabels are using a 32bit secret
(static u32 hashrnd in net/core/flow_dissector.c) and
apply jhash() over fields known by the receivers.
Attackers can easily infer the 32bit secret and use this information
to identify a device and/or user, since this 32bit secret is only
set at boot time.
Really, using jhash() to generate cookies sent on the wire
is a serious security concern.
Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be
a dead end. Trying to periodically change the secret (like in sch_sfq.c)
could change paths taken in the network for long lived flows.
Let's switch to siphash, as we did in commit df453700e8d8
("inet: switch IP ID generator to siphash")
Using a cryptographically strong pseudo random function will solve this
privacy issue and more generally remove other weak points in the stack.
Packet schedulers using skb_get_hash_perturb() benefit from this change.
Fixes: b56774163f ("ipv6: Enable auto flow labels by default")
Fixes: 42240901f7 ("ipv6: Implement different admin modes for automatic flow labels")
Fixes: 67800f9b1f ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
Fixes: cb1ce2ef38 ("ipv6: Implement automatic flow label generation on transmit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Berger <jonathann1@walla.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d7d16a89350ab263484c0aa2b523dd3a234e4a80 ]
Some paths call skb_queue_empty() without holding
the queue lock. We must use a barrier in order
to not let the compiler do strange things, and avoid
KCSAN splats.
Adding a barrier in skb_queue_empty() might be overkill,
I prefer adding a new helper to clearly identify
points where the callers might be lockless. This might
help us finding real bugs.
The corresponding WRITE_ONCE() should add zero cost
for current compilers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 185ce5c38ea76f29b6bd9c7c8c7a5e5408834920 ]
Zerocopy skbs without completion notification were added for packet
sockets with PACKET_TX_RING user buffers. Those signal completion
through the TP_STATUS_USER bit in the ring. Zerocopy annotation was
added only to avoid premature notification after clone or orphan, by
triggering a copy on these paths for these packets.
The mechanism had to define a special "no-uarg" mode because packet
sockets already use skb_uarg(skb) == skb_shinfo(skb)->destructor_arg
for a different pointer.
Before deferencing skb_uarg(skb), verify that it is a real pointer.
Fixes: 5cd8d46ea1562 ("packet: copy user buffers before orphan or clone")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c3024debf62de4c6ac6d3cb4c0063be21d4f652 ]
BPF can adjust gso only for tcp bytestreams. Fail on other gso types.
But only on gso packets. It does not touch this field if !gso_size.
Fixes: b90efd225874 ("bpf: only adjust gso_size on bytestream protocols")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b90efd2258749e04e1b3f71ef0d716f2ac2337e0 ]
bpf_skb_change_proto and bpf_skb_adjust_room change skb header length.
For GSO packets they adjust gso_size to maintain the same MTU.
The gso size can only be safely adjusted on bytestream protocols.
Commit d02f51cbcf ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
to deal with gso sctp skbs") excluded SKB_GSO_SCTP.
Since then type SKB_GSO_UDP_L4 has been added, whose contents are one
gso_size unit per datagram. Also exclude these.
Move from a blacklist to a whitelist check to future proof against
additional such new GSO types, e.g., for fraglist based GRO.
Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d5be7f632bad0f489879eed0ff4b99bd7fe0b74c upstream.
Syzkaller again found a path to a kernel crash through bad gso input.
By building an excessively large packet to cause an skb field to wrap.
If VIRTIO_NET_HDR_F_NEEDS_CSUM was set this would have been dropped in
skb_partial_csum_set.
GSO packets that do not set checksum offload are suspicious and rare.
Most callers of virtio_net_hdr_to_skb already pass them to
skb_probe_transport_header.
Move that test forward, change it to detect parse failure and drop
packets on failure as those cleary are not one of the legitimate
VIRTIO_NET_HDR_GSO types.
Fixes: bfd5f4a3d6 ("packet: Add GSO/csum offload support.")
Fixes: f43798c276 ("tun: Allow GSO using virtio_net_hdr")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6c57f0458022298e4da1729c67bd33ce41c14e7a ]
In certain cases, pskb_trim_rcsum() may change skb pointers.
Reinitialize header pointers afterwards to avoid potential
use-after-frees. Add a note in the documentation of
pskb_trim_rcsum(). Found by KASAN.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5cd8d46ea1562be80063f53c7c6a5f40224de623 ]
tpacket_snd sends packets with user pages linked into skb frags. It
notifies that pages can be reused when the skb is released by setting
skb->destructor to tpacket_destruct_skb.
This can cause data corruption if the skb is orphaned (e.g., on
transmit through veth) or cloned (e.g., on mirror to another psock).
Create a kernel-private copy of data in these cases, same as tun/tap
zerocopy transmission. Reuse that infrastructure: mark the skb as
SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).
Unlike other zerocopy packets, do not set shinfo destructor_arg to
struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
when the original skb is released and a timestamp is recorded. Do not
change this timestamp behavior. The ubuf_info->callback is not needed
anyway, as no zerocopy notification is expected.
Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
resulting value is not a valid ubuf_info pointer, nor a valid
tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.
The fix relies on features introduced in commit 52267790ef ("sock:
add MSG_ZEROCOPY"), so can be backported as is only to 4.14.
Tested with from `./in_netns.sh ./txring_overwrite` from
http://github.com/wdebruij/kerneltools/tests
Fixes: 69e3c75f4d ("net: TX_RING and packet mmap")
Reported-by: Anand H. Krishnan <anandhkrishnan@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-08-13
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add driver XDP support for veth. This can be used in conjunction with
redirect of another XDP program e.g. sitting on NIC so the xdp_frame
can be forwarded to the peer veth directly without modification,
from Toshiaki.
2) Add a new BPF map type REUSEPORT_SOCKARRAY and prog type SK_REUSEPORT
in order to provide more control and visibility on where a SO_REUSEPORT
sk should be located, and the latter enables to directly select a sk
from the bpf map. This also enables map-in-map for application migration
use cases, from Martin.
3) Add a new BPF helper bpf_skb_ancestor_cgroup_id() that returns the id
of cgroup v2 that is the ancestor of the cgroup associated with the
skb at the ancestor_level, from Andrey.
4) Implement BPF fs map pretty-print support based on BTF data for regular
hash table and LRU map, from Yonghong.
5) Decouple the ability to attach BTF for a map from the key and value
pretty-printer in BPF fs, and enable further support of BTF for maps for
percpu and LPM trie, from Daniel.
6) Implement a better BPF sample of using XDP's CPU redirect feature for
load balancing SKB processing to remote CPU. The sample implements the
same XDP load balancing as Suricata does which is symmetric hash based
on IP and L4 protocol, from Jesper.
7) Revert adding NULL pointer check with WARN_ON_ONCE() in __xdp_return()'s
critical path as it is ensured that the allocator is present, from Björn.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This is needed for veth XDP which does skb_copy_expand()-like operation.
v2:
- Drop skb_copy_header part because it has already been exported now.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes the following sparse warning:
./include/linux/skbuff.h:2365:58: warning: Using plain integer as NULL pointer
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to TCP OOO RX queue, it makes sense to use rb trees to store
IP fragments, so that OOO fragments are inserted faster.
Tested:
- a follow-up patch contains a rather comprehensive ip defrag
self-test (functional)
- ran neper `udp_stream -c -H <host> -F 100 -l 300 -T 20`:
netstat --statistics
Ip:
282078937 total packets received
0 forwarded
0 incoming packets discarded
946760 incoming packets delivered
18743456 requests sent out
101 fragments dropped after timeout
282077129 reassemblies required
944952 packets reassembled ok
262734239 packet reassembles failed
(The numbers/stats above are somewhat better re:
reassemblies vs a kernel without this patchset. More
comprehensive performance testing TBD).
Reported-by: Jann Horn <jannh@google.com>
Reported-by: Juha-Matti Tilli <juha-matti.tilli@iki.fi>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 784abe24c9 ("net: Add decrypted field to skb")
introduced a 'decrypted' field that is explicitly copied on skb
copy and clone.
Move it between headers_start[0] and headers_end[0], so that we
don't need to copy it explicitly as it's copied by the memcpy()
in __copy_skb_header().
While at it, drop the assignment in __skb_clone(), it was
already redundant.
This doesn't change the size of sk_buff or cacheline boundaries.
The 15-bits hole before tc_index becomes a 14-bits hole, and
will be again a 15-bits hole when this change is merged with
commit 8b7008620b ("net: Don't copy pfmemalloc flag in
__copy_skb_header()").
v2: as reported by kbuild test robot (oops, I forgot to build
with CONFIG_TLS_DEVICE it seems), we can't use
CHECK_SKB_FIELD() on a bit-field member. Just drop the
check for the moment being, perhaps we could think of some
magic to also check bit-field members one day.
Fixes: 784abe24c9 ("net: Add decrypted field to skb")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The decrypted bit is propogated to cloned/copied skbs.
This will be used later by the inline crypto receive side offload
of tls.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pfmemalloc flag indicates that the skb was allocated from
the PFMEMALLOC reserves, and the flag is currently copied on skb
copy and clone.
However, an skb copied from an skb flagged with pfmemalloc
wasn't necessarily allocated from PFMEMALLOC reserves, and on
the other hand an skb allocated that way might be copied from an
skb that wasn't.
So we should not copy the flag on skb copy, and rather decide
whether to allow an skb to be associated with sockets unrelated
to page reclaim depending only on how it was allocated.
Move the pfmemalloc flag before headers_start[0] using an
existing 1-bit hole, so that __copy_skb_header() doesn't copy
it.
When cloning, we'll now take care of this flag explicitly,
contravening to the warning comment of __skb_clone().
While at it, restore the newline usage introduced by commit
b193722731 ("net: reorganize sk_buff for faster
__copy_skb_header()") to visually separate bytes used in
bitfields after headers_start[0], that was gone after commit
a9e419dc7b ("netfilter: merge ctinfo into nfct pointer storage
area"), and describe the pfmemalloc flag in the kernel-doc
structure comment.
This doesn't change the size of sk_buff or cacheline boundaries,
but consolidates the 15 bits hole before tc_index into a 2 bytes
hole before csum, that could now be filled more easily.
Reported-by: Patrick Talbert <ptalbert@redhat.com>
Fixes: c93bdd0e03 ("netvm: allow skb allocation to use PFMEMALLOC reserves")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simple overlapping changes in stmmac driver.
Adjust skb_gro_flush_final_remcsum function signature to make GRO list
changes in net-next, as per Stephen Rothwell's example merge
resolution.
Signed-off-by: David S. Miller <davem@davemloft.net>
The poll() changes were not well thought out, and completely
unexplained. They also caused a huge performance regression, because
"->poll()" was no longer a trivial file operation that just called down
to the underlying file operations, but instead did at least two indirect
calls.
Indirect calls are sadly slow now with the Spectre mitigation, but the
performance problem could at least be largely mitigated by changing the
"->get_poll_head()" operation to just have a per-file-descriptor pointer
to the poll head instead. That gets rid of one of the new indirections.
But that doesn't fix the new complexity that is completely unwarranted
for the regular case. The (undocumented) reason for the poll() changes
was some alleged AIO poll race fixing, but we don't make the common case
slower and more complex for some uncommon special case, so this all
really needs way more explanations and most likely a fundamental
redesign.
[ This revert is a revert of about 30 different commits, not reverted
individually because that would just be unnecessarily messy - Linus ]
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manage pending per-NAPI GRO packets via list_head.
Return an SKB pointer from the GRO receive handlers. When GRO receive
handlers return non-NULL, it means that this SKB needs to be completed
at this time and removed from the NAPI queue.
Several operations are greatly simplified by this transformation,
especially timing out the oldest SKB in the list when gro_count
exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue
in reverse order.
Signed-off-by: David S. Miller <davem@davemloft.net>