5678 Commits

Author SHA1 Message Date
Damien Le Moal
f243b64a11 block: Introduce BLKGETNRZONES ioctl
Get a zoned block device total number of zones. The device can be a
partition of the whole device. The number of zones is always 0 for
regular block devices.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-08 00:52:08 +00:00
Damien Le Moal
6645652532 block: Introduce BLKGETZONESZ ioctl
Get a zoned block device zone size in number of 512 B sectors.
The zone size is always 0 for regular block devices.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-08 00:52:08 +00:00
claxten10
94ad0ad661 thermal: Import Xiaomi changes
Signed-off-by: claxten10 <claxten10@gmail.com>
2025-10-18 10:51:02 +00:00
claxten10
1b8e8fa7d0 sound: Import Xiaomi changes
Signed-off-by: claxten10 <claxten10@gmail.com>
2025-10-18 10:51:01 +00:00
Paul Chaignon
30f286f182 UPSTREAM: bpf: Fix L4 csum update on IPv6 in CHECKSUM_COMPLETE
commit ead7f9b8de65632ef8060b84b0c55049a33cfea1 upstream.

In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other
things, update the L4 checksum after reverse SNATing IPv6 packets. That
use case is however not currently supported and leads to invalid
skb->csum values in some cases. This patch adds support for IPv6 address
changes in bpf_l4_csum_update via a new flag.

When calling bpf_l4_csum_replace in Cilium, it ends up calling
inet_proto_csum_replace_by_diff:

    1:  void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
    2:                                       __wsum diff, bool pseudohdr)
    3:  {
    4:      if (skb->ip_summed != CHECKSUM_PARTIAL) {
    5:          csum_replace_by_diff(sum, diff);
    6:          if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
    7:              skb->csum = ~csum_sub(diff, skb->csum);
    8:      } else if (pseudohdr) {
    9:          *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
    10:     }
    11: }

The bug happens when we're in the CHECKSUM_COMPLETE state. We've just
updated one of the IPv6 addresses. The helper now updates the L4 header
checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't.

For an IPv6 packet, the updates of the IPv6 address and of the L4
checksum will cancel each other. The checksums are set such that
computing a checksum over the packet including its checksum will result
in a sum of 0. So the same is true here when we update the L4 checksum
on line 5. We'll update it as to cancel the previous IPv6 address
update. Hence skb->csum should remain untouched in this case.

The same bug doesn't affect IPv4 packets because, in that case, three
fields are updated: the IPv4 address, the IP checksum, and the L4
checksum. The change to the IPv4 address and one of the checksums still
cancel each other in skb->csum, but we're left with one checksum update
and should therefore update skb->csum accordingly. That's exactly what
inet_proto_csum_replace_by_diff does.

This special case for IPv6 L4 checksums is also described atop
inet_proto_csum_replace16, the function we should be using in this case.

This patch introduces a new bpf_l4_csum_replace flag, BPF_F_IPV6,
to indicate that we're updating the L4 checksum of an IPv6 packet. When
the flag is set, inet_proto_csum_replace_by_diff will skip the
skb->csum update.

Fixes: 7d672345ed ("bpf: add generic bpf_csum_diff helper")
Change-Id: Ia07e6770587fff91588ba133a9efadab92372ed9
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/96a6bc3a443e6f0b21ff7b7834000e17fb549e05.1748509484.git.paul.chaignon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Note: Fixed conflict due to unrelated comment change. ]
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-12 15:02:28 +01:00
Thomas Turner
0ea22e7902 fixup! BACKPORT: bpf: Switch most helper return values from 32-bit int to 64-bit long
Correct formatting of the comments.

Change-Id: If8040d4df7f295c476431102df1e8017b82e0b65
2025-10-12 15:02:28 +01:00
Lorenz Bauer
bc96dd8739 UPSTREAM: bpf: Add PROG_TEST_RUN support for sk_lookup programs
commit 7c32e8f8bc33a5f4b113a630857e46634e3e143b upstream.

Allow to pass sk_lookup programs to PROG_TEST_RUN. User space
provides the full bpf_sk_lookup struct as context. Since the
context includes a socket pointer that can't be exposed
to user space we define that PROG_TEST_RUN returns the cookie
of the selected socket or zero in place of the socket pointer.

We don't support testing programs that select a reuseport socket,
since this would mean running another (unrelated) BPF program
from the sk_lookup test handler.

Change-Id: I7af748e3f11804e4e1ad0c532685f0c3dfaf4816
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210303101816.36774-3-lmb@cloudflare.com
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-12 15:02:28 +01:00
Trond Myklebust
8c03862ac2 UPSTREAM: NFS: Move internal constants out of uapi/linux/nfs_mount.h
When the label says "for internal use only", then it doesn't belong
in the 'uapi' subtree.

Change-Id: Ia10de797ba5e5977870ebadb67a870405934e76e
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2025-10-12 13:52:32 +01:00
bengris32
c6aa1292ca Merge branch 'linux-4.19.y-cip' of https://git.kernel.org/pub/scm/linux/kernel/git/cip/linux-cip into android-4.19.y-mediatek
* 'linux-4.19.y-cip' of https://git.kernel.org/pub/scm/linux/kernel/git/cip/linux-cip:
  CIP: Bump version suffix to -cip124 after merge from cip/linux-4.19.y-st tree
  Update localversion-st, tree is up-to-date with 5.4.298.
  f2fs: fix to do sanity check on ino and xnid
  squashfs: fix memory leak in squashfs_fill_super
  pNFS: Handle RPC size limit for layoutcommits
  wifi: iwlwifi: fw: Fix possible memory leak in iwl_fw_dbg_collect
  usb: core: usb_submit_urb: downgrade type check
  udf: Verify partition map count
  f2fs: fix to avoid panic in f2fs_evict_inode
  usb: hub: Fix flushing and scheduling of delayed work that tunes runtime pm
  Revert "drm/dp: Change AUX DPCD probe address from DPCD_REV to LANE0_1_STATUS"
  net: usb: qmi_wwan: add Telit Cinterion LE910C4-WWX new compositions
  HID: hid-ntrig: fix unable to handle page fault in ntrig_report_version()
  HID: asus: fix UAF via HID_CLAIMED_INPUT validation
  efivarfs: Fix slab-out-of-bounds in efivarfs_d_compare
  sctp: initialize more fields in sctp_v6_from_sk()
  net: stmmac: xgmac: Do not enable RX FIFO Overflow interrupts
  net/mlx5e: Set local Xoff after FW update
  net: dlink: fix multicast stats being counted incorrectly
  atm: atmtcp: Prevent arbitrary write in atmtcp_recv_control().
  net/atm: remove the atmdev_ops {get, set}sockopt methods
  Bluetooth: hci_event: Detect if HCI_EV_NUM_COMP_PKTS is unbalanced
  powerpc/kvm: Fix ifdef to remove build warning
  net: ipv4: fix regression in local-broadcast routes
  vhost/net: Protect ubufs with rcu read lock in vhost_net_ubuf_put()
  scsi: core: sysfs: Correct sysfs attributes access rights
  ftrace: Fix potential warning in trace_printk_seq during ftrace_dump
  alloc_fdtable(): change calling conventions.
  ALSA: usb-audio: Use correct sub-type for UAC3 feature unit validation
  net/sched: Make cake_enqueue return NET_XMIT_CN when past buffer_limit
  ipv6: sr: validate HMAC algorithm ID in seg6_hmac_info_add
  ALSA: usb-audio: Fix size validation in convert_chmap_v3()
  scsi: qla4xxx: Prevent a potential error pointer dereference
  usb: xhci: Fix slot_id resource race conflict
  nfs: fix UAF in direct writes
  NFS: Fix up commit deadlocks
  Bluetooth: fix use-after-free in device_for_each_child()
  selftests: forwarding: tc_actions.sh: add matchall mirror test
  codel: remove sch->q.qlen check before qdisc_tree_reduce_backlog()
  sch_qfq: make qfq_qlen_notify() idempotent
  sch_hfsc: make hfsc_qlen_notify() idempotent
  sch_drr: make drr_qlen_notify() idempotent
  btrfs: populate otime when logging an inode item
  media: venus: hfi: explicitly release IRQ during teardown
  f2fs: fix to avoid out-of-boundary access in dnode page
  media: venus: protect against spurious interrupts during probe
  media: venus: vdec: Clamp param smaller than 1fps and bigger than 240.
  drm/dp: Change AUX DPCD probe address from DPCD_REV to LANE0_1_STATUS
  media: rainshadow-cec: fix TOCTOU race condition in rain_interrupt()
  media: v4l2-ctrls: Don't reset handler's error in v4l2_ctrl_handler_free()
  ata: Fix SATA_MOBILE_LPM_POLICY description in Kconfig
  usb: musb: omap2430: fix device leak at unbind
  NFS: Fix the setting of capabilities when automounting a new filesystem
  NFS: Fix up handling of outstanding layoutcommit in nfs_update_inode()
  NFSv4: Fix nfs4_bitmap_copy_adjust()
  usb: typec: fusb302: cache PD RX state
  cdc-acm: fix race between initial clearing halt and open
  USB: cdc-acm: do not log successful probe on later errors
  nfsd: handle get_client_locked() failure in nfsd4_setclientid_confirm()
  tracing: Add down_write(trace_event_sem) when adding trace event
  usb: hub: Don't try to recover devices lost during warm reset.
  usb: hub: avoid warm port reset during USB3 disconnect
  x86/mce/amd: Add default names for MCA banks and blocks
  iio: hid-sensor-prox: Fix incorrect OFFSET calculation
  mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n
  mm/zsmalloc.c: convert to use kmem_cache_zalloc in cache_alloc_zspage()
  net: usbnet: Fix the wrong netif_carrier_on() call
  net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
  PCI/ACPI: Fix runtime PM ref imbalance on Hot-Plug Capable ports
  ACPI: processor: idle: Check acpi_fetch_acpi_dev() return value
  kbuild: Add KBUILD_CPPFLAGS to as-option invocation
  kbuild: add $(CLANG_FLAGS) to KBUILD_CPPFLAGS
  kbuild: Add CLANG_FLAGS to as-instr
  mips: Include KBUILD_CPPFLAGS in CHECKFLAGS invocation
  kbuild: Update assembler calls to use proper flags and language target
  ARM: 9448/1: Use an absolute path to unified.h in KBUILD_AFLAGS
  usb: dwc3: Ignore late xferNotReady event to prevent halt timeout
  USB: storage: Ignore driver CD mode for Realtek multi-mode Wi-Fi dongles
  usb: storage: realtek_cr: Use correct byte order for bcs->Residue
  USB: storage: Add unusual-devs entry for Novatek NTK96550-based camera
  usb: quirks: Add DELAY_INIT quick for another SanDisk 3.2Gen1 Flash Drive
  iio: proximity: isl29501: fix buffered read on big-endian systems
  ftrace: Also allocate and copy hash for reading of filter files
  fpga: zynq_fpga: Fix the wrong usage of dma_map_sgtable()
  fs/buffer: fix use-after-free when call bh_read() helper
  drm/amd/display: Fix fractional fb divider in set_pixel_clock_v3
  media: venus: Add a check for packet size after reading from shared memory
  media: ov2659: Fix memory leaks in ov2659_probe()
  media: usbtv: Lock resolution while streaming
  media: gspca: Add bounds checking to firmware parser
  jbd2: prevent softlockup in jbd2_log_do_checkpoint()
  PCI: endpoint: Fix configfs group removal on driver teardown
  PCI: endpoint: Fix configfs group list head handling
  mtd: rawnand: fsmc: Add missing check after DMA map
  wifi: brcmsmac: Remove const from tbl_ptr parameter in wlc_lcnphy_common_read_table()
  zynq_fpga: use sgtable-based scatterlist wrappers
  ata: libata-scsi: Fix ata_to_sense_error() status handling
  ext4: fix reserved gdt blocks handling in fsmap
  ext4: fix fsmap end of range reporting with bigalloc
  ext4: check fast symlink for ea_inode correctly
  Revert "vgacon: Add check for vc_origin address range in vgacon_scroll()"
  vt: defkeymap: Map keycodes above 127 to K_HOLE
  usb: gadget: udc: renesas_usb3: fix device leak at unbind
  usb: atm: cxacru: Merge cxacru_upload_firmware() into cxacru_heavy_init()
  m68k: Fix lost column on framebuffer debug console
  serial: 8250: fix panic due to PSLVERR
  media: uvcvideo: Do not mark valid metadata as invalid
  media: uvcvideo: Fix 1-byte out-of-bounds read in uvc_parse_format()
  btrfs: fix log tree replay failure due to file with 0 links and extents
  thunderbolt: Fix copy+paste error in match_service_id()
  misc: rtsx: usb: Ensure mmc child device is active when card is present
  scsi: lpfc: Remove redundant assignment to avoid memory leak
  rtc: ds1307: remove clear of oscillator stop flag (OSF) in probe
  pNFS: Fix uninited ptr deref in block/scsi layout
  pNFS: Fix disk addr range check in block/scsi layout
  pNFS: Fix stripe mapping in block/scsi layout
  ipmi: Fix strcpy source and destination the same
  kconfig: lxdialog: fix 'space' to (de)select options
  kconfig: gconf: fix potential memory leak in renderer_edited()
  kconfig: gconf: avoid hardcoding model2 in on_treeview2_cursor_changed()
  scsi: aacraid: Stop using PCI_IRQ_AFFINITY
  scsi: Fix sas_user_scan() to handle wildcard and multi-channel scans
  kconfig: nconf: Ensure null termination where strncpy is used
  kconfig: lxdialog: replace strcpy() with strncpy() in inputbox.c
  PCI: pnv_php: Work around switches with broken presence detection
  media: uvcvideo: Fix bandwidth issue for Alcor camera
  media: dvb-frontends: w7090p: fix null-ptr-deref in w7090p_tuner_write_serpar and w7090p_tuner_read_serpar
  media: dvb-frontends: dib7090p: fix null-ptr-deref in dib7090p_rw_on_apb()
  media: usb: hdpvr: disable zero-length read messages
  media: tc358743: Increase FIFO trigger level to 374
  media: tc358743: Return an appropriate colorspace from tc358743_set_fmt
  media: tc358743: Check I2C succeeded during probe
  pinctrl: stm32: Manage irq affinity settings
  scsi: mpt3sas: Correctly handle ATA device errors
  RDMA: hfi1: fix possible divide-by-zero in find_hw_thread_mask()
  MIPS: Don't crash in stack_top() for tasks without ABI or vDSO
  jfs: upper bound check of tree index in dbAllocAG
  jfs: Regular file corruption check
  jfs: truncate good inode pages when hard link is 0
  scsi: bfa: Double-free fix
  MIPS: vpe-mt: add missing prototypes for vpe_{alloc,start,stop,free}
  watchdog: dw_wdt: Fix default timeout
  fs/orangefs: use snprintf() instead of sprintf()
  scsi: libiscsi: Initialize iscsi_conn->dd_data only if memory is allocated
  ext4: do not BUG when INLINE_DATA_FL lacks system.data xattr
  vhost: fail early when __vhost_add_used() fails
  uapi: in6: restore visibility of most IPv6 socket options
  net: ncsi: Fix buffer overflow in fetching version id
  net: dsa: b53: fix b53_imp_vlan_setup for BCM5325
  net: vlan: Replace BUG() with WARN_ON_ONCE() in vlan_dev_* stubs
  wifi: iwlegacy: Check rate_idx range after addition
  netmem: fix skb_frag_address_safe with unreadable skbs
  wifi: rtlwifi: fix possible skb memory leak in `_rtl_pci_rx_interrupt()`.
  wifi: iwlwifi: dvm: fix potential overflow in rs_fill_link_cmd()
  net: fec: allow disable coalescing
  (powerpc/512) Fix possible `dma_unmap_single()` on uninitialized pointer
  s390/stp: Remove udelay from stp_sync_clock()
  wifi: iwlwifi: mvm: fix scan request validation
  net: thunderx: Fix format-truncation warning in bgx_acpi_match_id()
  net: ipv4: fix incorrect MTU in broadcast routes
  wifi: cfg80211: Fix interface type validation
  et131x: Add missing check after DMA map
  be2net: Use correct byte order and format string for TCP seq and ack_seq
  s390/time: Use monotonic clock in get_cycles()
  wifi: cfg80211: reject HTC bit for management frames
  ktest.pl: Prevent recursion of default variable options
  ASoC: codecs: rt5640: Retry DEVICE_ID verification
  ALSA: usb-audio: Avoid precedence issues in mixer_quirks macros
  ALSA: hda/ca0132: Fix buffer overflow in add_tuning_control
  platform/x86: thinkpad_acpi: Handle KCOV __init vs inline mismatches
  pm: cpupower: Fix the snapshot-order of tsc,mperf, clock in mperf_stop()
  ALSA: intel8x0: Fix incorrect codec index usage in mixer for ICH4
  ASoC: hdac_hdmi: Rate limit logging on connection and disconnection
  mmc: rtsx_usb_sdmmc: Fix error-path in sd_set_power_mode()
  ACPI: processor: fix acpi_object initialization
  PM: sleep: console: Fix the black screen issue
  thermal: sysfs: Return ENODATA instead of EAGAIN for reads
  selftests: tracing: Use mutex_unlock for testing glob filter
  ARM: tegra: Use I/O memcpy to write to IRAM
  gpio: tps65912: check the return value of regmap_update_bits()
  ASoC: soc-dapm: set bias_level if snd_soc_dapm_set_bias_level() was successed
  cpufreq: Exit governor when failed to start old governor
  usb: xhci: Avoid showing errors during surprise removal
  usb: xhci: Set avg_trb_len = 8 for EP0 during Address Device Command
  usb: xhci: Avoid showing warnings for dying controller
  selftests/futex: Define SYS_futex on 32-bit architectures with 64-bit time_t
  usb: xhci: print xhci->xhc_state when queue_command failed
  securityfs: don't pin dentries twice, once is enough...
  hfs: fix not erasing deleted b-tree node issue
  drbd: add missing kref_get in handle_write_conflicts
  arm64: Handle KCOV __init vs inline mismatches
  hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file()
  hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc()
  hfsplus: fix slab-out-of-bounds in hfsplus_bnode_read()
  hfs: fix slab-out-of-bounds in hfs_bnode_read()
  sctp: linearize cloned gso packets in sctp_rcv
  netfilter: ctnetlink: fix refcount leak on table dump
  udp: also consider secpath when evaluating ipsec use for checksumming
  fs: Prevent file descriptor table allocations exceeding INT_MAX
  sunvdc: Balance device refcount in vdc_port_mpgroup_check
  NFSD: detect mismatch of file handle and delegation stateid in OPEN op
  net: dpaa: fix device leak when querying time stamp info
  net: gianfar: fix device leak when querying time stamp info
  netlink: avoid infinite retry looping in netlink_unicast()
  ALSA: usb-audio: Validate UAC3 cluster segment descriptors
  ALSA: usb-audio: Validate UAC3 power domain descriptors, too
  usb: gadget : fix use-after-free in composite_dev_cleanup()
  MIPS: mm: tlb-r4k: Uniquify TLB entries on init
  USB: serial: option: add Foxconn T99W709
  vsock: Do not allow binding to VMADDR_PORT_ANY
  net/packet: fix a race in packet_set_ring() and packet_notifier()
  perf/core: Prevent VMA split of buffer mappings
  perf/core: Exit early on perf_mmap() fail
  perf/core: Don't leak AUX buffer refcount on allocation failure
  pptp: fix pptp_xmit() error path
  smb: client: let recv_done() cleanup before notifying the callers.
  benet: fix BUG when creating VFs
  ipv6: reject malicious packets in ipv6_gso_segment()
  pptp: ensure minimal skb length in pptp_xmit()
  netpoll: prevent hanging NAPI when netcons gets enabled
  NFS: Fix filehandle bounds checking in nfs_fh_to_dentry()
  pci/hotplug/pnv-php: Wrap warnings in macro
  pci/hotplug/pnv-php: Improve error msg on power state change failure
  usb: chipidea: udc: fix sleeping function called from invalid context
  f2fs: fix to avoid out-of-boundary access in devs.path
  f2fs: fix to avoid UAF in f2fs_sync_inode_meta()
  rtc: pcf8563: fix incorrect maximum clock rate handling
  rtc: hym8563: fix incorrect maximum clock rate handling
  rtc: ds1307: fix incorrect maximum clock rate handling
  mtd: rawnand: atmel: set pmecc data setup time
  mtd: rawnand: atmel: Fix dma_mapping_error() address
  jfs: fix metapage reference count leak in dbAllocCtl
  fbdev: imxfb: Check fb_add_videomode to prevent null-ptr-deref
  crypto: qat - fix seq_file position update in adf_ring_next()
  dmaengine: nbpfaxi: Add missing check after DMA map
  dmaengine: mv_xor: Fix missing check after DMA map and missing unmap
  fs/orangefs: Allow 2 more characters in do_c_string()
  crypto: img-hash - Fix dma_unmap_sg() nents value
  scsi: isci: Fix dma_unmap_sg() nents value
  scsi: mvsas: Fix dma_unmap_sg() nents value
  scsi: ibmvscsi_tgt: Fix dma_unmap_sg() nents value
  perf tests bp_account: Fix leaked file descriptor
  crypto: ccp - Fix crash when rebind ccp device for ccp.ko
  pinctrl: sunxi: Fix memory leak on krealloc failure
  power: supply: max14577: Handle NULL pdata when CONFIG_OF is not set
  clk: davinci: Add NULL check in davinci_lpsc_clk_register()
  mtd: fix possible integer overflow in erase_xfer()
  crypto: marvell/cesa - Fix engine load inaccuracy
  PCI: rockchip-host: Fix "Unexpected Completion" log message
  vrf: Drop existing dst reference in vrf_ip6_input_dst
  netfilter: xt_nfacct: don't assume acct name is null-terminated
  can: kvaser_usb: Assign netdev.dev_port based on device channel index
  wifi: brcmfmac: fix P2P discovery failure in P2P peer due to missing P2P IE
  Reapply "wifi: mac80211: Update skb's control block key in ieee80211_tx_dequeue()"
  mwl8k: Add missing check after DMA map
  wifi: rtl8xxxu: Fix RX skb size for aggregation disabled
  net/sched: Restrict conditions for adding duplicating netems to qdisc tree
  arch: powerpc: defconfig: Drop obsolete CONFIG_NET_CLS_TCINDEX
  netfilter: nf_tables: adjust lockdep assertions handling
  drm/amd/pm/powerplay/hwmgr/smu_helper: fix order of mask and value
  m68k: Don't unregister boot console needlessly
  tcp: fix tcp_ofo_queue() to avoid including too much DUP SACK range
  iwlwifi: Add missing check for alloc_ordered_workqueue
  wifi: iwlwifi: Fix memory leak in iwl_mvm_init()
  wifi: rtl818x: Kill URBs before clearing tx status queue
  caif: reduce stack size, again
  staging: nvec: Fix incorrect null termination of battery manufacturer
  samples: mei: Fix building on musl libc
  usb: early: xhci-dbc: Fix early_ioremap leak
  Revert "vmci: Prevent the dispatching of uninitialized payloads"
  pps: fix poll support
  vmci: Prevent the dispatching of uninitialized payloads
  staging: fbtft: fix potential memory leak in fbtft_framebuffer_alloc()
  ARM: dts: vfxxx: Correctly use two tuples for timer address
  ASoC: ops: dynamically allocate struct snd_ctl_elem_value
  hfsplus: remove mutex_lock check in hfsplus_free_extents
  ASoC: Intel: fix SND_SOC_SOF dependencies
  ethernet: intel: fix building with large NR_CPUS
  usb: phy: mxs: disconnect line when USB charger is attached
  usb: chipidea: udc: protect usb interrupt enable
  usb: chipidea: udc: add new API ci_hdrc_gadget_connect
  comedi: comedi_test: Fix possible deletion of uninitialized timers
  nilfs2: reject invalid file types when reading inodes
  i2c: qup: jump out of the loop in case of timeout
  net/sched: sch_qfq: Avoid triggering might_sleep in atomic context in qfq_delete_class
  net: appletalk: Fix use-after-free in AARP proxy probe
  net: appletalk: fix kerneldoc warnings
  RDMA/core: Rate limit GID cache warning messages
  usb: hub: fix detection of high tier USB3 devices behind suspended hubs
  net_sched: sch_sfq: reject invalid perturb period
  net_sched: sch_sfq: move the limit validation
  net_sched: sch_sfq: use a temporary work area for validating configuration
  net_sched: sch_sfq: don't allow 1 packet limit
  net_sched: sch_sfq: handle bigger packets
  net_sched: sch_sfq: annotate data-races around q->perturb_period
  power: supply: bq24190_charger: Fix runtime PM imbalance on error
  xhci: Disable stream for xHC controller with XHCI_BROKEN_STREAMS
  virtio-net: ensure the received length does not exceed allocated size
  usb: dwc3: qcom: Don't leave BCR asserted
  usb: musb: fix gadget state on disconnect
  net/sched: Return NULL when htb_lookup_leaf encounters an empty rbtree
  net: vlan: fix VLAN 0 refcount imbalance of toggling filtering during runtime
  Bluetooth: L2CAP: Fix attempting to adjust outgoing MTU
  Bluetooth: SMP: Fix using HCI_ERROR_REMOTE_USER_TERM on timeout
  Bluetooth: SMP: If an unallowed command is received consider it a failure
  Bluetooth: Fix null-ptr-deref in l2cap_sock_resume_cb()
  usb: net: sierra: check for no status endpoint
  net/sched: sch_qfq: Fix race condition on qfq_aggregate
  net: emaclite: Fix missing pointer increment in aligned_read()
  comedi: Fix use of uninitialized data in insn_rw_emulate_bits()
  comedi: Fix some signed shift left operations
  comedi: das6402: Fix bit shift out of bounds
  comedi: das16m1: Fix bit shift out of bounds
  comedi: aio_iiro_16: Fix bit shift out of bounds
  comedi: pcl812: Fix bit shift out of bounds
  iio: adc: max1363: Reorder mode_list[] entries
  iio: adc: max1363: Fix MAX1363_4X_CHANS/MAX1363_8X_CHANS[]
  soc: aspeed: lpc-snoop: Don't disable channels that aren't enabled
  soc: aspeed: lpc-snoop: Cleanup resources in stack-order
  mmc: sdhci-pci: Quirk for broken command queuing on Intel GLK-based Positivo models
  memstick: core: Zero initialize id_reg in h_memstick_read_dev_id()
  isofs: Verify inode mode when loading from disk
  dmaengine: nbpfaxi: Fix memory corruption in probe()
  af_packet: fix soft lockup issue caused by tpacket_snd()
  af_packet: fix the SO_SNDTIMEO constraint not effective on tpacked_snd()
  phonet/pep: Move call to pn_skb_get_dst_sockaddr() earlier in pep_sock_accept()
  HID: core: do not bypass hid_hw_raw_request
  HID: core: ensure __hid_request reserves the report ID as the first byte
  HID: core: ensure the allocated report buffer can contain the reserved report ID
  pch_uart: Fix dma_sync_sg_for_device() nents value
  Input: xpad - set correct controller type for Acer NGR200
  i2c: stm32: fix the device used for the DMA map
  usb: gadget: configfs: Fix OOB read on empty string write
  USB: serial: ftdi_sio: add support for NDI EMGUIDE GEMINI
  USB: serial: option: add Foxconn T99W640
  USB: serial: option: add Telit Cinterion FE910C04 (ECM) composition
  dma-mapping: add generic helpers for mapping sgtable objects
  usb: renesas_usbhs: Flush the notify_hotplug_work
  gpio: rcar: Use raw_spinlock to protect register access

Change-Id: Ia6b8b00918487999c648f298d3550afc7eaaae03
Signed-off-by: bengris32 <bengris32@protonmail.ch>
2025-10-12 13:39:56 +01:00
Li Li
f971b42a63 BACKPORT: FROMGIT: binder: fix freeze race
PROPAGATED_from (CR)

Currently cgroup freezer is used to freeze the application threads, and
BINDER_FREEZE is used to freeze the corresponding binder interface.
There's already a mechanism in ioctl(BINDER_FREEZE) to wait for any
existing transactions to drain out before actually freezing the binder
interface.

But freezing an app requires 2 steps, freezing the binder interface with
ioctl(BINDER_FREEZE) and then freezing the application main threads with
cgroupfs. This is not an atomic operation. The following race issue
might happen.

1) Binder interface is frozen by ioctl(BINDER_FREEZE);
2) Main thread A initiates a new sync binder transaction to process B;
3) Main thread A is frozen by "echo 1 > cgroup.freeze";
4) The response from process B reaches the frozen thread, which will
unexpectedly fail.

This patch provides a mechanism to check if there's any new pending
transaction happening between ioctl(BINDER_FREEZE) and freezing the
main thread. If there's any, the main thread freezing operation can
be rolled back to finish the pending transaction.

Furthermore, the response might reach the binder driver before the
rollback actually happens. That will still cause failed transaction.

As the other process doesn't wait for another response of the response,
the response transaction failure can be fixed by treating the response
transaction l(CR) an oneway/async one, allowing it to reach the frozen
thread. And it will be consumed when the thread gets unfrozen later.

NOTE: This patch reuses the existing definition of struct
binder_frozen_status_info but expands the bit assignments of __u32
member sync_recv.

To ensure backward compatibility, bit 0 of sync_recv still indicates
there's an outstanding sync binder transaction. This patch adds new
information to bit 1 of sync_recv, indicating the binder transaction
happens exactly when there's a race.

If an existing userspace app runs on a new kernel, a sync binder call
will set bit 0 of sync_recv so ioctl(BINDER_GET_FROZEN_INFO) still
return the expected value (true). The app just doesn't check bit 1
intentionally so it doesn't have the ability to tell if there's a race.
This behavior is aligned with what happens on an old kernel which
doesn't set bit 1 at all.

A new userspace app can 1) check bit 0 to know if there's a sync binder
transaction happened when being frozen - same as before; and 2) check
bit 1 to know if that sync binder transaction happened exactly when
there's a race - a new information for rollback decision.

Fixes: 432ff1e91694 ("binder: BINDER_FREEZE ioctl")
Acked-by: Todd Kjos <tkjos@google.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Li Li <dualli@google.com>
Test: stress test with apps being frozen and initiating binder calls at
the same time, confirmed the pending transactions succeeded.
Link: https://lore.kernel.org/r/20210910164210.2282716-2-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 198493121
(cherry picked from commit b564171ade70570b7f335fa8ed17adb28409e3ac
 git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
 char-misc-linus)
Change-Id: I488ba75056f18bb3094ba5007027b76b5caebec9
Reviewed-on: https://gerrit.mot.com/2496991
SME-Granted: SME Approvals Granted
SLTApproved: Slta Waiver
Tested-by: Jira Key
Reviewed-by: Zhangqing Huang <huangzq2@motorola.com>
Reviewed-by: Hua Tan <tanhua1@motorola.com>
Submit-Approved: Jira Key
Signed-off-by: Levy Gabriel da Silva Galvao <levy@motorola.com>
Reviewed-on: https://gerrit.mot.com/2704333
Reviewed-by: Xiangpo Zhao <zhaoxp3@motorola.com>
Reviewed-by: Rafael Ortolan <rafones@motorola.com>
2025-09-20 03:23:43 +01:00
Hang Lu
a6665f1270 BACKPORT: binder: tell userspace to dump current backtrace when detected oneway spamming
When async binder buffer got exhausted, some normal oneway transactions
will also be discarded and may cause system or application failures. By
that time, the binder debug information we dump may not be relevant to
the root cause. And this issue is difficult to debug if without the
backtrace of the thread sending spam.

This change will send BR_ONEWAY_SPAM_SUSPECT to userspace when oneway
spamming is detected, request to dump current backtrace. Oneway spamming
will be reported only once when exceeding the threshold (target process
dips below 80% of its oneway space, and current process is responsible
for either more than 50 transactions, or more than 50% of the oneway
space). And the detection will restart when the async buffer has
returned to a healthy state.

Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Hang Lu <hangl@codeaurora.org>
Link: https://lore.kernel.org/r/1617961246-4502-3-git-send-email-hangl@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Bug: 181190340
Change-Id: Id3d2526099bc89f04d8ad3ad6e48141b2a8f2515
(cherry picked from commit a7dc1e6f99df59799ab0128d9c4e47bbeceb934d)
Signed-off-by: Hang Lu <hangl@codeaurora.org>
2025-09-20 03:23:43 +01:00
Marco Ballesio
ea9b0c2ceb BACKPORT: FROMGIT: binder: BINDER_GET_FROZEN_INFO ioctl
User space needs to know if binder transactions occurred to frozen
processes. Introduce a new BINDER_GET_FROZEN ioctl and keep track of
transactions occurring to frozen proceses.

Bug: 180989544
(cherry picked from commit c55019c24b22d6770bd8e2f12fbddf3f83d37547
 git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-testing)
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Li Li <dualli@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20210316011630.1121213-4-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ie631f331ba4ca94a3bcdd43dec25fe9ba1306af2
2025-09-20 03:23:42 +01:00
Marco Ballesio
043fc3d835 BACKPORT: FROMGIT: binder: BINDER_FREEZE ioctl
Frozen tasks can't process binder transactions, so a way is required to
inform transmitting ends of communication failures due to the frozen
state of their receiving counterparts. Additionally, races are possible
between transitions to frozen state and binder transactions enqueued to
a specific process.

Implement BINDER_FREEZE ioctl for user space to inform the binder driver
about the intention to freeze or unfreeze a process. When the ioctl is
called, block the caller until any pending binder transactions toward
the target process are flushed. Return an error to transactions to
processes marked as frozen.

Bug: 180989544
(cherry picked from commit 15949c3cdd97bccdcd45c0c0f6c31058520b6494
 git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-testing)
Co-developed-by: Todd Kjos <tkjos@google.com>
Acked-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Li Li <dualli@google.com>
Link: https://lore.kernel.org/r/20210316011630.1121213-2-dualli@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ia1b5951cd99eeb98b59e06c3e27d59062dc725f6
2025-09-20 03:23:42 +01:00
Todd Kjos
843efa9c8e UPSTREAM: binder: add flag to clear buffer on txn complete
Add a per-transaction flag to indicate that the buffer
must be cleared when the transaction is complete to
prevent copies of sensitive data from being preserved
in memory.

Signed-off-by: Todd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20201120233743.3617529-1-tkjos@google.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 171501513
Change-Id: Ic9338c85cbe3b11ab6f2bda55dce9964bb48447a
(cherry picked from commit 0f966cba95c78029f491b433ea95ff38f414a761)
Signed-off-by: Todd Kjos <tkjos@google.com>
2025-09-20 03:23:42 +01:00
David Howells
b99a3a1e82 UPSTREAM: zsfold: Convert zsfold to use the new mount API
Convert the zsfold filesystem to the new internal mount API as the old one
will be obsoleted and removed.  This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.

See Documentation/filesystems/mount_api.txt for more information.

Change-Id: Ia3da9e9fd2ef5214c4e7fb71228f8b90c15f9e71
Signed-off-by: David Howells <dhowells@redhat.com>
2025-09-20 03:23:34 +01:00
Jordan Rome
4029fa4d85 UPSTREAM: bpf: Add crosstask check to __bpf_get_stack
[ Upstream commit b8e3a87a627b575896e448021e5c2f8a3bc19931 ]

Currently get_perf_callchain only supports user stack walking for
the current task. Passing the correct *crosstask* param will return
0 frames if the task passed to __bpf_get_stack isn't the current
one instead of a single incorrect frame/address. This change
passes the correct *crosstask* param but also does a preemptive
check in __bpf_get_stack if the task is current and returns
-EOPNOTSUPP if it is not.

This issue was found using bpf_get_task_stack inside a BPF
iterator ("iter/task"), which iterates over all tasks.
bpf_get_task_stack works fine for fetching kernel stacks
but because get_perf_callchain relies on the caller to know
if the requested *task* is the current one (via *crosstask*)
it was failing in a confusing way.

It might be possible to get user stacks for all tasks utilizing
something like access_process_vm but that requires the bpf
program calling bpf_get_task_stack to be sleepable and would
therefore be a breaking change.

Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()")
Change-Id: Iaaa9eecc1e9f9c4c7b4953e169725d1aa1808bc3
Signed-off-by: Jordan Rome <jordalgo@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231108112334.3433136-1-jordalgo@meta.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-20 03:23:03 +01:00
Stanislav Fomichev
d0c2b52274 UPSTREAM: bpf: Clarify error expectations from bpf_clone_redirect
[ Upstream commit 7cb779a6867fea00b4209bcf6de2f178a743247d ]

Commit 151e887d8ff9 ("veth: Fixing transmit return status for dropped
packets") exposed the fact that bpf_clone_redirect is capable of
returning raw NET_XMIT_XXX return codes.

This is in the conflict with its UAPI doc which says the following:
"0 on success, or a negative error in case of failure."

Update the UAPI to reflect the fact that bpf_clone_redirect can
return positive error numbers, but don't explicitly define
their meaning.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Change-Id: I5b81b2b05cb8369feceae99f9fddaaf6a0121dd1
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230911194731.286342-1-sdf@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-20 03:23:03 +01:00
Hengqi Chen
08fbe6d9cd UPSTREAM: bpf: Fix comment for helper bpf_current_task_under_cgroup()
commit 58617014405ad5c9f94f464444f4972dabb71ca7 upstream.

Fix the descriptions of the return values of helper bpf_current_task_under_cgroup().

Fixes: c6b5fb8690 ("bpf: add documentation for eBPF helpers (42-50)")
Change-Id: I312011040d0f09d09834613e418caa2f86be548d
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220310155335.1278783-1-hengqi.chen@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-20 03:22:55 +01:00
Namhyung Kim
03e31e2955 UPSTREAM: bpf: Adjust BPF stack helper functions to accommodate skip > 0
commit ee2a098851bfbe8bcdd964c0121f4246f00ff41e upstream.

Let's say that the caller has storage for num_elem stack frames.  Then,
the BPF stack helper functions walk the stack for only num_elem frames.
This means that if skip > 0, one keeps only 'num_elem - skip' frames.

This is because it sets init_nr in the perf_callchain_entry to the end
of the buffer to save num_elem entries only.  I believe it was because
the perf callchain code unwound the stack frames until it reached the
global max size (sysctl_perf_event_max_stack).

However it now has perf_callchain_entry_ctx.max_stack to limit the
iteration locally.  This simplifies the code to handle init_nr in the
BPF callstack entries and removes the confusion with the perf_event's
__PERF_SAMPLE_CALLCHAIN_EARLY which sets init_nr to 0.

Also change the comment on bpf_get_stack() in the header file to be
more explicit what the return value means.

Fixes: c195651e56 ("bpf: add bpf_get_stack helper")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/30a7b5d5-6726-1cc2-eaee-8da2828a9a9c@oracle.com
Link: https://lore.kernel.org/bpf/20220314182042.71025-1-namhyung@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Based-on-patch-by: Eugene Loh <eugene.loh@oracle.com>
Change-Id: If8b7c2f7705a2c5fc27208d472a4762b7525bfdd
2025-09-20 03:22:55 +01:00
Kuniyuki Iwashima
55b9335242 UPSTREAM: bpf: Fix a typo of reuseport map in bpf.h.
[ Upstream commit f170acda7ffaf0473d06e1e17c12cd9fd63904f5 ]

Fix s/BPF_MAP_TYPE_REUSEPORT_ARRAY/BPF_MAP_TYPE_REUSEPORT_SOCKARRAY/ typo
in bpf.h.

Fixes: 2dbb9b9e6d ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Change-Id: Id7f85662cfa1f3ae153c4da156730c440f957e84
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210714124317.67526-1-kuniyu@amazon.co.jp
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-20 03:22:52 +01:00
Andrii Nakryiko
b567486e24 UPSTREAM: bpf: Fix enum names for bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers
Remove bpf_ prefix, which causes these helpers to be reported in verifier
dump as bpf_bpf_this_cpu_ptr() and bpf_bpf_per_cpu_ptr(), respectively. Lets
fix it as long as it is still possible before UAPI freezes on these helpers.

Fixes: eaa6bcb71ef6 ("bpf: Introduce bpf_per_cpu_ptr()")
Change-Id: I67e5dde44023de422ec093935ce20647466db4b8
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-20 03:22:42 +01:00
Toke Høiland-Jørgensen
5ccc906068 UPSTREAM: bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
Based on the discussion in [0], update the bpf_redirect_neigh() helper to
accept an optional parameter specifying the nexthop information. This makes
it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without
incurring a duplicate FIB lookup - since the FIB lookup helper will return
the nexthop information even if no neighbour is present, this can simply
be passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns
BPF_FIB_LKUP_RET_NO_NEIGH. Thus fix & extend it before helper API is frozen.

  [0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/

Change-Id: I2c24b13263ccc6452023c6f76e635ab2114ce142
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/160322915615.32199.1187570224032024535.stgit@toke.dk
2025-09-20 03:22:41 +01:00
Daniel Borkmann
00b2b0959f UPSTREAM: bpf: Allow for map-in-map with dynamic inner array map entries
Recent work in f4d05259213f ("bpf: Add map_meta_equal map ops") and 134fede4eecf
("bpf: Relax max_entries check for most of the inner map types") added support
for dynamic inner max elements for most map-in-map types. Exceptions were maps
like array or prog array where the map_gen_lookup() callback uses the maps'
max_entries field as a constant when emitting instructions.

We recently implemented Maglev consistent hashing into Cilium's load balancer
which uses map-in-map with an outer map being hash and inner being array holding
the Maglev backend table for each service. This has been designed this way in
order to reduce overall memory consumption given the outer hash map allows to
avoid preallocating a large, flat memory area for all services. Also, the
number of service mappings is not always known a-priori.

The use case for dynamic inner array map entries is to further reduce memory
overhead, for example, some services might just have a small number of back
ends while others could have a large number. Right now the Maglev backend table
for small and large number of backends would need to have the same inner array
map entries which adds a lot of unneeded overhead.

Dynamic inner array map entries can be realized by avoiding the inlined code
generation for their lookup. The lookup will still be efficient since it will
be calling into array_map_lookup_elem() directly and thus avoiding retpoline.
The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips
inline code generation and relaxes array_map_meta_equal() check to ignore both
maps' max_entries. This also still allows to have faster lookups for map-in-map
when BPF_F_INNER_MAP is not specified and hence dynamic max_entries not needed.

Example code generation where inner map is dynamic sized array:

  # bpftool p d x i 125
  int handle__sys_enter(void * ctx):
  ; int handle__sys_enter(void *ctx)
     0: (b4) w1 = 0
  ; int key = 0;
     1: (63) *(u32 *)(r10 -4) = r1
     2: (bf) r2 = r10
  ;
     3: (07) r2 += -4
  ; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key);
     4: (18) r1 = map[id:468]
     6: (07) r1 += 272
     7: (61) r0 = *(u32 *)(r2 +0)
     8: (35) if r0 >= 0x3 goto pc+5
     9: (67) r0 <<= 3
    10: (0f) r0 += r1
    11: (79) r0 = *(u64 *)(r0 +0)
    12: (15) if r0 == 0x0 goto pc+1
    13: (05) goto pc+1
    14: (b7) r0 = 0
    15: (b4) w6 = -1
  ; if (!inner_map)
    16: (15) if r0 == 0x0 goto pc+6
    17: (bf) r2 = r10
  ;
    18: (07) r2 += -4
  ; val = bpf_map_lookup_elem(inner_map, &key);
    19: (bf) r1 = r0                               | No inlining but instead
    20: (85) call array_map_lookup_elem#149280     | call to array_map_lookup_elem()
  ; return val ? *val : -1;                        | for inner array lookup.
    21: (15) if r0 == 0x0 goto pc+1
  ; return val ? *val : -1;
    22: (61) r6 = *(u32 *)(r0 +0)
  ; }
    23: (bc) w0 = w6
    24: (95) exit

Change-Id: Ie3c14796539b261e3fb9433aa9ef46fe116294c4
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
2025-09-20 03:22:40 +01:00
Daniel Borkmann
dbb8fde83f UPSTREAM: bpf: Add redirect_peer helper
Add an efficient ingress to ingress netns switch that can be used out of tc BPF
programs in order to redirect traffic from host ns ingress into a container
veth device ingress without having to go via CPU backlog queue [0]. For local
containers this can also be utilized and path via CPU backlog queue only needs
to be taken once, not twice. On a high level this borrows from ipvlan which does
similar switch in __netif_receive_skb_core() and then iterates via another_round.
This helps to reduce latency for mentioned use cases.

Pod to remote pod with redirect(), TCP_RR [1]:

  # percpu_netperf 10.217.1.33
          RT_LATENCY:         122.450         (per CPU:         122.666         122.401         122.333         122.401 )
        MEAN_LATENCY:         121.210         (per CPU:         121.100         121.260         121.320         121.160 )
      STDDEV_LATENCY:         120.040         (per CPU:         119.420         119.910         125.460         115.370 )
         MIN_LATENCY:          46.500         (per CPU:          47.000          47.000          47.000          45.000 )
         P50_LATENCY:         118.500         (per CPU:         118.000         119.000         118.000         119.000 )
         P90_LATENCY:         127.500         (per CPU:         127.000         128.000         127.000         128.000 )
         P99_LATENCY:         130.750         (per CPU:         131.000         131.000         129.000         132.000 )

    TRANSACTION_RATE:       32666.400         (per CPU:        8152.200        8169.842        8174.439        8169.897 )

Pod to remote pod with redirect_peer(), TCP_RR:

  # percpu_netperf 10.217.1.33
          RT_LATENCY:          44.449         (per CPU:          43.767          43.127          45.279          45.622 )
        MEAN_LATENCY:          45.065         (per CPU:          44.030          45.530          45.190          45.510 )
      STDDEV_LATENCY:          84.823         (per CPU:          66.770          97.290          84.380          90.850 )
         MIN_LATENCY:          33.500         (per CPU:          33.000          33.000          34.000          34.000 )
         P50_LATENCY:          43.250         (per CPU:          43.000          43.000          43.000          44.000 )
         P90_LATENCY:          46.750         (per CPU:          46.000          47.000          47.000          47.000 )
         P99_LATENCY:          52.750         (per CPU:          51.000          54.000          53.000          53.000 )

    TRANSACTION_RATE:       90039.500         (per CPU:       22848.186       23187.089       22085.077       21919.130 )

  [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf
  [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf

Change-Id: I17d75ffbb776ea4e36326b8fdd04b71441a1982b
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net
2025-09-20 03:22:40 +01:00
Daniel Borkmann
bba8615766 UPSTREAM: bpf: Improve bpf_redirect_neigh helper description
Follow-up to address David's feedback that we should better describe internals
of the bpf_redirect_neigh() helper.

Suggested-by: David Ahern <dsahern@gmail.com>
Change-Id: I846bfce6bbcbce8bc561501be8f86cfba4810ca5
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Link: https://lore.kernel.org/bpf/20201010234006.7075-2-daniel@iogearbox.net
2025-09-20 03:22:35 +01:00
Nikita V. Shirokov
ead75c1fa0 UPSTREAM: bpf: Add tcp_notsent_lowat bpf setsockopt
Adding support for TCP_NOTSENT_LOWAT sockoption (https://lwn.net/Articles/560082/)
in tcp bpf programs.

Change-Id: I8edbc4662cbf7e31026c073ab628f8d28f859bef
Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201009070325.226855-1-tehnerd@tehnerd.com
2025-09-20 03:22:35 +01:00
Jakub Wilk
a897a2775d UPSTREAM: bpf: Fix typo in uapi/linux/bpf.h
Reported-by: Samanta Navarro <ferivoz@riseup.net>
Change-Id: I2b201793f7c385f8581dcff502186e518705a420
Signed-off-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20201007055717.7319-1-jwilk@jwilk.net
2025-09-20 03:22:34 +01:00
Hao Luo
57ae737d3c UPSTREAM: bpf: Introducte bpf_this_cpu_ptr()
Add bpf_this_cpu_ptr() to help access percpu var on this cpu. This
helper always returns a valid pointer, therefore no need to check
returned value for NULL. Also note that all programs run with
preemption disabled, which means that the returned pointer is stable
during all the execution of the program.

Change-Id: Idd23453d28e221430cc067c6b6f812eab0ac738d
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-6-haoluo@google.com
2025-09-20 03:22:32 +01:00
Hao Luo
dca7b0be14 UPSTREAM: bpf: Introduce bpf_per_cpu_ptr()
Add bpf_per_cpu_ptr() to help bpf programs access percpu vars.
bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel
except that it may return NULL. This happens when the cpu parameter is
out of range. So the caller must check the returned value.

Change-Id: I254209a7070abc34458020e4ee8ee73256e31886
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com
2025-09-20 03:22:32 +01:00
Hao Luo
fae103ddcc UPSTREAM: bpf: Introduce pseudo_btf_id
Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a
ksym so that further dereferences on the ksym can use the BTF info
to validate accesses. Internally, when seeing a pseudo_btf_id ld insn,
the verifier reads the btf_id stored in the insn[0]'s imm field and
marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND,
which is encoded in btf_vminux by pahole. If the VAR is not of a struct
type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID
and the mem_size is resolved to the size of the VAR's type.

>From the VAR btf_id, the verifier can also read the address of the
ksym's corresponding kernel var from kallsyms and use that to fill
dst_reg.

Therefore, the proper functionality of pseudo_btf_id depends on (1)
kallsyms and (2) the encoding of kernel global VARs in pahole, which
should be available since pahole v1.18.

Change-Id: I8e81d1d9c2ed4c669e5f68fcf4246b9c1c705bb6
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200929235049.2533242-2-haoluo@google.com
2025-09-20 03:22:32 +01:00
Song Liu
d45b6b182b UPSTREAM: bpf: Introduce BPF_F_PRESERVE_ELEMS for perf event array
Currently, perf event in perf event array is removed from the array when
the map fd used to add the event is closed. This behavior makes it
difficult to the share perf events with perf event array.

Introduce perf event map that keeps the perf event open with a new flag
BPF_F_PRESERVE_ELEMS. With this flag set, perf events in the array are not
removed when the original map fd is closed. Instead, the perf event will
stay in the map until 1) it is explicitly removed from the array; or 2)
the array is freed.

Change-Id: I7c0aaa5ae7431439acaaaaf7b5b689834b2313db
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200930224927.1936644-2-songliubraving@fb.com
2025-09-20 03:22:31 +01:00
Daniel Borkmann
f256187033 UPSTREAM: bpf: Add redirect_neigh helper as redirect drop-in
Add a redirect_neigh() helper as redirect() drop-in replacement
for the xmit side. Main idea for the helper is to be very similar
in semantics to the latter just that the skb gets injected into
the neighboring subsystem in order to let the stack do the work
it knows best anyway to populate the L2 addresses of the packet
and then hand over to dev_queue_xmit() as redirect() does.

This solves two bigger items: i) skbs don't need to go up to the
stack on the host facing veth ingress side for traffic egressing
the container to achieve the same for populating L2 which also
has the huge advantage that ii) the skb->sk won't get orphaned in
ip_rcv_core() when entering the IP routing layer on the host stack.

Given that skb->sk neither gets orphaned when crossing the netns
as per 9c4c325252 ("skbuff: preserve sock reference when scrubbing
the skb.") the helper can then push the skbs directly to the phys
device where FQ scheduler can do its work and TCP stack gets proper
backpressure given we hold on to skb->sk as long as skb is still
residing in queues.

With the helper used in BPF data path to then push the skb to the
phys device, I observed a stable/consistent TCP_STREAM improvement
on veth devices for traffic going container -> host -> host ->
container from ~10Gbps to ~15Gbps for a single stream in my test
environment.

Change-Id: I9cca990db0d9091c889bf174a1de1c72e9a8d4de
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/f207de81629e1724899b73b8112e0013be782d35.1601477936.git.daniel@iogearbox.net
2025-09-20 03:22:31 +01:00
Daniel Borkmann
2008a2e63f UPSTREAM: bpf: Add classid helper only based on skb->sk
Similarly to 5a52ae4e32a6 ("bpf: Allow to retrieve cgroup v1 classid
from v2 hooks"), add a helper to retrieve cgroup v1 classid solely
based on the skb->sk, so it can be used as key as part of BPF map
lookups out of tc from host ns, in particular given the skb->sk is
retained these days when crossing net ns thanks to 9c4c325252
("skbuff: preserve sock reference when scrubbing the skb."). This
is similar to bpf_skb_cgroup_id() which implements the same for v2.
Kubernetes ecosystem is still operating on v1 however, hence net_cls
needs to be used there until this can be dropped in with the v2
helper of bpf_skb_cgroup_id().

Change-Id: Iafb100978944174dd962d8409f229dfd0fd3780e
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/ed633cf27a1c620e901c5aa99ebdefb028dce600.1601477936.git.daniel@iogearbox.net
2025-09-20 03:22:28 +01:00
Toke Høiland-Jørgensen
8894754770 UPSTREAM: bpf: Support attaching freplace programs to multiple attach points
This enables support for attaching freplace programs to multiple attach
points. It does this by amending the UAPI for bpf_link_Create with a target
btf ID that can be used to supply the new attachment point along with the
target program fd. The target must be compatible with the target that was
supplied at program load time.

The implementation reuses the checks that were factored out of
check_attach_btf_id() to ensure compatibility between the BTF types of the
old and new attachment. If these match, a new bpf_tracing_link will be
created for the new attach target, allowing multiple attachments to
co-exist simultaneously.

The code could theoretically support multiple-attach of other types of
tracing programs as well, but since I don't have a use case for any of
those, there is no API support for doing so.

Change-Id: Ifeca634ba0c7ae9b1c5bc3c11a9d2b83fb0b5d23
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/160138355169.48470.17165680973640685368.stgit@toke.dk
2025-09-20 03:22:28 +01:00
Alan Maguire
c27d926940 UPSTREAM: bpf: Add bpf_seq_printf_btf helper
A helper is added to allow seq file writing of kernel data
structures using vmlinux BTF.  Its signature is

long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr,
                        u32 btf_ptr_size, u64 flags);

Flags and struct btf_ptr definitions/use are identical to the
bpf_snprintf_btf helper, and the helper returns 0 on success
or a negative error value.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Change-Id: Ief0f9b8b9d9ed5f725159d71d1f3eb26f28c27c1
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-8-git-send-email-alan.maguire@oracle.com
2025-09-20 03:22:27 +01:00
Alan Maguire
af684854b7 UPSTREAM: bpf: Add bpf_snprintf_btf helper
A helper is added to support tracing kernel type information in BPF
using the BPF Type Format (BTF).  Its signature is

long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr,
		      u32 btf_ptr_size, u64 flags);

struct btf_ptr * specifies

- a pointer to the data to be traced
- the BTF id of the type of data pointed to
- a flags field is provided for future use; these flags
  are not to be confused with the BTF_F_* flags
  below that control how the btf_ptr is displayed; the
  flags member of the struct btf_ptr may be used to
  disambiguate types in kernel versus module BTF, etc;
  the main distinction is the flags relate to the type
  and information needed in identifying it; not how it
  is displayed.

For example a BPF program with a struct sk_buff *skb
could do the following:

	static struct btf_ptr b = { };

	b.ptr = skb;
	b.type_id = __builtin_btf_type_id(struct sk_buff, 1);
	bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0, 0);

Default output looks like this:

(struct sk_buff){
 .transport_header = (__u16)65535,
 .mac_header = (__u16)65535,
 .end = (sk_buff_data_t)192,
 .head = (unsigned char *)0x000000007524fd8b,
 .data = (unsigned char *)0x000000007524fd8b,
 .truesize = (unsigned int)768,
 .users = (refcount_t){
  .refs = (atomic_t){
   .counter = (int)1,
  },
 },
}

Flags modifying display are as follows:

- BTF_F_COMPACT:	no formatting around type information
- BTF_F_NONAME:		no struct/union member names/types
- BTF_F_PTR_RAW:	show raw (unobfuscated) pointer values;
			equivalent to %px.
- BTF_F_ZERO:		show zero-valued struct/union members;
			they are not displayed by default

Change-Id: I77f2c2a0d41aee2f4f10e3288a36de475fd2cb46
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com
2025-09-20 03:22:27 +01:00
Song Liu
e50eb80d29 UPSTREAM: bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint
Add .test_run for raw_tracepoint. Also, introduce a new feature that runs
the target program on a specific CPU. This is achieved by a new flag in
bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program
is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for
BPF programs that handle perf_event and other percpu resources, as the
program can access these resource locally.

Change-Id: Id8f992caff30d7b65df8195f8934bcb2d8b658cb
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com
2025-09-20 03:22:25 +01:00
Martin KaFai Lau
dc04cfd580 UPSTREAM: bpf: Change bpf_sk_assign to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
This patch changes the bpf_sk_assign() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.

The bpf_sk_lookup_assign() is taking ARG_PTR_TO_SOCKET_"OR_NULL".  Meaning
it specifically takes a literal NULL.  ARG_PTR_TO_BTF_ID_SOCK_COMMON
does not allow a literal NULL, so another ARG type is required
for this purpose and another follow-up patch can be used if
there is such need.

Change-Id: Ia2747b017398c8dcc4454118dba03e71bdeba1dc
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000415.3857374-1-kafai@fb.com
2025-09-20 03:22:24 +01:00
Martin KaFai Lau
d7b625294e UPSTREAM: bpf: Change bpf_tcp_*_syncookie to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
This patch changes the bpf_tcp_*_syncookie() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.

Change-Id: I4f7712acada428dab29b94f0cb9bbbb19d9eaea2
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200925000409.3856725-1-kafai@fb.com
2025-09-20 03:22:24 +01:00
Martin KaFai Lau
36304c0284 UPSTREAM: bpf: Change bpf_sk_storage_*() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
This patch changes the bpf_sk_storage_*() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.

A micro benchmark has been done on a "cgroup_skb/egress" bpf program
which does a bpf_sk_storage_get().  It was driven by netperf doing
a 4096 connected UDP_STREAM test with 64bytes packet.
The stats from "kernel.bpf_stats_enabled" shows no meaningful difference.

The sk_storage_get_btf_proto, sk_storage_delete_btf_proto,
btf_sk_storage_get_proto, and btf_sk_storage_delete_proto are
no longer needed, so they are removed.

Change-Id: Ifef1303f11952714d181a3c369e154a70a864fd0
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200925000402.3856307-1-kafai@fb.com
2025-09-20 03:22:24 +01:00
Martin KaFai Lau
01c44a76c3 UPSTREAM: bpf: Change bpf_sk_release and bpf_sk_*cgroup_id to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
The previous patch allows the networking bpf prog to use the
bpf_skc_to_*() helpers to get a PTR_TO_BTF_ID socket pointer,
e.g. "struct tcp_sock *".  It allows the bpf prog to read all the
fields of the tcp_sock.

This patch changes the bpf_sk_release() and bpf_sk_*cgroup_id()
to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will
work with the pointer returned by the bpf_skc_to_*() helpers
also.  For example, the following will work:

	sk = bpf_skc_lookup_tcp(skb, tuple, tuplen, BPF_F_CURRENT_NETNS, 0);
	if (!sk)
		return;
	tp = bpf_skc_to_tcp_sock(sk);
	if (!tp) {
		bpf_sk_release(sk);
		return;
	}
	lsndtime = tp->lsndtime;
	/* Pass tp to bpf_sk_release() will also work */
	bpf_sk_release(tp);

Since PTR_TO_BTF_ID could be NULL, the helper taking
ARG_PTR_TO_BTF_ID_SOCK_COMMON has to check for NULL at runtime.

A btf_id of "struct sock" may not always mean a fullsock.  Regardless
the helper's running context may get a non-fullsock or not,
considering fullsock check/handling is pretty cheap, it is better to
keep the same verifier expectation on helper that takes ARG_PTR_TO_BTF_ID*
will be able to handle the minisock situation.  In the bpf_sk_*cgroup_id()
case,  it will try to get a fullsock by using sk_to_full_sk() as its
skb variant bpf_sk"b"_*cgroup_id() has already been doing.

bpf_sk_release can already handle minisock, so nothing special has to
be done.

Change-Id: Ia9823f535d2a69b6ed4c16f599faa3f8c3ac3a42
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200925000356.3856047-1-kafai@fb.com
2025-09-20 03:22:24 +01:00
YiFei Zhu
38367f1d94 UPSTREAM: bpf: Add BPF_PROG_BIND_MAP syscall
This syscall binds a map to a program. Returns success if the map is
already bound to the program.

Change-Id: I3899904e13ff5cddb35d41ffa9abe67d29a55ca9
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Cc: YiFei Zhu <zhuyifei1999@gmail.com>
Link: https://lore.kernel.org/bpf/20200915234543.3220146-3-sdf@google.com
2025-09-20 03:22:19 +01:00
Song Liu
9d956ac760 UPSTREAM: bpf: Fix comment for helper bpf_current_task_under_cgroup()
This should be "current" not "skb".

Fixes: c6b5fb8690 ("bpf: add documentation for eBPF helpers (42-50)")
Change-Id: Ie080b67b2068976abe9056e316008540f8b3db5b
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/bpf/20200910203314.70018-1-songliubraving@fb.com
2025-09-20 03:22:18 +01:00
Quentin Monnet
ee86eb3f11 UPSTREAM: bpf: Fix formatting in documentation for BPF helpers
Fix a formatting error in the description of bpf_load_hdr_opt() (rst2man
complains about a wrong indentation, but what is missing is actually a
blank line before the bullet list).

Fix and harmonise the formatting for other helpers.

Change-Id: Ib42eedd569553e5bc7b730a6c598609546c9aca5
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200904161454.31135-3-quentin@isovalent.com
2025-09-20 03:22:17 +01:00
Alexei Starovoitov
69e5cce407 UPSTREAM: bpf: Add bpf_copy_from_user() helper.
Sleepable BPF programs can now use copy_from_user() to access user memory.

Change-Id: Ifaf2da741d14dc56b121163d7c486e5c15d1d5a4
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200827220114.69225-4-alexei.starovoitov@gmail.com
2025-09-20 03:22:16 +01:00
Alexei Starovoitov
5fae5cfdcb UPSTREAM: bpf: Introduce sleepable BPF programs
Introduce sleepable BPF programs that can request such property for themselves
via BPF_F_SLEEPABLE flag at program load time. In such case they will be able
to use helpers like bpf_copy_from_user() that might sleep. At present only
fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only
when they are attached to kernel functions that are known to allow sleeping.

The non-sleepable programs are relying on implicit rcu_read_lock() and
migrate_disable() to protect life time of programs, maps that they use and
per-cpu kernel structures used to pass info between bpf programs and the
kernel. The sleepable programs cannot be enclosed into rcu_read_lock().
migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs
should not be enclosed in migrate_disable() as well. Therefore
rcu_read_lock_trace is used to protect the life time of sleepable progs.

There are many networking and tracing program types. In many cases the
'struct bpf_prog *' pointer itself is rcu protected within some other kernel
data structure and the kernel code is using rcu_dereference() to load that
program pointer and call BPF_PROG_RUN() on it. All these cases are not touched.
Instead sleepable bpf programs are allowed with bpf trampoline only. The
program pointers are hard-coded into generated assembly of bpf trampoline and
synchronize_rcu_tasks_trace() is used to protect the life time of the program.
The same trampoline can hold both sleepable and non-sleepable progs.

When rcu_read_lock_trace is held it means that some sleepable bpf program is
running from bpf trampoline. Those programs can use bpf arrays and preallocated
hash/lru maps. These map types are waiting on programs to complete via
synchronize_rcu_tasks_trace();

Updates to trampoline now has to do synchronize_rcu_tasks_trace() and
synchronize_rcu_tasks() to wait for sleepable progs to finish and for
trampoline assembly to finish.

This is the first step of introducing sleepable progs. Eventually dynamically
allocated hash maps can be allowed and networking program types can become
sleepable too.

Change-Id: I281471010348f21de4e0832dfc0265dfe85b4a0d
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
2025-09-20 03:22:16 +01:00
Yonghong Song
8e06f1cbab UPSTREAM: bpf: Make bpf_link_info.iter similar to bpf_iter_link_info
bpf_link_info.iter is used by link_query to return bpf_iter_link_info
to user space. Fields may be different, e.g., map_fd vs. map_id, so
we cannot reuse the exact structure. But make them similar, e.g.,

  struct bpf_link_info {
     /* common fields */
     union {
	struct { ... } raw_tracepoint;
	struct { ... } tracing;
	...
	struct {
	    /* common fields for iter */
	    union {
		struct {
		    __u32 map_id;
		} map;
		/* other structs for other targets */
	    };
	};
    };
 };

so the structure is extensible the same way as bpf_iter_link_info.

Fixes: 6b0a249a301e ("bpf: Implement link_query for bpf iterators")
Change-Id: I5bb1481995ec2fa33ce79676b0cfb7594850bbd8
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200828051922.758950-1-yhs@fb.com
2025-09-20 03:22:15 +01:00
Jiri Olsa
b1b48c9c9e UPSTREAM: bpf: Add d_path helper
Adding d_path helper function that returns full path for
given 'struct path' object, which needs to be the kernel
BTF 'path' object. The path is returned in buffer provided
'buf' of size 'sz' and is zero terminated.

  bpf_d_path(&file->f_path, buf, size);

The helper calls directly d_path function, so there's only
limited set of function it can be called from. Adding just
very modest set for the start.

Updating also bpf.h tools uapi header and adding 'path' to
bpf_helpers_doc.py script.

Change-Id: If390ec6189a537b730b9ae595d374cbfa83f6f5b
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200825192124.710397-11-jolsa@kernel.org
2025-09-20 03:22:15 +01:00
KP Singh
12833c5a0e UPSTREAM: bpf: Allow local storage to be used from LSM programs
Adds support for both bpf_{sk, inode}_storage_{get, delete} to be used
in LSM programs. These helpers are not used for tracing programs
(currently) as their usage is tied to the life-cycle of the object and
should only be used where the owning object won't be freed (when the
owning object is passed as an argument to the LSM hook). Thus, they
are safer to use in LSM hooks than tracing. Usage of local storage in
tracing programs will probably follow a per function based whitelist
approach.

Since the UAPI helper signature for bpf_sk_storage expect a bpf_sock,
it, leads to a compilation warning for LSM programs, it's also updated
to accept a void * pointer instead.

Change-Id: Ib0afe90a58aa586852d48d13c77bc91cbada9bd1
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200825182919.1118197-7-kpsingh@chromium.org
2025-09-20 03:22:13 +01:00
KP Singh
16dad6b655 BACKPORT: bpf: Implement bpf_local_storage for inodes
Similar to bpf_local_storage for sockets, add local storage for inodes.
The life-cycle of storage is managed with the life-cycle of the inode.
i.e. the storage is destroyed along with the owning inode.

The BPF LSM allocates an __rcu pointer to the bpf_local_storage in the
security blob which are now stackable and can co-exist with other LSMs.

Change-Id: I078273c02202a6ee3837865b6d5b307989eb9bed
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200825182919.1118197-6-kpsingh@chromium.org
2025-09-20 03:22:12 +01:00