* 'linux-4.19.y-cip' of https://git.kernel.org/pub/scm/linux/kernel/git/cip/linux-cip: CIP: Bump version suffix to -cip120 after merge from cip/linux-4.19.y-st tree Update localversion-st, tree is up-to-date with 5.4.292. net: dsa: mv88e6xxx: propperly shutdown PPU re-enable timer on destroy jfs: add index corruption check to DT_GETPAGE() jfs: fix slab-out-of-bounds read in ea_get() tracing: Fix use-after-free in print_graph_function_flags during tracer switching mmc: sdhci-pxav3: set NEED_RSP_BUSY capability x86/tsc: Always save/restore TSC sched_clock() on suspend/resume ntb_perf: Delete duplicate dmaengine_unmap_put() call in perf_copy_chunk() arcnet: Add NULL check in com20020pci_probe() ipv6: fix omitted netlink attributes when using RTEXT_FILTER_SKIP_STATS vsock: avoid timeout during connect() if the socket is closing net_sched: skbprio: Remove overly strict queue assertions netlabel: Fix NULL pointer exception caused by CALIPSO on IPv4 sockets ntb: intel: Fix using link status DB's ntb_hw_switchtec: Fix shift-out-of-bounds in switchtec_ntb_mw_set_trans spufs: fix a leak in spufs_create_context() spufs: fix a leak on spufs_new_file() failure hwmon: (nct6775-core) Fix out of bounds access for NCT679{8,9} sched/deadline: Use online cpus for validating runtime affs: don't write overlarge OFS data block size fields affs: generate OFS sequence numbers starting at 1 wifi: iwlwifi: fw: allocate chained SG tables for dump sched/smt: Always inline sched_smt_active() ring-buffer: Fix bytes_dropped calculation issue objtool, media: dib8000: Prevent divide-by-zero in dib8000_set_dds() fs/procfs: fix the comment above proc_pid_wchan() perf python: Check if there is space to copy all the event perf python: Decrement the refcount of just created event on failure perf python: Fixup description of sample.id event member ocfs2: validate l_tree_depth to avoid out-of-bounds access perf units: Fix insufficient array space iio: accel: mma8452: Ensure error return on failure to 
matching oversampling ratio coresight: catu: Fix number of pages while using 64k pages isofs: fix KMSAN uninit-value bug in do_isofs_readdir() x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment mfd: sm501: Switch to BIT() to mitigate integer overflows RDMA/mlx5: Fix mlx5_poll_one() cur_qp update flow power: supply: max77693: Fix wrong conversion of charge input threshold value x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1 IB/mad: Check available slots before posting receive WRs clk: rockchip: rk3328: fix wrong clk_ref_usb3otg parent lib: 842: Improve error handling in sw842_compress() clk: amlogic: gxbb: drop incorrect flag on 32k clock fbdev: sm501fb: Add some geometry checks. mdacon: rework dependency list fbdev: au1100fb: Move a variable assignment behind a null pointer check PCI/portdrv: Only disable pciehp interrupts early when needed ALSA: hda/realtek: Always honor no_shutup_pins perf/ring_buffer: Allow the EPOLLRDNORM flag for poll lockdep: Don't disable interrupts on RT in disable_irq_nosync_lockdep.*() thermal: int340x: Add NULL check for adev EDAC/ie31200: Fix the error path order of ie31200_init() EDAC/ie31200: Fix the DIMM size mask for several SoCs x86/fpu: Avoid copying dynamic FP state from init_task in arch_dup_task_struct() cpufreq: governor: Fix negative 'idle_time' handling in dbs_update() net: usb: usbnet: restore usb%d name exception for local mac addresses net: usb: qmi_wwan: add Telit Cinterion FE990B composition net: usb: qmi_wwan: add Telit Cinterion FN990B composition tty: serial: 8250: Add some more device IDs netfilter: socket: Lookup orig tuple for IPv6 SNAT ARM: 9351/1: fault: Add "cut here" line for prefetch aborts ARM: 9350/1: fault: Implement copy_from_kernel_nofault_allowed() atm: Fix NULL pointer dereference ALSA: usb-audio: Add quirk for Plantronics headsets to fix control names drm/radeon: fix uninitialized size issue in radeon_vce_cs_parse() batman-adv: Ignore own maximum 
aggregation size during RX ARM: shmobile: smp: Enforce shmobile_smp_* alignment mmc: atmel-mci: Add missing clk_disable_unprepare() net/neighbor: add missing policy for NDTPA_QUEUE_LENBYTES net: atm: fix use after free in lec_send() Bluetooth: Fix error code in chan_alloc_skb_cb() RDMA/hns: Fix wrong value of max_sge_rd RDMA/bnxt_re: Avoid clearing VLAN_ID mask in modify qp path xfrm_output: Force software GSO only in tunnel mode i2c: sis630: Fix an error handling path in sis630_probe() i2c: ali15x3: Fix an error handling path in ali15x3_probe() i2c: ali1535: Fix an error handling path in ali1535_probe() ASoC: codecs: wm0010: Fix error handling path in wm0010_spi_probe() drm/gma500: Add NULL check for pci_gfx_root in mid_get_vbt_data() qlcnic: fix memory leak issues in qlcnic_sriov_common.c drm/amd/display: Assign normalized_pix_clk when color depth = 14 x86/microcode/AMD: Fix out-of-bounds on systems with CPU-less NUMA nodes USB: serial: option: match on interface class for Telit FN990B USB: serial: option: fix Telit Cinterion FE990A name USB: serial: option: add Telit Cinterion FE990B compositions USB: serial: ftdi_sio: add support for Altera USB Blaster 3 block: fix 'kmem_cache of name 'bio-108' already exists' drm/nouveau: Do not override forced connector status x86/irq: Define trace events conditionally nvme: only allow entering LIVE from CONNECTING state sctp: Fix undefined behavior in left shift operation nvmet-rdma: recheck queue state is LIVE in state lock in recv done s390/cio: Fix CHPID "configure" attribute caching HID: ignore non-functional sensor in HP 5MP Camera iscsi_ibft: Fix UBSAN shift-out-of-bounds warning in ibft_attr_show_nic() powercap: call put_device() on an error path in powercap_register_control_type() nvme-fc: go straight to connecting state when initializing net_sched: Prevent creation of classes with TC_H_ROOT ipvs: prevent integer overflow in do_ip_vs_get_ctl() netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in 
insert_tree() Drivers: hv: vmbus: Don't release fb_mmio resource in vmbus_free_mmio() drivers/hv: Replace binary semaphore with mutex netpoll: hold rcu read lock in __netpoll_send_skb() netpoll: netpoll_send_skb() returns transmit status netpoll: move netpoll_send_skb() out of line netpoll: remove dev argument from netpoll_send_skb_on_dev() netpoll: Fix use correct return type for ndo_start_xmit() pinctrl: bcm281xx: Fix incorrect regmap max_registers value sctp: sysctl: auth_enable: avoid using current->nsproxy sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy Revert "sctp: sysctl: auth_enable: avoid using current->nsproxy" Revert "sctp: sysctl: cookie_hmac_alg: avoid using current->nsproxy" sched/isolation: Prevent boot crash when the boot CPU is nohz_full CIP: Bump version suffix to -cip119 after merge from cip/linux-4.19.y-st tree watchdog: renesas_wdt: support handover from bootloader Update localversion-st, tree is up-to-date with 5.4.291. gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl(). gtp: Destroy device along with udp socket's netns dismantle. 
net: gso: fix ownership in __udp_gso_segment vlan: fix memory leak in vlan_newlink() batman-adv: Drop unmanaged ELP metric worker tee: optee: Fix supplicant wait loop pps: Fix a use-after-free net: rose: lock the socket in rose_bind() btrfs: fix use-after-free when attempting to join an aborted transaction media: lmedm04: Handle errors for lme2510_int_read wifi: rtlwifi: rtl8192se: rise completion of firmware loading as last step eeprom: digsy_mtc: Make GPIO lookup table match the device slimbus: messaging: Free transaction ID in delayed interrupt scenario intel_th: pci: Add Panther Lake-P/U support intel_th: pci: Add Panther Lake-H support intel_th: pci: Add Arrow Lake support Squashfs: check the inode number is not the invalid value of zero xhci: pci: Fix indentation in the PCI device ID definitions usb: gadget: Check bmAttributes only if configuration is valid usb: gadget: Fix setting self-powered state on suspend usb: gadget: Set self-powered based on MaxPower and bmAttributes usb: typec: tcpci_rt1711h: Unmask alert interrupts to fix functionality usb: typec: ucsi: increase timeout for PPM reset operations usb: atm: cxacru: fix a flaw in existing endpoint checks usb: quirks: Add DELAY_INIT and NO_LPM for Prolific Mass Storage Card Reader usb: renesas_usbhs: Use devm_usb_get_phy() Revert "drivers/card_reader/rtsx_usb: Restore interrupt based detection" net: ipv6: fix missing dst ref drop in ila lwtunnel net: ipv6: fix dst ref loop in ila lwtunnel net-timestamp: support TCP GSO case for a few missing flags vlan: enforce underlying device type ppp: Fix KMSAN uninit-value warning with bpf be2net: fix sleeping while atomic bugs in be_ndo_bridge_getlink hwmon: fix a NULL vs IS_ERR_OR_NULL() check in xgene_hwmon_probe() llc: do not use skb_get() before dev_queue_xmit() hwmon: (ad7314) Validate leading zero bits and return error hwmon: (ntc_thermistor) Fix the ncpXXxh103 sensor table hwmon: (pmbus) Initialise page count in pmbus_identify() caif_virtio: fix wrong 
pointer check in cfv_probe() HID: intel-ish-hid: Fix use-after-free issue in ishtp_hid_remove() mm/page_alloc: fix uninitialized variable rapidio: fix an API misues when rio_add_net() fails rapidio: add check for rio_add_net() in rio_scan_alloc_net() wifi: nl80211: reject cooked mode if it is set along with other flags wifi: cfg80211: regulatory: improve invalid hints checking x86/cpu: Properly parse CPUID leaf 0x2 TLB descriptor 0x63 x86/cpu: Validate CPUID leaf 0x2 EDX output x86/cacheinfo: Validate CPUID leaf 0x2 EDX output platform/x86: thinkpad_acpi: Add battery quirk for ThinkPad X131e drm/radeon: Fix rs400_gpu_init for ATI mobility radeon Xpress 200M ALSA: hda/realtek: update ALC222 depop optimize ALSA: hda: intel: Add Dell ALC3271 to power_save denylist HID: appleir: Fix potential NULL dereference at raw event handle Revert "of: reserved-memory: Fix using wrong number of cells to get property 'alignment'" drm/amdgpu: disable BAR resize on Dell G5 SE drm/amdgpu: Check extended configuration space register when system uses large bar drm/amdgpu: skip BAR resizing if the bios already did it acct: perform last write from workqueue kernel/acct.c: use dedicated helper to access rlimit values kernel/acct.c: use #elif instead of #end and #elif pfifo_tail_enqueue: Drop new packet when sch->limit == 0 sched/core: Prevent rescheduling when interrupts are disabled phy: exynos5-usbdrd: fix MPLL_MULTIPLIER and SSC_REFCLKSEL masks in refclk usbnet: gl620a: fix endpoint checking in genelink_bind() perf/core: Fix low freq setting via IOC_PERIOD ftrace: Avoid potential division by zero in function_stat_show() x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems ipvs: Always clear ipvs_property flag in skb_scrub_packet() ASoC: es8328: fix route from DAC to output net: cadence: macb: Synchronize stats calculations sunrpc: suppress warnings for unused procfs functions batman-adv: Ignore neighbor throughput metrics in error case acct: block access to kernel internal 
filesystems ALSA: hda/conexant: Add quirk for HP ProBook 450 G4 mute LED nfp: bpf: Add check for nfp_app_ctrl_msg_alloc() power: supply: da9150-fg: fix potential overflow geneve: Suppress list corruption splat in geneve_destroy_tunnels(). geneve: Fix use-after-free in geneve_find_dev(). powerpc/code-patching: Fix KASAN hit by not flagging text patching area as VM_ALLOC ALSA: hda/realtek - Add type for ALC287 powerpc/64s: Rewrite __real_pte() and __rpte_to_hidx() as static inline powerpc/64s/mm: Move __real_pte stubs into hash-4k.h USB: gadget: f_midi: f_midi_complete to call queue_work usb/gadget: f_midi: Replace tasklet with work usb/gadget: f_midi: convert tasklets to use new tasklet_setup() API usb: dwc3: Fix timeout issue during controller enter/exit from halt state mm: update mark_victim tracepoints fields crypto: testmgr - some more fixes to RSA test vectors crypto: testmgr - populate RSA CRT parameters in RSA test vectors crypto: testmgr - fix version number of RSA tests crypto: testmgr - Fix wrong test case of RSA crypto: testmgr - fix wrong key length for pkcs1pad driver core: bus: Fix double free in driver API bus_register() scsi: storvsc: Set correct data length for sending SCSI command without payload vlan: move dev_put into vlan_dev_uninit vlan: introduce vlan_dev_free_egress_priority Revert "btrfs: avoid monopolizing a core when activating a swap file" parport_pc: add support for ASIX AX99100 can: ems_pci: move ASIX AX99100 ids to pci_ids.h nilfs2: protect access to buffers with no active references nilfs2: do not force clear folio if buffer is referenced nilfs2: do not output warnings when clearing dirty buffers alpha: replace hardcoded stack offsets with autogenerated ones ndisc: extend RCU protection in ndisc_send_skb() openvswitch: use RCU protection in ovs_vport_cmd_fill_info() arp: use RCU protection in arp_xmit() neighbour: use RCU protection in __neigh_notify() neighbour: delete redundant judgment statements ndisc: use RCU protection in 
ndisc_alloc_skb() ipv6: use RCU protection in ip6_default_advmss() ipv4: use RCU protection in inet_select_addr() ipv4: use RCU protection in rt_is_expired() net: add dev_net_rcu() helper net: treat possible_net_t net pointer as an RCU one and add read_pnet_rcu() partitions: mac: fix handling of bogus partition table gpio: stmpe: Check return value of stmpe_reg_read in stmpe_gpio_irq_sync_unlock alpha: align stack for page fault and user unaligned trap handlers alpha: make stack 16-byte aligned (most cases) can: c_can: fix unbalanced runtime PM disable in error path USB: serial: option: drop MeiG Smart defines USB: serial: option: fix Telit Cinterion FN990A name USB: serial: option: add Telit Cinterion FN990B compositions USB: serial: option: add MeiG Smart SLM828 usb: cdc-acm: Fix handling of oversized fragments usb: cdc-acm: Check control transfer buffer size before access USB: cdc-acm: Fill in Renesas R-Car D3 USB Download mode quirk USB: hub: Ignore non-compliant devices with too many configs or interfaces usb: gadget: f_midi: fix MIDI Streaming descriptor lengths USB: Add USB_QUIRK_NO_LPM quirk for sony xperia xz1 smartphone USB: quirks: add USB_QUIRK_NO_LPM quirk for Teclast dist USB: pci-quirks: Fix HCCPARAMS register error for LS7A EHCI usb: dwc2: gadget: remove of_node reference upon udc_stop usb: gadget: udc: renesas_usb3: Fix compiler warning usb: roles: set switch registered flag early on batman-adv: fix panic during interface removal ASoC: Intel: bytcr_rt5640: Add DMI quirk for Vexia Edu Atla 10 tablet 5V orangefs: fix a oob in orangefs_debug_write Grab mm lock before grabbing pt lock vfio/pci: Enable iowrite64 and ioread64 for vfio pci media: cxd2841er: fix 64-bit division on gcc-9 xen: remove a confusing comment on auto-translated guest I/O gpio: bcm-kona: Add missing newline to dev_err format string gpio: bcm-kona: Fix GPIO lock/unlock for banks above bank 0 arm64: cacheinfo: Avoid out-of-bounds write to cacheinfo array team: better 
TEAM_OPTION_TYPE_STRING validation vrf: use RCU protection in l3mdev_l3_out() ndisc: ndisc_send_redirect() must use dev_get_by_index_rcu() HID: multitouch: Add NULL check in mt_input_configured ocfs2: check dir i_size in ocfs2_find_entry MIPS: ftrace: Declare ftrace_get_parent_ra_addr() as static ptp: Ensure info->enable callback is always set mtd: onenand: Fix uninitialized retlen in do_otp_read() NFC: nci: Add bounds checking in nci_hci_create_pipe() nilfs2: fix possible int overflows in nilfs_fiemap() ocfs2: handle a symlink read error correctly ocfs2: fix incorrect CPU endianness conversion causing mount failure nvmem: core: improve range check for nvmem_cell_write() crypto: qce - fix goto jump in error path media: uvcvideo: Remove redundant NULL assignment media: uvcvideo: Fix event flags in uvc_ctrl_send_events media: ov5640: fix get_light_freq on auto soc: qcom: smem_state: fix missing of_node_put in error path powerpc/pseries/eeh: Fix get PE state translation serial: sh-sci: Do not probe the serial port if its slot in sci_ports[] is in use serial: sh-sci: Drop __initdata macro for port_cfg usb: gadget: f_tcm: Don't prepare BOT write request twice usb: gadget: f_tcm: ep_autoconfig with fullspeed endpoint usb: gadget: f_tcm: Decrement command ref count on cleanup usb: gadget: f_tcm: Translate error to sense wifi: brcmfmac: fix NULL pointer dereference in brcmf_txfinalize() HID: hid-sensor-hub: don't use stale platform-data on remove of: reserved-memory: Fix using wrong number of cells to get property 'alignment' of: Fix of_find_node_opts_by_path() handling of alias+path+options of: Correct child specifier used as input of the 2nd nexus node clk: qcom: clk-alpha-pll: fix alpha mode configuration Bluetooth: L2CAP: handle NULL sock pointer in l2cap_sock_alloc KVM: s390: vsie: fix some corner-cases when grabbing vsie pages KVM: Explicitly verify target vCPU is online in kvm_get_vcpu() arm64: dts: rockchip: increase gmac rx_delay on rk3399-puma binfmt_flat: Fix 
integer overflow bug on 32 bit systems m68k: vga: Fix I/O defines s390/futex: Fix FUTEX_OP_ANDN implementation leds: lp8860: Write full EEPROM, not only half of it cpufreq: s3c64xx: Fix compilation warning tun: revert fix group permission check netem: Update sch->q.qlen before qdisc_tree_reduce_backlog() udp: gso: do not drop small packets when PMTU reduces tg3: Disable tg3 PCIe AER on system reboot firmware: iscsi_ibft: fix ISCSI_IBFT Kconfig entry nvme: handle connectivity loss in nvme_set_queue_count usb: xhci: Fix NULL pointer dereference on certain command aborts usb: xhci: Add timeout argument in address_device USB HCD callback media: uvcvideo: Remove dangling pointers media: uvcvideo: Only save async fh if success nilfs2: handle errors that nilfs_prepare_chunk() may return nilfs2: eliminate staggered calls to kunmap in nilfs_rename nilfs2: move page release outside of nilfs_delete_entry and nilfs_set_link x86/mm: Don't disable PCID when INVLPG has been fixed by microcode HID: Wacom: Add PCI Wacom device support mfd: lpc_ich: Add another Gemini Lake ISA bridge PCI device-id wifi: brcmsmac: add gain range check to wlc_phy_iqcal_gainparams_nphy() mmc: core: Respect quirk_max_rate for non-UHS SDIO card tun: fix group permission check printk: Fix signed integer overflow when defining LOG_BUF_LEN_MAX sched: Don't try to catch up excess steal time. 
btrfs: convert BUG_ON in btrfs_reloc_cow_block() to proper error handling btrfs: output the reason for open_ctree() failure usb: gadget: f_tcm: Don't free command immediately media: uvcvideo: Fix double free in error path usb: typec: tcpm: set SRC_SEND_CAPABILITIES timeout to PD_T_SENDER_RESPONSE drivers/card_reader/rtsx_usb: Restore interrupt based detection ktest.pl: Check kernelrelease return in get_version NFSD: Reset cb_seq_status after NFS4ERR_DELAY hexagon: Fix unbalanced spinlock in die() hexagon: fix using plain integer as NULL pointer warning in cmpxchg genksyms: fix memory leak when the same symbol is read from *.symref file genksyms: fix memory leak when the same symbol is added from source net: sh_eth: Fix missing rtnl lock in suspend/resume path vsock: Allow retrying on connect() failure net: davicom: fix UAF in dm9000_drv_remove net: rose: fix timer races against user threads PM: hibernate: Add error handling for syscore_suspend() net: fec: implement TSO descriptor cleanup ubifs: skip dumping tnc tree when zroot is null dmaengine: ti: edma: fix OF node reference leaks in edma_driver module: Extend the preempt disabled section in dereference_symbol_descriptor(). 
ocfs2: mark dquot as inactive if failed to start trans while releasing dquot scsi: mpt3sas: Set ioc->manu_pg11.EEDPTagMode directly to 1 media: camif-core: Add check for clk_enable() media: mipi-csis: Add check for clk_enable() PCI: endpoint: Destroy the EPC device in devm_pci_epc_destroy() media: rc: iguanair: handle timeouts fbdev: omapfb: Fix an OF node leak in dss_of_port_get_parent_device() ARM: dts: mediatek: mt7623: fix IR nodename arm64: dts: mediatek: mt8173-evb: Fix MT6397 PMIC sub-node names arm64: dts: mediatek: mt8173-evb: Drop regulator-compatible property rdma/cxgb4: Prevent potential integer overflow on 32bit RDMA/mlx4: Avoid false error about access to uninitialized gids array perf report: Fix misleading help message about --demangle perf top: Don't complain about lack of vmlinux when not resolving some kernel samples padata: fix sysfs store callback check ktest.pl: Remove unused declarations in run_bisect_test function net: sched: Disallow replacing of child qdisc from one parent to another net/mlxfw: Drop hard coded max FW flash image size selftests: harness: fix printing of mismatch values in __EXPECT() selftests/harness: Display signed values correctly wifi: wlcore: fix unbalanced pm_runtime calls regulator: of: Implement the unwind path of of_regulator_match() team: prevent adding a device which is already a team device lower cpupower: fix TSC MHz calculation wifi: rtlwifi: pci: wait for firmware loading before releasing memory wifi: rtlwifi: fix memory leaks and invalid access at probe error path wifi: rtlwifi: remove unused dualmac control leftovers rtlwifi: replace usage of found with dedicated list iterator variable wifi: rtlwifi: usb: fix workqueue leak when probe fails wifi: rtlwifi: do not complete firmware loading needlessly drm/amdgpu: Fix potential NULL pointer dereference in atomctrl_get_smc_sclk_range_table drm/etnaviv: Fix page property being used for non writecombine buffers afs: Fix directory format encoding struct overflow: 
Allow mixed type arguments
overflow: Correct check_shl_overflow() comment
overflow: Add __must_check attribute to check_*() helpers
udf: Fix use of check_add_overflow() with mixed type arguments

Change-Id: Ia7c26633509cfe8ec59d7dd0d6efd602629c87f4
Signed-off-by: bengris32 <bengris32@protonmail.ch>
914 lines
22 KiB
C
/*
 * Performance events ring-buffer code:
 *
 *  Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de>
 *  Copyright (C) 2008-2011 Red Hat, Inc., Ingo Molnar
 *  Copyright (C) 2008-2011 Red Hat, Inc., Peter Zijlstra
 *  Copyright © 2009 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com>
 *
 * For licensing details see kernel-base/COPYING
 */

#include <linux/perf_event.h>
#include <linux/vmalloc.h>
#include <linux/slab.h>
#include <linux/circ_buf.h>
#include <linux/poll.h>
#include <linux/nospec.h>

#include "internal.h"

static void perf_output_wakeup(struct perf_output_handle *handle)
{
	atomic_set(&handle->rb->poll, EPOLLIN | EPOLLRDNORM);

	handle->event->pending_wakeup = 1;
	irq_work_queue(&handle->event->pending);
}

/*
 * We need to ensure a later event_id doesn't publish a head when a former
 * event isn't done writing. However since we need to deal with NMIs we
 * cannot fully serialize things.
 *
 * We only publish the head (and generate a wakeup) when the outer-most
 * event completes.
 */
static void perf_output_get_handle(struct perf_output_handle *handle)
{
	struct ring_buffer *rb = handle->rb;

	preempt_disable();
	local_inc(&rb->nest);
	handle->wakeup = local_read(&rb->wakeup);
}

static void perf_output_put_handle(struct perf_output_handle *handle)
{
	struct ring_buffer *rb = handle->rb;
	unsigned long head;

again:
	/*
	 * In order to avoid publishing a head value that goes backwards,
	 * we must ensure the load of @rb->head happens after we've
	 * incremented @rb->nest.
	 *
	 * Otherwise we can observe a @rb->head value before one published
	 * by an IRQ/NMI happening between the load and the increment.
	 */
	barrier();
	head = local_read(&rb->head);

	/*
	 * IRQ/NMI can happen here and advance @rb->head, causing our
	 * load above to be stale.
	 */

	/*
	 * If this isn't the outermost nesting, we don't have to update
	 * @rb->user_page->data_head.
	 */
	if (local_read(&rb->nest) > 1) {
		local_dec(&rb->nest);
		goto out;
	}

	/*
	 * Since the mmap() consumer (userspace) can run on a different CPU:
	 *
	 *   kernel                             user
	 *
	 *   if (LOAD ->data_tail) {            LOAD ->data_head
	 *                      (A)             smp_rmb()       (C)
	 *      STORE $data                     LOAD $data
	 *      smp_wmb()       (B)             smp_mb()        (D)
	 *      STORE ->data_head               STORE ->data_tail
	 *   }
	 *
	 * Where A pairs with D, and B pairs with C.
	 *
	 * In our case (A) is a control dependency that separates the load of
	 * the ->data_tail and the stores of $data. In case ->data_tail
	 * indicates there is no room in the buffer to store $data we do not.
	 *
	 * D needs to be a full barrier since it separates the data READ
	 * from the tail WRITE.
	 *
	 * For B a WMB is sufficient since it separates two WRITEs, and for C
	 * an RMB is sufficient since it separates two READs.
	 *
	 * See perf_output_begin().
	 */
	smp_wmb(); /* B, matches C */
	WRITE_ONCE(rb->user_page->data_head, head);

	/*
	 * We must publish the head before decrementing the nest count,
	 * otherwise an IRQ/NMI can publish a more recent head value and our
	 * write will (temporarily) publish a stale value.
	 */
	barrier();
	local_set(&rb->nest, 0);

	/*
	 * Ensure we decrement @rb->nest before we validate the @rb->head.
	 * Otherwise we cannot be sure we caught the 'last' nested update.
	 */
	barrier();
	if (unlikely(head != local_read(&rb->head))) {
		local_inc(&rb->nest);
		goto again;
	}

	if (handle->wakeup != local_read(&rb->wakeup))
		perf_output_wakeup(handle);

out:
	preempt_enable();
}

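The (C) and (D) halves of the pairing described in perf_output_put_handle() are the responsibility of the user-space reader. The following is a minimal sketch of such a consumer, not the perf tool's actual code. It assumes C11 atomics and uses a hypothetical struct demo_ring as a simplified stand-in for the mmap()ed control page and data area; the names demo_ring, demo_consume and DEMO_RING_SIZE are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 16u	/* hypothetical; must be a power of two */

/* Hypothetical stand-in for the mmap()ed control page plus data area. */
struct demo_ring {
	_Atomic uint64_t data_head;	/* advanced by the producer */
	_Atomic uint64_t data_tail;	/* advanced by the consumer */
	unsigned char data[DEMO_RING_SIZE];
};

/* Copy up to @max unread bytes into @out; returns the number consumed. */
static size_t demo_consume(struct demo_ring *r, unsigned char *out, size_t max)
{
	uint64_t head, tail;
	size_t n = 0;

	head = atomic_load_explicit(&r->data_head, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* (C), pairs with (B) */

	tail = atomic_load_explicit(&r->data_tail, memory_order_relaxed);
	while (tail != head && n < max) {
		out[n++] = r->data[tail & (DEMO_RING_SIZE - 1)];
		tail++;
	}

	/* (D): full fence between the data reads and the tail store. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&r->data_tail, tail, memory_order_relaxed);

	return n;
}
```

The sequentially consistent fence mirrors the comment's requirement that (D) be a full barrier; a real reader would also have to handle records that wrap around the end of the data area.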
static __always_inline bool
ring_buffer_has_space(unsigned long head, unsigned long tail,
		      unsigned long data_size, unsigned int size,
		      bool backward)
{
	if (!backward)
		return CIRC_SPACE(head, tail, data_size) >= size;
	else
		return CIRC_SPACE(tail, head, data_size) >= size;
}

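To make the free-space arithmetic in ring_buffer_has_space() concrete, the sketch below re-creates the circular-buffer helpers in user space. The DEMO_ macros are renamed copies of CIRC_CNT() and CIRC_SPACE() as they are believed to appear in <linux/circ_buf.h>, and demo_has_space() is a hypothetical mirror of the function above; both assume the buffer size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space copies of the helpers from <linux/circ_buf.h>, renamed. */
#define DEMO_CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define DEMO_CIRC_SPACE(head, tail, size) DEMO_CIRC_CNT((tail), ((head) + 1), (size))

/* Hypothetical mirror of ring_buffer_has_space(). */
static bool demo_has_space(unsigned long head, unsigned long tail,
			   unsigned long data_size, unsigned int size,
			   bool backward)
{
	if (!backward)
		return DEMO_CIRC_SPACE(head, tail, data_size) >= size;
	/* A backward writer decrements head, so the roles are swapped. */
	return DEMO_CIRC_SPACE(tail, head, data_size) >= size;
}
```

One byte is always left unused so that a full buffer can be told apart from an empty one, which is why an empty 8-byte buffer reports only 7 bytes of space.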
static __always_inline int
__perf_output_begin(struct perf_output_handle *handle,
		    struct perf_event *event, unsigned int size,
		    bool backward)
{
	struct ring_buffer *rb;
	unsigned long tail, offset, head;
	int have_lost, page_shift;
	struct {
		struct perf_event_header header;
		u64			 id;
		u64			 lost;
	} lost_event;

	rcu_read_lock();
	/*
	 * For inherited events we send all the output towards the parent.
	 */
	if (event->parent)
		event = event->parent;

	rb = rcu_dereference(event->rb);
	if (unlikely(!rb))
		goto out;

	if (unlikely(rb->paused)) {
		if (rb->nr_pages)
			local_inc(&rb->lost);
		goto out;
	}

	handle->rb    = rb;
	handle->event = event;

	have_lost = local_read(&rb->lost);
	if (unlikely(have_lost)) {
		size += sizeof(lost_event);
		if (event->attr.sample_id_all)
			size += event->id_header_size;
	}

	perf_output_get_handle(handle);

	do {
		tail = READ_ONCE(rb->user_page->data_tail);
		offset = head = local_read(&rb->head);
		if (!rb->overwrite) {
			if (unlikely(!ring_buffer_has_space(head, tail,
							    perf_data_size(rb),
							    size, backward)))
				goto fail;
		}

		/*
		 * The above forms a control dependency barrier separating the
		 * @tail load above from the data stores below. Since the @tail
		 * load is required to compute the branch to fail below.
		 *
		 * A, matches D; the full memory barrier userspace SHOULD issue
		 * after reading the data and before storing the new tail
		 * position.
		 *
		 * See perf_output_put_handle().
		 */

		if (!backward)
			head += size;
		else
			head -= size;
	} while (local_cmpxchg(&rb->head, offset, head) != offset);

	if (backward) {
		offset = head;
		head = (u64)(-head);
	}

	/*
	 * We rely on the implied barrier() by local_cmpxchg() to ensure
	 * none of the data stores below can be lifted up by the compiler.
	 */

	if (unlikely(head - local_read(&rb->wakeup) > rb->watermark))
		local_add(rb->watermark, &rb->wakeup);

	page_shift = PAGE_SHIFT + page_order(rb);

	handle->page = (offset >> page_shift) & (rb->nr_pages - 1);
	offset &= (1UL << page_shift) - 1;
	handle->addr = rb->data_pages[handle->page] + offset;
	handle->size = (1UL << page_shift) - offset;

	if (unlikely(have_lost)) {
		struct perf_sample_data sample_data;

		lost_event.header.size = sizeof(lost_event);
		lost_event.header.type = PERF_RECORD_LOST;
		lost_event.header.misc = 0;
		lost_event.id          = event->id;
		lost_event.lost        = local_xchg(&rb->lost, 0);

		perf_event_header__init_id(&lost_event.header,
					   &sample_data, event);
		perf_output_put(handle, lost_event);
		perf_event__output_id_sample(event, handle, &sample_data);
	}

	return 0;

fail:
	local_inc(&rb->lost);
	perf_output_put_handle(handle);
out:
	rcu_read_unlock();

	return -ENOSPC;
}

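The last step of __perf_output_begin() turns the linear byte offset into a page index, an offset inside that page, and the room left in the page. A small user-space sketch of that arithmetic follows, assuming a power-of-two page count; struct demo_slot and demo_locate() are hypothetical names, since the kernel writes these values straight into the output handle.

```c
#include <assert.h>

/* Hypothetical result type; the kernel stores these in the output handle. */
struct demo_slot {
	unsigned long page;	/* index into the data page array */
	unsigned long offset;	/* byte offset inside that page */
	unsigned long room;	/* bytes left before the page ends */
};

static struct demo_slot demo_locate(unsigned long offset, int page_shift,
				    unsigned long nr_pages)
{
	struct demo_slot s;

	/* The mask wraps the page index; nr_pages must be a power of two. */
	s.page = (offset >> page_shift) & (nr_pages - 1);
	s.offset = offset & ((1UL << page_shift) - 1);
	s.room = (1UL << page_shift) - s.offset;
	return s;
}
```

With 4 KiB pages (page_shift of 12) and four pages, offset 0x2345 lands in page 2 at byte 0x345, and offset 0x5010 wraps around to page 1.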
int perf_output_begin_forward(struct perf_output_handle *handle,
			     struct perf_event *event, unsigned int size)
{
	return __perf_output_begin(handle, event, size, false);
}

int perf_output_begin_backward(struct perf_output_handle *handle,
			       struct perf_event *event, unsigned int size)
{
	return __perf_output_begin(handle, event, size, true);
}

int perf_output_begin(struct perf_output_handle *handle,
		      struct perf_event *event, unsigned int size)
{

	return __perf_output_begin(handle, event, size,
				   unlikely(is_write_backward(event)));
}

unsigned int perf_output_copy(struct perf_output_handle *handle,
		      const void *buf, unsigned int len)
{
	return __output_copy(handle, buf, len);
}

unsigned int perf_output_skip(struct perf_output_handle *handle,
			      unsigned int len)
{
	return __output_skip(handle, NULL, len);
}

void perf_output_end(struct perf_output_handle *handle)
{
	perf_output_put_handle(handle);
	rcu_read_unlock();
}

static void
ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
{
	long max_size = perf_data_size(rb);

	if (watermark)
		rb->watermark = min(max_size, watermark);

	if (!rb->watermark)
		rb->watermark = max_size / 2;

	if (flags & RING_BUFFER_WRITABLE)
		rb->overwrite = 0;
	else
		rb->overwrite = 1;

	atomic_set(&rb->refcount, 1);

	INIT_LIST_HEAD(&rb->event_list);
	spin_lock_init(&rb->event_lock);

	/*
	 * perf_output_begin() only checks rb->paused, therefore
	 * rb->paused must be true if we have no pages for output.
	 */
	if (!rb->nr_pages)
		rb->paused = 1;
}

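The watermark selection in ring_buffer_init() can be read as a pure function of the buffer size and the requested value. The sketch below assumes that rb->watermark starts out as zero (for example because the buffer was zero-allocated), which this excerpt does not show; demo_watermark() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/*
 * Hypothetical mirror of the watermark logic: clamp a requested value to
 * the buffer size, and fall back to half the buffer when none is given.
 * Assumes the stored watermark was zero before initialisation.
 */
static long demo_watermark(long max_size, long requested)
{
	long wm = 0;

	if (requested)
		wm = requested < max_size ? requested : max_size;

	if (!wm)
		wm = max_size / 2;

	return wm;
}
```

For a 4096-byte data area this yields 2048 when no watermark is requested, 1000 for a request of 1000, and 4096 for any request larger than the buffer.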
void perf_aux_output_flag(struct perf_output_handle *handle, u64 flags)
{
	/*
	 * OVERWRITE is determined by perf_aux_output_end() and can't
	 * be passed in directly.
	 */
	if (WARN_ON_ONCE(flags & PERF_AUX_FLAG_OVERWRITE))
		return;

	handle->aux_flags |= flags;
}
EXPORT_SYMBOL_GPL(perf_aux_output_flag);

/*
 * This is called before hardware starts writing to the AUX area to
 * obtain an output handle and make sure there's room in the buffer.
 * When the capture completes, call perf_aux_output_end() to commit
 * the recorded data to the buffer.
 *
 * The ordering is similar to that of perf_output_{begin,end}, with
 * the exception of (B), which should be taken care of by the pmu
 * driver, since ordering rules will differ depending on hardware.
 *
 * Call this from pmu::start(); see the comment in perf_aux_output_end()
 * about its use in pmu callbacks. Both can also be called from the PMI
 * handler if needed.
 */
void *perf_aux_output_begin(struct perf_output_handle *handle,
			    struct perf_event *event)
{
	struct perf_event *output_event = event;
	unsigned long aux_head, aux_tail;
	struct ring_buffer *rb;

	if (output_event->parent)
		output_event = output_event->parent;

	/*
	 * Since this will typically be open across pmu::add/pmu::del, we
	 * grab ring_buffer's refcount instead of holding rcu read lock
	 * to make sure it doesn't disappear under us.
	 */
	rb = ring_buffer_get(output_event);
	if (!rb)
		return NULL;

	if (!rb_has_aux(rb))
		goto err;

	/*
	 * If aux_mmap_count is zero, the aux buffer is in perf_mmap_close(),
	 * about to get freed, so we leave immediately.
	 *
	 * Checking rb::aux_mmap_count and rb::refcount has to be done in
	 * the same order, see perf_mmap_close. Otherwise we end up freeing
	 * aux pages in this path, which is a bug, because in_atomic().
	 */
	if (!atomic_read(&rb->aux_mmap_count))
		goto err;

	if (!atomic_inc_not_zero(&rb->aux_refcount))
		goto err;

	/*
	 * Nesting is not supported for AUX area, make sure nested
	 * writers are caught early
	 */
	if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1)))
		goto err_put;

	aux_head = rb->aux_head;

	handle->rb = rb;
	handle->event = event;
	handle->head = aux_head;
	handle->size = 0;
	handle->aux_flags = 0;

	/*
	 * In overwrite mode, AUX data stores do not depend on aux_tail,
	 * therefore (A) control dependency barrier does not exist. The
	 * (B) <-> (C) ordering is still observed by the pmu driver.
	 */
	if (!rb->aux_overwrite) {
		aux_tail = READ_ONCE(rb->user_page->aux_tail);
		handle->wakeup = rb->aux_wakeup + rb->aux_watermark;
		if (aux_head - aux_tail < perf_aux_size(rb))
			handle->size = CIRC_SPACE(aux_head, aux_tail, perf_aux_size(rb));

		/*
		 * handle->size computation depends on aux_tail load; this forms a
		 * control dependency barrier separating aux_tail load from aux data
		 * store that will be enabled on successful return
		 */
		if (!handle->size) { /* A, matches D */
			event->pending_disable = smp_processor_id();
			perf_output_wakeup(handle);
			local_set(&rb->aux_nest, 0);
			goto err_put;
		}
	}

	return handle->rb->aux_priv;

err_put:
	/* can't be last */
	rb_free_aux(rb);

err:
	ring_buffer_put(rb);
	handle->event = NULL;

	return NULL;
}
EXPORT_SYMBOL_GPL(perf_aux_output_begin);

static __always_inline bool rb_need_aux_wakeup(struct ring_buffer *rb)
{
	if (rb->aux_overwrite)
		return false;

	if (rb->aux_head - rb->aux_wakeup >= rb->aux_watermark) {
		rb->aux_wakeup = rounddown(rb->aux_head, rb->aux_watermark);
		return true;
	}

	return false;
}

/*
 * Commit the data written by hardware into the ring buffer by adjusting
 * aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the
 * pmu driver's responsibility to observe ordering rules of the hardware,
 * so that all the data is externally visible before this is called.
 *
 * Note: this has to be called from pmu::stop() callback, as the assumption
 * of the AUX buffer management code is that after pmu::stop(), the AUX
 * transaction must be stopped and therefore drop the AUX reference count.
 */
void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
{
	bool wakeup = !!(handle->aux_flags & PERF_AUX_FLAG_TRUNCATED);
	struct ring_buffer *rb = handle->rb;
	unsigned long aux_head;

	/* in overwrite mode, driver provides aux_head via handle */
	if (rb->aux_overwrite) {
		handle->aux_flags |= PERF_AUX_FLAG_OVERWRITE;

		aux_head = handle->head;
		rb->aux_head = aux_head;
	} else {
		handle->aux_flags &= ~PERF_AUX_FLAG_OVERWRITE;

		aux_head = rb->aux_head;
		rb->aux_head += size;
	}

	if (size || handle->aux_flags) {
		/*
		 * Only send RECORD_AUX if we have something useful to communicate
		 */

		perf_event_aux_event(handle->event, aux_head, size,
				     handle->aux_flags);
	}

	WRITE_ONCE(rb->user_page->aux_head, rb->aux_head);
	if (rb_need_aux_wakeup(rb))
		wakeup = true;

	if (wakeup) {
		if (handle->aux_flags & PERF_AUX_FLAG_TRUNCATED)
			handle->event->pending_disable = smp_processor_id();
		perf_output_wakeup(handle);
	}

	handle->event = NULL;

	local_set(&rb->aux_nest, 0);
	/* can't be last */
	rb_free_aux(rb);
	ring_buffer_put(rb);
}
EXPORT_SYMBOL_GPL(perf_aux_output_end);

/*
 * Skip over a given number of bytes in the AUX buffer, due to, for example,
 * hardware's alignment constraints.
 */
int perf_aux_output_skip(struct perf_output_handle *handle, unsigned long size)
{
	struct ring_buffer *rb = handle->rb;

	if (size > handle->size)
		return -ENOSPC;

	rb->aux_head += size;

	WRITE_ONCE(rb->user_page->aux_head, rb->aux_head);
	if (rb_need_aux_wakeup(rb)) {
		perf_output_wakeup(handle);
		handle->wakeup = rb->aux_wakeup + rb->aux_watermark;
	}

	handle->head = rb->aux_head;
	handle->size -= size;

	return 0;
}
EXPORT_SYMBOL_GPL(perf_aux_output_skip);

void *perf_get_aux(struct perf_output_handle *handle)
{
	/* this is only valid between perf_aux_output_begin and *_end */
	if (!handle->event)
		return NULL;

	return handle->rb->aux_priv;
}
EXPORT_SYMBOL_GPL(perf_get_aux);

#define PERF_AUX_GFP	(GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY)

static struct page *rb_alloc_aux_page(int node, int order)
{
	struct page *page;

	if (order > MAX_ORDER)
		order = MAX_ORDER;

	do {
		page = alloc_pages_node(node, PERF_AUX_GFP, order);
	} while (!page && order--);

	if (page && order) {
		/*
		 * Communicate the allocation size to the driver:
		 * if we managed to secure a high-order allocation,
		 * set its first page's private to this order;
		 * !PagePrivate(page) means it's just a normal page.
		 */
		split_page(page, order);
		SetPagePrivate(page);
		set_page_private(page, order);
	}

	return page;
}

static void rb_free_aux_page(struct ring_buffer *rb, int idx)
{
	struct page *page = virt_to_page(rb->aux_pages[idx]);

	ClearPagePrivate(page);
	page->mapping = NULL;
	__free_page(page);
}

static void __rb_free_aux(struct ring_buffer *rb)
{
	int pg;

	/*
	 * Should never happen, the last reference should be dropped from
	 * perf_mmap_close() path, which first stops aux transactions (which
	 * in turn are the atomic holders of aux_refcount) and then does the
	 * last rb_free_aux().
	 */
	WARN_ON_ONCE(in_atomic());

	if (rb->aux_priv) {
		rb->free_aux(rb->aux_priv);
		rb->free_aux = NULL;
		rb->aux_priv = NULL;
	}

	if (rb->aux_nr_pages) {
		for (pg = 0; pg < rb->aux_nr_pages; pg++)
			rb_free_aux_page(rb, pg);

		kfree(rb->aux_pages);
		rb->aux_nr_pages = 0;
	}
}

int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
		 pgoff_t pgoff, int nr_pages, long watermark, int flags)
{
	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
	int ret = -ENOMEM, max_order = 0;

	if (!has_aux(event))
		return -EOPNOTSUPP;

	if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NO_SG) {
		/*
		 * We need to start with the max_order that fits in nr_pages,
		 * not the other way around, hence ilog2() and not get_order.
		 */
		max_order = ilog2(nr_pages);

		/*
		 * PMU requests more than one contiguous chunk of memory
		 * for SW double buffering
		 */
		if ((event->pmu->capabilities & PERF_PMU_CAP_AUX_SW_DOUBLEBUF) &&
		    !overwrite) {
			if (!max_order)
				return -EINVAL;

			max_order--;
		}
	}

	/*
	 * kcalloc_node() is unable to allocate buffer if the size is larger
	 * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
	 */
	if (get_order((unsigned long)nr_pages * sizeof(void *)) > MAX_ORDER)
		return -ENOMEM;
	rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
				     node);
	if (!rb->aux_pages)
		return -ENOMEM;

	rb->free_aux = event->pmu->free_aux;
	for (rb->aux_nr_pages = 0; rb->aux_nr_pages < nr_pages;) {
		struct page *page;
		int last, order;

		order = min(max_order, ilog2(nr_pages - rb->aux_nr_pages));
		page = rb_alloc_aux_page(node, order);
		if (!page)
			goto out;

		for (last = rb->aux_nr_pages + (1 << page_private(page));
		     last > rb->aux_nr_pages; rb->aux_nr_pages++)
			rb->aux_pages[rb->aux_nr_pages] = page_address(page++);
	}

	/*
	 * In overwrite mode, PMUs that don't support SG may not handle more
	 * than one contiguous allocation, since they rely on PMI to do double
	 * buffering. In this case, the entire buffer has to be one contiguous
	 * chunk.
	 */
	if ((event->pmu->capabilities & PERF_PMU_CAP_AUX_NO_SG) &&
	    overwrite) {
		struct page *page = virt_to_page(rb->aux_pages[0]);

		if (page_private(page) != max_order)
			goto out;
	}

	rb->aux_priv = event->pmu->setup_aux(event, rb->aux_pages, nr_pages,
					     overwrite);
	if (!rb->aux_priv)
		goto out;

	ret = 0;

	/*
	 * aux_pages (and pmu driver's private data, aux_priv) will be
	 * referenced in both producer's and consumer's contexts, thus
	 * we keep a refcount here to make sure either of the two can
	 * reference them safely.
	 */
	atomic_set(&rb->aux_refcount, 1);

	rb->aux_overwrite = overwrite;
	rb->aux_watermark = watermark;

	if (!rb->aux_watermark && !rb->aux_overwrite)
		rb->aux_watermark = nr_pages << (PAGE_SHIFT - 1);

out:
	if (!ret)
		rb->aux_pgoff = pgoff;
	else
		__rb_free_aux(rb);

	return ret;
}

void rb_free_aux(struct ring_buffer *rb)
{
	if (atomic_dec_and_test(&rb->aux_refcount))
		__rb_free_aux(rb);
}

#ifndef CONFIG_PERF_USE_VMALLOC

/*
 * Back perf_mmap() with regular GFP_KERNEL-0 pages.
 */

static struct page *
__perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
{
	if (pgoff > rb->nr_pages)
		return NULL;

	if (pgoff == 0)
		return virt_to_page(rb->user_page);

	return virt_to_page(rb->data_pages[pgoff - 1]);
}

static void *perf_mmap_alloc_page(int cpu)
{
	struct page *page;
	int node;

	node = (cpu == -1) ? cpu : cpu_to_node(cpu);
	page = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
	if (!page)
		return NULL;

	return page_address(page);
}

struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
{
	struct ring_buffer *rb;
	unsigned long size;
	int i;

	size = sizeof(struct ring_buffer);
	size += nr_pages * sizeof(void *);

	if (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER)
		goto fail;

	rb = kzalloc(size, GFP_KERNEL);
	if (!rb)
		goto fail;

	rb->user_page = perf_mmap_alloc_page(cpu);
	if (!rb->user_page)
		goto fail_user_page;

	for (i = 0; i < nr_pages; i++) {
		rb->data_pages[i] = perf_mmap_alloc_page(cpu);
		if (!rb->data_pages[i])
			goto fail_data_pages;
	}

	rb->nr_pages = nr_pages;

	ring_buffer_init(rb, watermark, flags);

	return rb;

fail_data_pages:
	for (i--; i >= 0; i--)
		free_page((unsigned long)rb->data_pages[i]);

	free_page((unsigned long)rb->user_page);

fail_user_page:
	kfree(rb);

fail:
	return NULL;
}

static void perf_mmap_free_page(unsigned long addr)
{
	struct page *page = virt_to_page((void *)addr);

	page->mapping = NULL;
	__free_page(page);
}

void rb_free(struct ring_buffer *rb)
{
	int i;

	perf_mmap_free_page((unsigned long)rb->user_page);
	for (i = 0; i < rb->nr_pages; i++)
		perf_mmap_free_page((unsigned long)rb->data_pages[i]);
	kfree(rb);
}

#else
static int data_page_nr(struct ring_buffer *rb)
{
	return rb->nr_pages << page_order(rb);
}

static struct page *
__perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
{
	/* The '>' counts in the user page. */
	if (pgoff > data_page_nr(rb))
		return NULL;

	return vmalloc_to_page((void *)rb->user_page + pgoff * PAGE_SIZE);
}

static void perf_mmap_unmark_page(void *addr)
{
	struct page *page = vmalloc_to_page(addr);

	page->mapping = NULL;
}

static void rb_free_work(struct work_struct *work)
{
	struct ring_buffer *rb;
	void *base;
	int i, nr;

	rb = container_of(work, struct ring_buffer, work);
	nr = data_page_nr(rb);

	base = rb->user_page;
	/* The '<=' counts in the user page. */
	for (i = 0; i <= nr; i++)
		perf_mmap_unmark_page(base + (i * PAGE_SIZE));

	vfree(base);
	kfree(rb);
}

void rb_free(struct ring_buffer *rb)
{
	schedule_work(&rb->work);
}

struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
{
	struct ring_buffer *rb;
	unsigned long size;
	void *all_buf;

	size = sizeof(struct ring_buffer);
	size += sizeof(void *);

	rb = kzalloc(size, GFP_KERNEL);
	if (!rb)
		goto fail;

	INIT_WORK(&rb->work, rb_free_work);

	all_buf = vmalloc_user((nr_pages + 1) * PAGE_SIZE);
	if (!all_buf)
		goto fail_all_buf;

	rb->user_page = all_buf;
	rb->data_pages[0] = all_buf + PAGE_SIZE;
	if (nr_pages) {
		rb->nr_pages = 1;
		rb->page_order = ilog2(nr_pages);
	}

	ring_buffer_init(rb, watermark, flags);

	return rb;

fail_all_buf:
	kfree(rb);

fail:
	return NULL;
}

#endif

struct page *
perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
{
	if (rb->aux_nr_pages) {
		/* above AUX space */
		if (pgoff > rb->aux_pgoff + rb->aux_nr_pages)
			return NULL;

		/* AUX space */
		if (pgoff >= rb->aux_pgoff) {
			int aux_pgoff = array_index_nospec(pgoff - rb->aux_pgoff, rb->aux_nr_pages);
			return virt_to_page(rb->aux_pages[aux_pgoff]);
		}
	}

	return __perf_mmap_to_page(rb, pgoff);
}