Changes in 4.19.323
staging: iio: frequency: ad9833: Get frequency value statically
staging: iio: frequency: ad9833: Load clock using clock framework
staging: iio: frequency: ad9834: Validate frequency parameter value
usbnet: ipheth: fix carrier detection in modes 1 and 4
net: ethernet: use ip_hdrlen() instead of bit shift
net: phy: vitesse: repair vsc73xx autonegotiation
scripts: kconfig: merge_config: config files: add a trailing newline
arm64: dts: rockchip: override BIOS_DISABLE signal via GPIO hog on RK3399 Puma
net/mlx5: Update the list of the PCI supported devices
net: ftgmac100: Enable TX interrupt to avoid TX timeout
net: dpaa: Pad packets to ETH_ZLEN
soundwire: stream: Revert "soundwire: stream: fix programming slave ports for non-continous port maps"
selftests/vm: remove call to ksft_set_plan()
selftests/kcmp: remove call to ksft_set_plan()
ASoC: allow module autoloading for table db1200_pids
pinctrl: at91: make it work with current gpiolib
microblaze: don't treat zero reserved memory regions as error
net: ftgmac100: Ensure tx descriptor updates are visible
wifi: iwlwifi: mvm: fix iwl_mvm_max_scan_ie_fw_cmd_room()
wifi: iwlwifi: mvm: don't wait for tx queues if firmware is dead
ASoC: tda7419: fix module autoloading
spi: bcm63xx: Enable module autoloading
x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency
ocfs2: add bounds checking to ocfs2_xattr_find_entry()
ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry()
gpio: prevent potential speculation leaks in gpio_device_get_desc()
USB: serial: pl2303: add device id for Macrosilicon MS3020
ACPI: PMIC: Remove unneeded check in tps68470_pmic_opregion_probe()
wifi: ath9k: fix parameter check in ath9k_init_debug()
wifi: ath9k: Remove error checks when creating debugfs entries
netfilter: nf_tables: elements with timeout below CONFIG_HZ never expire
wifi: cfg80211: fix UBSAN noise in cfg80211_wext_siwscan()
wifi: cfg80211: fix two more possible UBSAN-detected off-by-one errors
wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop()
can: bcm: Clear bo->bcm_proc_read after remove_proc_entry().
Bluetooth: btusb: Fix not handling ZPL/short-transfer
block, bfq: fix possible UAF for bfqq->bic with merge chain
block, bfq: choose the last bfqq from merge chain in bfq_setup_cooperator()
block, bfq: don't break merge chain in bfq_split_bfqq()
spi: ppc4xx: handle irq_of_parse_and_map() errors
spi: ppc4xx: Avoid returning 0 when failed to parse and map IRQ
ARM: versatile: fix OF node leak in CPUs prepare
reset: berlin: fix OF node leak in probe() error path
clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
hwmon: (max16065) Fix overflows seen when writing limits
mtd: slram: insert break after errors in parsing the map
hwmon: (ntc_thermistor) fix module autoloading
power: supply: max17042_battery: Fix SOC threshold calc w/ no current sense
fbdev: hpfb: Fix an error handling path in hpfb_dio_probe()
drm/stm: Fix an error handling path in stm_drm_platform_probe()
drm/amd: fix typo
drm/amdgpu: Replace one-element array with flexible-array member
drm/amdgpu: properly handle vbios fake edid sizing
drm/radeon: Replace one-element array with flexible-array member
drm/radeon: properly handle vbios fake edid sizing
drm/rockchip: vop: Allow 4096px width scaling
drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets
jfs: fix out-of-bounds in dbNextAG() and diAlloc()
drm/msm/a5xx: properly clear preemption records on resume
drm/msm/a5xx: fix races in preemption evaluation stage
ipmi: docs: don't advertise deprecated sysfs entries
drm/msm: fix %s null argument error
xen: use correct end address of kernel for conflict checking
xen/swiotlb: simplify range_straddles_page_boundary()
xen/swiotlb: add alignment check for dma buffers
selftests/bpf: Fix error compiling test_lru_map.c
xz: cleanup CRC32 edits from 2018
kthread: add kthread_work tracepoints
kthread: fix task state in kthread worker if being frozen
jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers()
ext4: clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT even mount with discard
smackfs: Use rcu_assign_pointer() to ensure safe assignment in smk_set_cipso
ext4: avoid negative min_clusters in find_group_orlov()
ext4: return error on ext4_find_inline_entry
ext4: avoid OOB when system.data xattr changes underneath the filesystem
nilfs2: fix potential null-ptr-deref in nilfs_btree_insert()
nilfs2: determine empty node blocks as corrupted
nilfs2: fix potential oob read in nilfs_btree_check_delete()
perf sched timehist: Fix missing free of session in perf_sched__timehist()
perf sched timehist: Fixed timestamp error when unable to confirm event sched_in time
perf time-utils: Fix 32-bit nsec parsing
clk: rockchip: Set parent rate for DCLK_VOP clock on RK3228
drivers: media: dvb-frontends/rtl2832: fix an out-of-bounds write error
drivers: media: dvb-frontends/rtl2830: fix an out-of-bounds write error
PCI: xilinx-nwl: Fix register misspelling
RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency
pinctrl: single: fix missing error code in pcs_probe()
clk: ti: dra7-atl: Fix leak of of_nodes
pinctrl: mvebu: Fix devinit_dove_pinctrl_probe function
RDMA/cxgb4: Added NULL check for lookup_atid
ntb: intel: Fix the NULL vs IS_ERR() bug for debugfs_create_dir()
nfsd: call cache_put if xdr_reserve_space returns NULL
f2fs: enhance to update i_mode and acl atomically in f2fs_setattr()
f2fs: fix typo
f2fs: fix to update i_ctime in __f2fs_setxattr()
f2fs: remove unneeded check condition in __f2fs_setxattr()
f2fs: reduce expensive checkpoint trigger frequency
coresight: tmc: sg: Do not leak sg_table
netfilter: nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put()
net: seeq: Fix use after free vulnerability in ether3 Driver Due to Race Condition
tcp: introduce tcp_skb_timestamp_us() helper
tcp: check skb is non-NULL in tcp_rto_delta_us()
net: qrtr: Update packets cloning when broadcasting
netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS
crypto: aead,cipher - zeroize key buffer after use
Remove *.orig pattern from .gitignore
soc: versatile: integrator: fix OF node leak in probe() error path
USB: appledisplay: close race between probe and completion handler
USB: misc: cypress_cy7c63: check for short transfer
firmware_loader: Block path traversal
tty: rp2: Fix reset with non forgiving PCIe host bridges
drbd: Fix atomicity violation in drbd_uuid_set_bm()
drbd: Add NULL check for net_conf to prevent dereference in state validation
ACPI: sysfs: validate return type of _STR method
f2fs: prevent possible int overflow in dir_block_index()
f2fs: avoid potential int overflow in sanity_check_area_boundary()
vfs: fix race between evice_inodes() and find_inode()&iput()
fs: Fix file_set_fowner LSM hook inconsistencies
nfs: fix memory leak in error path of nfs4_do_reclaim
PCI: xilinx-nwl: Use irq_data_get_irq_chip_data()
PCI: xilinx-nwl: Fix off-by-one in INTx IRQ handler
soc: versatile: realview: fix memory leak during device remove
soc: versatile: realview: fix soc_dev leak during device remove
usb: yurex: Replace snprintf() with the safer scnprintf() variant
USB: misc: yurex: fix race between read and write
pps: remove usage of the deprecated ida_simple_xx() API
pps: add an error check in parport_attach
i2c: aspeed: Update the stop sw state when the bus recovery occurs
i2c: isch: Add missed 'else'
usb: yurex: Fix inconsistent locking bug in yurex_read()
mailbox: rockchip: fix a typo in module autoloading
mailbox: bcm2835: Fix timeout during suspend mode
ceph: remove the incorrect Fw reference check when dirtying pages
netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTED
netfilter: nf_tables: prevent nf_skb_duplicated corruption
r8152: Factor out OOB link list waits
net: ethernet: lantiq_etop: fix memory disclosure
net: avoid potential underflow in qdisc_pkt_len_init() with UFO
net: add more sanity checks to qdisc_pkt_len_init()
ipv4: ip_gre: Fix drops of small packets in ipgre_xmit
sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start
ALSA: hda/generic: Unconditionally prefer preferred_dacs pairs
ALSA: hda/conexant: Fix conflicting quirk for System76 Pangolin
f2fs: Require FMODE_WRITE for atomic write ioctls
wifi: ath9k: fix possible integer overflow in ath9k_get_et_stats()
wifi: ath9k_htc: Use __skb_set_length() for resetting urb before resubmit
net: hisilicon: hip04: fix OF node leak in probe()
net: hisilicon: hns_dsaf_mac: fix OF node leak in hns_mac_get_info()
net: hisilicon: hns_mdio: fix OF node leak in probe()
ACPICA: Fix memory leak if acpi_ps_get_next_namepath() fails
ACPICA: Fix memory leak if acpi_ps_get_next_field() fails
ACPI: EC: Do not release locks during operation region accesses
ACPICA: check null return of ACPI_ALLOCATE_ZEROED() in acpi_db_convert_to_package()
tipc: guard against string buffer overrun
net: mvpp2: Increase size of queue_name buffer
ipv4: Check !in_dev earlier for ioctl(SIOCSIFADDR).
ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family
tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process
ACPICA: iasl: handle empty connection_node
wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_cmd_802_11_scan_ext()
signal: Replace BUG_ON()s
ALSA: asihpi: Fix potential OOB array access
ALSA: hdsp: Break infinite MIDI input flush loop
fbdev: pxafb: Fix possible use after free in pxafb_task()
power: reset: brcmstb: Do not go into infinite loop if reset fails
ata: sata_sil: Rename sil_blacklist to sil_quirks
jfs: UBSAN: shift-out-of-bounds in dbFindBits
jfs: Fix uaf in dbFreeBits
jfs: check if leafidx greater than num leaves per dmap tree
jfs: Fix uninit-value access of new_ea in ea_buffer
drm/amd/display: Check stream before comparing them
drm/amd/display: Fix index out of bounds in degamma hardware format translation
drm/printer: Allow NULL data in devcoredump printer
scsi: aacraid: Rearrange order of struct aac_srb_unit
drm/radeon/r100: Handle unknown family in r100_cp_init_microcode()
of/irq: Refer to actual buffer size in of_irq_parse_one()
ext4: ext4_search_dir should return a proper error
ext4: fix i_data_sem unlock order in ext4_ind_migrate()
spi: s3c64xx: fix timeout counters in flush_fifo
selftests: breakpoints: use remaining time to check if suspend succeed
selftests: vDSO: fix vDSO symbols lookup for powerpc64
i2c: xiic: Wait for TX empty to avoid missed TX NAKs
spi: bcm63xx: Fix module autoloading
perf/core: Fix small negative period being ignored
parisc: Fix itlb miss handler for 64-bit programs
ALSA: core: add isascii() check to card ID generator
ext4: no need to continue when the number of entries is 1
ext4: propagate errors from ext4_find_extent() in ext4_insert_range()
ext4: fix incorrect tid assumption in __jbd2_log_wait_for_space()
ext4: aovid use-after-free in ext4_ext_insert_extent()
ext4: fix double brelse() the buffer of the extents path
ext4: fix incorrect tid assumption in ext4_wait_for_tail_page_commit()
parisc: Fix 64-bit userspace syscall path
of/irq: Support #msi-cells=<0> in of_msi_get_domain
jbd2: stop waiting for space when jbd2_cleanup_journal_tail() returns error
ocfs2: fix the la space leak when unmounting an ocfs2 volume
ocfs2: fix uninit-value in ocfs2_get_block()
ocfs2: reserve space for inline xattr before attaching reflink tree
ocfs2: cancel dqi_sync_work before freeing oinfo
ocfs2: remove unreasonable unlock in ocfs2_read_blocks
ocfs2: fix null-ptr-deref when journal load failed.
ocfs2: fix possible null-ptr-deref in ocfs2_set_buffer_uptodate
riscv: define ILLEGAL_POINTER_VALUE for 64bit
aoe: fix the potential use-after-free problem in more places
clk: rockchip: fix error for unknown clocks
media: uapi/linux/cec.h: cec_msg_set_reply_to: zero flags
media: venus: fix use after free bug in venus_remove due to race condition
iio: magnetometer: ak8975: Fix reading for ak099xx sensors
tomoyo: fallback to realpath if symlink's pathname does not exist
Input: adp5589-keys - fix adp5589_gpio_get_value()
btrfs: wait for fixup workers before stopping cleaner kthread during umount
gpio: davinci: fix lazy disable
ext4: avoid ext4_error()'s caused by ENOMEM in the truncate path
ext4: fix slab-use-after-free in ext4_split_extent_at()
ext4: update orig_path in ext4_find_extent()
arm64: Add Cortex-715 CPU part definition
arm64: cputype: Add Neoverse-N3 definitions
arm64: errata: Expand speculative SSBS workaround once more
uprobes: fix kernel info leak via "[uprobes]" vma
nfsd: use ktime_get_seconds() for timestamps
nfsd: fix delegation_blocked() to block correctly for at least 30 seconds
rtc: at91sam9: drop platform_data support
rtc: at91sam9: fix OF node leak in probe() error path
ACPI: battery: Simplify battery hook locking
ACPI: battery: Fix possible crash when unregistering a battery hook
ext4: fix inode tree inconsistency caused by ENOMEM
net: ethernet: cortina: Drop TSO support
tracing: Remove precision vsnprintf() check from print event
drm: Move drm_mode_setcrtc() local re-init to failure path
drm/crtc: fix uninitialized variable use even harder
virtio_console: fix misc probe bugs
Input: synaptics-rmi4 - fix UAF of IRQ domain on driver removal
bpf: Check percpu map value size first
s390/facility: Disable compile time optimization for decompressor code
s390/mm: Add cond_resched() to cmm_alloc/free_pages()
ext4: nested locking for xattr inode
s390/cpum_sf: Remove WARN_ON_ONCE statements
ktest.pl: Avoid false positives with grub2 skip regex
clk: bcm: bcm53573: fix OF node leak in init
i2c: i801: Use a different adapter-name for IDF adapters
PCI: Mark Creative Labs EMU20k2 INTx masking as broken
media: videobuf2-core: clear memory related fields in __vb2_plane_dmabuf_put()
usb: chipidea: udc: enable suspend interrupt after usb reset
tools/iio: Add memory allocation failure check for trigger_name
driver core: bus: Return -EIO instead of 0 when show/store invalid bus attribute
fbdev: sisfb: Fix strbuf array overflow
NFS: Remove print_overflow_msg()
SUNRPC: Fix integer overflow in decode_rc_list()
tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe
netfilter: br_netfilter: fix panic with metadata_dst skb
Bluetooth: RFCOMM: FIX possible deadlock in rfcomm_sk_state_change
gpio: aspeed: Add the flush write to ensure the write complete.
clk: Add (devm_)clk_get_optional() functions
clk: generalize devm_clk_get() a bit
clk: Provide new devm_clk helpers for prepared and enabled clocks
gpio: aspeed: Use devm_clk api to manage clock source
igb: Do not bring the device up after non-fatal error
net: ibm: emac: mal: fix wrong goto
ppp: fix ppp_async_encode() illegal access
net: ipv6: ensure we call ipv6_mc_down() at most once
CDC-NCM: avoid overflow in sanity checking
HID: plantronics: Workaround for an unexcepted opposite volume key
Revert "usb: yurex: Replace snprintf() with the safer scnprintf() variant"
usb: xhci: Fix problem with xhci resume from suspend
usb: storage: ignore bogus device raised by JieLi BR21 USB sound chip
net: Fix an unsafe loop on the list
posix-clock: Fix missing timespec64 check in pc_clock_settime()
arm64: probes: Remove broken LDR (literal) uprobe support
arm64: probes: Fix simulate_ldr*_literal()
PCI: Add function 0 DMA alias quirk for Glenfly Arise chip
fat: fix uninitialized variable
KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin()
net: dsa: mv88e6xxx: Fix out-of-bound access
s390/sclp_vt220: Convert newlines to CRLF instead of LFCR
KVM: s390: Change virtual to physical address access in diag 0x258 handler
x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET
drm/vmwgfx: Handle surface check failure correctly
iio: dac: stm32-dac-core: add missing select REGMAP_MMIO in Kconfig
iio: adc: ti-ads8688: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig
iio: hid-sensors: Fix an error handling path in _hid_sensor_set_report_latency()
iio: light: opt3001: add missing full-scale range value
Bluetooth: Remove debugfs directory on module init failure
Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001
xhci: Fix incorrect stream context type macro
USB: serial: option: add support for Quectel EG916Q-GL
USB: serial: option: add Telit FN920C04 MBIM compositions
parport: Proper fix for array out-of-bounds access
x86/apic: Always explicitly disarm TSC-deadline timer
nilfs2: propagate directory read errors from nilfs_find_entry()
clk: Fix pointer casting to prevent oops in devm_clk_release()
clk: Fix slab-out-of-bounds error in devm_clk_release()
RDMA/bnxt_re: Fix incorrect AVID type in WQE structure
RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP
RDMA/bnxt_re: Return more meaningful error
drm/msm/dsi: fix 32-bit signed integer extension in pclk_rate calculation
macsec: don't increment counters for an unrelated SA
net: ethernet: aeroflex: fix potential memory leak in greth_start_xmit_gbit()
net: systemport: fix potential memory leak in bcm_sysport_xmit()
usb: typec: altmode should keep reference to parent
Bluetooth: bnep: fix wild-memory-access in proto_unregister
arm64:uprobe fix the uprobe SWBP_INSN in big-endian
arm64: probes: Fix uprobes for big-endian kernels
KVM: s390: gaccess: Refactor gpa and length calculation
KVM: s390: gaccess: Refactor access address range check
KVM: s390: gaccess: Cleanup access to guest pages
KVM: s390: gaccess: Check if guest address is in memslot
udf: fix uninit-value use in udf_get_fileshortad
jfs: Fix sanity check in dbMount
net/sun3_82586: fix potential memory leak in sun3_82586_send_packet()
be2net: fix potential memory leak in be_xmit()
net: usb: usbnet: fix name regression
posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()
ALSA: hda/realtek: Update default depop procedure
drm/amd: Guard against bad data for ATIF ACPI method
ACPI: button: Add DMI quirk for Samsung Galaxy Book2 to fix initial lid detection issue
nilfs2: fix kernel bug due to missing clearing of buffer delay flag
hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event
selinux: improve error checking in sel_write_load()
arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning
xfrm: validate new SA's prefixlen using SA family when sel.family is unset
usb: dwc3: remove generic PHY calibrate() calls
usb: dwc3: Add splitdisable quirk for Hisilicon Kirin Soc
usb: dwc3: core: Stop processing of pending events if controller is halted
cgroup: Fix potential overflow issue when checking max_depth
wifi: mac80211: skip non-uploaded keys in ieee80211_iter_keys
gtp: simplify error handling code in 'gtp_encap_enable()'
gtp: allow -1 to be specified as file description from userspace
net/sched: stop qdisc_tree_reduce_backlog on TC_H_ROOT
bpf: Fix out-of-bounds write in trie_get_next_key()
net: support ip generic csum processing in skb_csum_hwoffload_help
net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state()
net: amd: mvme147: Fix probe banner message
misc: sgi-gru: Don't disable preemption in GRU driver
usbip: tools: Fix detach_port() invalid port error path
usb: phy: Fix API devm_usb_put_phy() can not release the phy
xhci: Fix Link TRB DMA in command ring stopped completion event
Revert "driver core: Fix uevent_show() vs driver detach race"
wifi: mac80211: do not pass a stopped vif to the driver in .get_txpower
wifi: ath10k: Fix memory leak in management tx
wifi: iwlegacy: Clear stale interrupts before resuming device
nilfs2: fix potential deadlock with newly created symlinks
ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow
nilfs2: fix kernel bug due to missing clearing of checked flag
mm: shmem: fix data-race in shmem_getattr()
vt: prevent kernel-infoleak in con_font_get()
Linux 4.19.323
Change-Id: I2348f834187153067ab46b3b48b8fe7da9cee1f1
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 6d6e54fc71ad1ab0a87047fd9c211e75d86084a3 upstream.
To fix CVE-2023-6270, f98364e92662 ("aoe: fix the potential
use-after-free problem in aoecmd_cfg_pkts") makes tx() call dev_put()
instead of calling it in aoecmd_cfg_pkts(). This prevents tx() from
running into a use-after-free.
Nicolai Stange then found more places in aoe that have a potential
use-after-free problem with tx(), e.g. revalidate(), aoecmd_ata_rw(),
resend(), probe() and aoecmd_cfg_rsp(). Those functions also use
aoenet_xmit() to push packets to the tx queue, so they should also call
dev_hold() to increase the refcnt of skb->dev.
On the other hand, moving dev_put() to tx() causes the refcnt of
skb->dev to drop to a negative value, because the corresponding
dev_hold() calls are missing in revalidate(), aoecmd_ata_rw(), resend(),
probe(), and aoecmd_cfg_rsp(). This patch fixes that issue.
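The hold/put balance described above can be modeled in userspace. This is a minimal sketch, not the aoe driver's code: `struct fake_dev`, `struct fake_pkt`, `fake_hold()`, `fake_put()`, `queue_pkt()` and `tx_one()` are hypothetical stand-ins for `struct net_device`, an skb, `dev_hold()`, `dev_put()`, the senders that call aoenet_xmit(), and tx().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device; only the refcount is modeled. */
struct fake_dev {
	int refcnt;
};

/* Hypothetical stand-in for an skb sitting on the tx queue. */
struct fake_pkt {
	struct fake_dev *dev;
};

/* Models dev_hold(): take one reference. */
void fake_hold(struct fake_dev *dev)
{
	dev->refcnt++;
}

/* Models dev_put(): drop one reference; the count must never go negative. */
void fake_put(struct fake_dev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/*
 * Producer side: every path that queues a packet for tx() takes a
 * reference first, so the device stays pinned while the packet is queued.
 */
void queue_pkt(struct fake_pkt *pkt, struct fake_dev *dev)
{
	fake_hold(dev);
	pkt->dev = dev;
}

/*
 * Consumer side: tx() drops the reference only after it is done with the
 * packet (the transmit itself is omitted in this model).
 */
void tx_one(struct fake_pkt *pkt)
{
	struct fake_dev *dev = pkt->dev;

	pkt->dev = NULL;
	fake_put(dev);
}
```

If a producer skipped `fake_hold()` while `tx_one()` still called `fake_put()`, the count would fall below its starting value, which is the underflow this message describes.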
Cc: stable@vger.kernel.org
Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270
Fixes: f98364e92662 ("aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts")
Reported-by: Nicolai Stange <nstange@suse.com>
Signed-off-by: Chun-Yi Lee <jlee@suse.com>
Link: https://lore.kernel.org/stable/20240624064418.27043-1-jlee%40suse.com
Link: https://lore.kernel.org/r/20241002035458.24401-1-jlee@suse.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.19.311
ASoC: rt5645: Make LattePanda board DMI match more precise
x86/xen: Add some null pointer checking to smp.c
MIPS: Clear Cause.BD in instruction_pointer_set
net/iucv: fix the allocation size of iucv_path_table array
block: sed-opal: handle empty atoms when parsing response
dm-verity, dm-crypt: align "struct bvec_iter" correctly
scsi: mpt3sas: Prevent sending diag_reset when the controller is ready
Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security
firewire: core: use long bus reset on gap count error
ASoC: Intel: bytcr_rt5640: Add an extra entry for the Chuwi Vi8 tablet
Input: gpio_keys_polled - suppress deferred probe error for gpio
ASoC: wm8962: Enable oscillator if selecting WM8962_FLL_OSC
ASoC: wm8962: Enable both SPKOUTR_ENA and SPKOUTL_ENA in mono mode
ASoC: wm8962: Fix up incorrect error message in wm8962_set_fll
crypto: algif_aead - fix uninitialized ctx->init
crypto: af_alg - make some functions static
crypto: algif_aead - Only wake up when ctx->more is zero
do_sys_name_to_handle(): use kzalloc() to fix kernel-infoleak
fs/select: rework stack allocation hack for clang
md: switch to ->check_events for media change notifications
block: add a new set_read_only method
md: implement ->set_read_only to hook into BLKROSET processing
md: Don't clear MD_CLOSING when the raid is about to stop
aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts
timekeeping: Fix cross-timestamp interpolation on counter wrap
timekeeping: Fix cross-timestamp interpolation corner case decision
timekeeping: Fix cross-timestamp interpolation for non-x86
wifi: ath10k: fix NULL pointer dereference in ath10k_wmi_tlv_op_pull_mgmt_tx_compl_ev()
b43: dma: Fix use true/false for bool type variable
wifi: b43: Stop/wake correct queue in DMA Tx path when QoS is disabled
wifi: b43: Stop/wake correct queue in PIO Tx path when QoS is disabled
b43: main: Fix use true/false for bool type
wifi: b43: Stop correct queue in DMA worker when QoS is disabled
wifi: b43: Disable QoS for bcm4331
wifi: mwifiex: debugfs: Drop unnecessary error check for debugfs_create_dir()
sock_diag: annotate data-races around sock_diag_handlers[family]
af_unix: Annotate data-race of gc_in_progress in wait_for_unix_gc().
wifi: libertas: fix some memleaks in lbs_allocate_cmd_buffer()
ACPI: processor_idle: Fix memory leak in acpi_processor_power_exit()
bus: tegra-aconnect: Update dependency to ARCH_TEGRA
iommu/amd: Mark interrupt as managed
wifi: brcmsmac: avoid function pointer casts
ARM: dts: arm: realview: Fix development chip ROM compatible value
ACPI: scan: Fix device check notification handling
x86, relocs: Ignore relocations in .notes section
SUNRPC: fix some memleaks in gssx_dec_option_array
mmc: wmt-sdmmc: remove an incorrect release_mem_region() call in the .remove function
igb: move PEROUT and EXTTS isr logic to separate functions
igb: Fix missing time sync events
Bluetooth: Remove superfluous call to hci_conn_check_pending()
Bluetooth: hci_core: Fix possible buffer overflow
sr9800: Add check for usbnet_get_endpoints
bpf: Fix hashtab overflow check on 32-bit arches
bpf: Fix stackmap overflow check on 32-bit arches
ipv6: fib6_rules: flush route cache when rule is changed
tcp: fix incorrect parameter validation in the do_tcp_getsockopt() function
l2tp: fix incorrect parameter validation in the pppol2tp_getsockopt() function
udp: fix incorrect parameter validation in the udp_lib_getsockopt() function
net: kcm: fix incorrect parameter validation in the kcm_getsockopt) function
net/x25: fix incorrect parameter validation in the x25_getsockopt() function
nfp: flower: handle acti_netdevs allocation failure
dm raid: fix false positive for requeue needed during reshape
dm: call the resume method on internal suspend
drm/tegra: dsi: Add missing check for of_find_device_by_node
gpu: host1x: mipi: Update tegra_mipi_request() to be node based
drm/tegra: dsi: Make use of the helper function dev_err_probe()
drm/tegra: dsi: Fix some error handling paths in tegra_dsi_probe()
drm/tegra: dsi: Fix missing pm_runtime_disable() in the error handling path of tegra_dsi_probe()
drm/rockchip: inno_hdmi: Fix video timing
drm: Don't treat 0 as -1 in drm_fixp2int_ceil
drm/rockchip: lvds: do not overwrite error code
drm/rockchip: lvds: do not print scary message when probing defer
media: tc358743: register v4l2 async device only after successful setup
perf evsel: Fix duplicate initialization of data->id in evsel__parse_sample()
ABI: sysfs-bus-pci-devices-aer_stats uses an invalid tag
media: em28xx: annotate unchecked call to media_device_register()
media: v4l2-tpg: fix some memleaks in tpg_alloc
media: v4l2-mem2mem: fix a memleak in v4l2_m2m_register_entity
media: dvbdev: remove double-unlock
media: media/dvb: Use kmemdup rather than duplicating its implementation
media: dvbdev: Fix memleak in dvb_register_device
media: dvbdev: fix error logic at dvb_register_device()
media: dvb-core: Fix use-after-free due to race at dvb_register_device()
media: edia: dvbdev: fix a use-after-free
clk: qcom: reset: Allow specifying custom reset delay
clk: qcom: reset: support resetting multiple bits
clk: qcom: reset: Commonize the de/assert functions
clk: qcom: reset: Ensure write completion on reset de/assertion
quota: code cleanup for __dquot_alloc_space()
fs/quota: erase unused but set variable warning
quota: check time limit when back out space/inode change
quota: simplify drop_dquot_ref()
quota: Fix potential NULL pointer dereference
quota: Fix rcu annotations of inode dquot pointers
perf thread_map: Free strlist on normal path in thread_map__new_by_tid_str()
drm/radeon/ni: Fix wrong firmware size logging in ni_init_microcode()
ALSA: seq: fix function cast warnings
media: go7007: add check of return value of go7007_read_addr()
media: pvrusb2: fix pvr2_stream_callback casts
firmware: qcom: scm: Add WLAN VMID for Qualcomm SCM interface
clk: qcom: dispcc-sdm845: Adjust internal GDSC wait times
drm/mediatek: dsi: Fix DSI RGB666 formats and definitions
PCI: Mark 3ware-9650SE Root Port Extended Tags as broken
clk: hisilicon: hi3519: Release the correct number of gates in hi3519_clk_unregister()
drm/tegra: put drm_gem_object ref on error in tegra_fb_create
mfd: syscon: Call of_node_put() only when of_parse_phandle() takes a ref
crypto: arm - Rename functions to avoid conflict with crypto/sha256.h
crypto: arm/sha - fix function cast warnings
mtd: rawnand: lpc32xx_mlc: fix irq handler prototype
ASoC: meson: axg-tdm-interface: fix mclk setup without mclk-fs
drm/amdgpu: Fix missing break in ATOM_ARG_IMM Case of atom_get_src_int()
media: pvrusb2: fix uaf in pvr2_context_set_notify
media: dvb-frontends: avoid stack overflow warnings with clang
media: go7007: fix a memleak in go7007_load_encoder
drm/mediatek: Fix a null pointer crash in mtk_drm_crtc_finish_page_flip
powerpc/hv-gpci: Fix the H_GET_PERF_COUNTER_INFO hcall return value checks
powerpc/embedded6xx: Fix no previous prototype for avr_uart_send() etc.
backlight: lm3630a: Initialize backlight_properties on init
backlight: lm3630a: Don't set bl->props.brightness in get_brightness
backlight: da9052: Fully initialize backlight_properties during probe
backlight: lm3639: Fully initialize backlight_properties during probe
backlight: lp8788: Fully initialize backlight_properties during probe
sparc32: Fix section mismatch in leon_pci_grpci
ALSA: usb-audio: Stop parsing channels bits when all channels are found.
scsi: csiostor: Avoid function pointer casts
scsi: bfa: Fix function pointer type mismatch for hcb_qe->cbfn
net: sunrpc: Fix an off by one in rpc_sockaddr2uaddr()
NFS: Fix an off by one in root_nfs_cat()
clk: qcom: gdsc: Add support to update GDSC transition delay
serial: max310x: fix syntax error in IRQ error message
tty: serial: samsung: fix tx_empty() to return TIOCSER_TEMT
kconfig: fix infinite loop when expanding a macro at the end of file
rtc: mt6397: select IRQ_DOMAIN instead of depending on it
serial: 8250_exar: Don't remove GPIO device on suspend
staging: greybus: fix get_channel_from_mode() failure path
usb: gadget: net2272: Use irqflags in the call to net2272_probe_fin
net: hsr: fix placement of logical operator in a multi-line statement
hsr: Fix uninit-value access in hsr_get_node()
rds: introduce acquire/release ordering in acquire/release_in_xmit()
hsr: Handle failures in module init
net/bnx2x: Prevent access to a freed page in page_pool
spi: spi-mt65xx: Fix NULL pointer access in interrupt handler
crypto: af_alg - Fix regression on empty requests
crypto: af_alg - Work around empty control messages without MSG_MORE
Linux 4.19.311
Change-Id: I034e9a44b6dec1a7b5c600b3cd77aabc401044d7
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit f98364e926626c678fb4b9004b75cacf92ff0662 ]
This patch is against CVE-2023-6270. The description of the CVE is:
A flaw was found in the ATA over Ethernet (AoE) driver in the Linux
kernel. The aoecmd_cfg_pkts() function improperly updates the refcnt on
`struct net_device`, and a use-after-free can be triggered by racing
between the free on the struct and the access through the `skbtxq`
global queue. This could lead to a denial of service condition or
potential code execution.
aoecmd_cfg_pkts() always calls dev_put(ifp) once the skb initialization
code has finished. But the net_device ifp is still used later by
tx()->dev_queue_xmit() in the kthread. This means that
dev_put(ifp) should NOT be called in the success path of the skb
initialization code in aoecmd_cfg_pkts(). Otherwise tx() may run into a
use-after-free because the net_device has been freed.
This patch removes the dev_put(ifp) in the success path of
aoecmd_cfg_pkts() and adds dev_put() after the skb xmit in tx().
Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270
Fixes: 7562f876cd ("[NET]: Rework dev_base via list_head (v3)")
Signed-off-by: Chun-Yi Lee <jlee@suse.com>
Link: https://lore.kernel.org/r/20240305082048.25526-1-jlee@suse.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit 4fdc94b476.
The patch series submitted for the 4.19-stable branch:
https://lore.kernel.org/r/20210223092859.17033-1-jefflexu@linux.alibaba.com
should be reverted from the android-4.19-stable branch as it breaks the
ABI for device_add_disk() and the issue the series is resolving does not
affect Android devices. So revert the whole thing.
Bug: 161946584
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I483477f07ae5359975f7c11bf90bdb29269f1b03
mempool_destroy() already handles a NULL pointer, so it is safe
to remove the NULL check before calling it.
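The same pattern can be shown with a small userspace sketch. `struct pool`, `pool_create()` and `pool_destroy()` are hypothetical names that only model the idea; they are not the kernel's mempool API.

```c
#include <stdlib.h>

/* Hypothetical pool type; stands in for the kernel's mempool_t. */
struct pool {
	void *buf;
};

/* Allocate a pool with a backing buffer; returns NULL on failure. */
struct pool *pool_create(size_t size)
{
	struct pool *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->buf = malloc(size);
	if (!p->buf) {
		free(p);
		return NULL;
	}
	return p;
}

/* Like mempool_destroy(), this accepts NULL and does nothing in that case. */
void pool_destroy(struct pool *p)
{
	if (!p)
		return;
	free(p->buf);
	free(p);
}
```

Because the destructor tolerates NULL, an error path can call `pool_destroy(p)` unconditionally instead of wrapping it in `if (p)`.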
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
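For illustration, a minimal sketch of an annotated fall-through. The
toy_cmd codes are hypothetical and not taken from the driver; the comment
form shown is the one GCC recognizes at its default warning level.

```c
#include <assert.h>

/* Hypothetical command codes, not taken from any driver. */
enum toy_cmd { TOY_RESET, TOY_START, TOY_STOP };

/* Returns a bitmask of the steps performed for a command. */
static unsigned int toy_steps(enum toy_cmd cmd)
{
	unsigned int steps = 0;

	switch (cmd) {
	case TOY_RESET:
		steps |= 0x1;	/* clear state first */
		/* fall through */
	case TOY_START:
		steps |= 0x2;	/* then start */
		break;
	case TOY_STOP:
		steps |= 0x4;
		break;
	}
	return steps;
}
```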
Addresses-Coverity-ID: 114722 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Convert the S_<FOO> symbolic permissions to their octal equivalents as
using octal and not symbolic permissions is preferred by many as more
readable.
see: https://lkml.org/lkml/2016/8/2/1945
Done with automated conversion via:
$ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...>
Miscellanea:
o Wrapped modified multi-line calls to a single line where appropriate
o Realign modified multi-line calls to open parenthesis
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
'struct frame' uses two variables to store the sent timestamp - 'struct
timeval' and jiffies. jiffies is used to avoid discrepancies caused by
updates to system time. 'struct timeval' is deprecated because it uses
32-bit representation for seconds which will overflow in year 2038.
This patch does the following:
- Replace the use of 'struct timeval' and jiffies with ktime_t, which
is the recommended type for timestamping
- ktime_t provides both long range (like jiffies) and high resolution
(like timeval). Using ktime_get (monotonic time) instead of wall-clock
time prevents any discrepancies caused by updates to system time.
[updates by Arnd below]
The original patch from Tina never went anywhere as we discussed how
to keep the impact on performance minimal. I've started over now but
arrived at basically the same patch that she had originally, except for
a slightly improved tsince_hr() function. I'm making it more robust
against overflows, and also optimize explicitly for the common case
in which a frame is less than 4.2 seconds old, using only a 32-bit
division in that case.
This should make the new version more efficient than the old code,
since we replace the existing two 32-bit divisions in do_gettimeofday()
plus one multiplication with a single 32-bit division in
tsince_hr() and drop the double bookkeeping. It's also more efficient
than the ktime_get_us() API we discussed before, since that would
also rely on multiple divisions.
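The fast path described above can be sketched in userspace as follows. This
is a model under assumptions, not the driver's tsince_hr(): the function
name elapsed_usec() is hypothetical, and the caller passes the elapsed
nanoseconds instead of reading a monotonic clock.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000u

/*
 * Convert an elapsed time in nanoseconds to microseconds, clamped to
 * INT_MAX so the int result cannot overflow.
 */
static int elapsed_usec(uint64_t delta_ns)
{
	/* Common case: below 2^32 ns (about 4.29 s), use 32-bit division. */
	if (delta_ns <= UINT32_MAX)
		return (int)((uint32_t)delta_ns / NSEC_PER_USEC);

	/* Very old frame: clamp instead of overflowing the result. */
	if (delta_ns > (uint64_t)INT_MAX * NSEC_PER_USEC)
		return INT_MAX;

	/* Rare case: fall back to a full 64-bit division. */
	return (int)(delta_ns / NSEC_PER_USEC);
}
```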
Link: https://lists.linaro.org/pipermail/y2038/2015-May/000276.html
Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Cc: Ed Cashin <ed.cashin@acm.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
With all callbacks converted, and the timer callback prototype
switched over, the TIMER_FUNC_TYPE cast is no longer needed,
so remove it. Conversion was done with the following scripts:
perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
$(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)
perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
$(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)
The now unused macros are also dropped from include/linux/timer.h.
Signed-off-by: Kees Cook <keescook@chromium.org>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
This refactors the discover_timer to remove the needless locking and
state machine used for synchronizing timer death. Using del_timer_sync()
will already do the right thing.
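The idea behind passing the timer pointer to the callback can be sketched
in userspace. All toy_* names are hypothetical; toy_container_of() stands
in for the container_of() arithmetic that from_timer() relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct timer_list: just the callback pointer. */
struct toy_timer {
	void (*function)(struct toy_timer *t);
};

/* Userspace equivalent of the kernel's container_of(). */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical driver state with an embedded timer. */
struct toy_dev {
	int ticks;
	struct toy_timer timer;
};

/* The callback receives the timer and recovers the enclosing object. */
static void toy_timeout(struct toy_timer *t)
{
	struct toy_dev *d = toy_container_of(t, struct toy_dev, timer);

	d->ticks++;
}

/* Models timer_setup(): record the callback, no separate data argument. */
static void toy_timer_setup(struct toy_timer *t,
			    void (*fn)(struct toy_timer *))
{
	t->function = fn;
}

/* Models timer expiry: the core passes the timer pointer itself. */
static void toy_timer_fire(struct toy_timer *t)
{
	t->function(t);
}
```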
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Ed L. Cashin" <ed.cashin@acm.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Instead move it to the callers. Those that either don't use bio_data() or
page_address() or are specific to architectures that do not support highmem
are skipped.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep around at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently we use normal Linux errno values in the block layer, and while
we accept any error a few have overloaded magic meanings. This patch
instead introduces a new blk_status_t value that holds block layer specific
status codes and explicitly explains their meaning. Helpers to convert from
and to the previous special meanings are provided for now, but I suspect
we want to get rid of them in the long run - those drivers that have an
errno input (e.g. networking) usually get errnos that don't know about
the special block layer overloads, and similarly returning them to userspace
will usually return something that strictly speaking isn't correct
for file system operations, but that's left as an exercise for later.
For now the set of errors is a very limited set that closely corresponds
to the previous overloaded errno values, but there is some low hanging
fruit to improve it.
blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
typechecking, so that we can easily catch places passing the wrong values.
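A rough userspace sketch of the conversion helpers mentioned above. The
toy_status_t type, the TOY_STS_* names and their numeric values are
assumptions made for illustration; they are not the kernel's definitions,
and no sparse annotation is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative status codes; names and values are assumptions. */
typedef unsigned char toy_status_t;
#define TOY_STS_OK      ((toy_status_t)0)
#define TOY_STS_NOSPC   ((toy_status_t)1)
#define TOY_STS_TIMEOUT ((toy_status_t)2)
#define TOY_STS_IOERR   ((toy_status_t)3)

/* One table drives both conversion directions. */
static const struct {
	toy_status_t status;
	int err;
} toy_map[] = {
	{ TOY_STS_OK,      0 },
	{ TOY_STS_NOSPC,   -ENOSPC },
	{ TOY_STS_TIMEOUT, -ETIMEDOUT },
	{ TOY_STS_IOERR,   -EIO },
};

#define TOY_MAP_LEN (sizeof(toy_map) / sizeof(toy_map[0]))

/* Errnos without a dedicated status collapse to the generic I/O error. */
static toy_status_t toy_errno_to_status(int err)
{
	for (size_t i = 0; i < TOY_MAP_LEN; i++)
		if (toy_map[i].err == err)
			return toy_map[i].status;
	return TOY_STS_IOERR;
}

/* Unknown status values are reported as -EIO. */
static int toy_status_to_errno(toy_status_t status)
{
	for (size_t i = 0; i < TOY_MAP_LEN; i++)
		if (toy_map[i].status == status)
			return toy_map[i].err;
	return -EIO;
}
```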
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
We will want to have struct backing_dev_info allocated separately from
struct request_queue. As the first step add pointer to backing_dev_info
to request_queue and convert all users touching it. No functional
changes in this patch.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
aoeblk contains some mysterious code, that wants to elevate the bio
vec page counts while it's under IO. That is not needed, it's
fragile, and it's causing kernel oopses for some.
Reported-by: Don Koch <kochd@us.ibm.com>
Tested-by: Don Koch <kochd@us.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This is the third version of the patchset previously sent [1]. I have
basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
rid of superfluous gfp flags" which went through dm tree. I am sending
it now because it is tree wide and chances for conflicts are reduced
considerably when we want to target rc2. I plan to send the next step
and rename the flag and move to a better semantic later during this
release cycle so we will have a new semantic ready for 4.8 merge window
hopefully.
Motivation:
While working on something unrelated I've checked the current usage of
__GFP_REPEAT in the tree. It seems that a majority of the usage is and
always has been bogus because __GFP_REPEAT has always been about costly
high order allocations while we are using it for order-0 or very small
orders very often. It seems that a big pile of them is just
copy&paste from when code was adapted from one arch to another.
I think it makes some sense to get rid of them because they are just
making the semantic more unclear. Please note that GFP_REPEAT is
documented as
* __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
* _might_ fail. This depends upon the particular VM implementation.
while !costly requests have basically nofail semantic. So one could
reasonably expect that order-0 request with __GFP_REPEAT will not loop
for ever. This is not implemented right now though.
I would like to move on with __GFP_REPEAT and define a better semantic
for it.
$ git grep __GFP_REPEAT origin/master | wc -l
111
$ git grep __GFP_REPEAT | wc -l
36
So we are down to a third after this patch series. The remaining
places really seem to be relying on __GFP_REPEAT due to large allocation
requests. This still needs some double checking which I will do later
after all the simple ones are sorted out.
I am touching a lot of arch specific code here and I hope I got it right
but as a matter of fact I even didn't compile test for some archs as I
do not have cross compiler for them. Patches should be quite trivial to
review for stupid compile mistakes though. The tricky parts are usually
hidden by macro definitions and that's where I would appreciate help from
arch maintainers.
[1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org
This patch (of 19):
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations. Yet we
have the full kernel tree with its usage for apparently order-0
allocations. This is really confusing because __GFP_REPEAT is
explicitly documented to allow allocation failures which is a weaker
semantic than the current order-0 has (basically nofail).
Let's simply drop __GFP_REPEAT from those places. This would allow us to
identify the places which really need the allocator to retry harder and to
formulate a more specific semantic for what the flag is actually supposed
to do.
Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: John Crispin <blogic@openwrt.org>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many developers already know that the reference count field of
struct page is _count and that it is an atomic type. They may try to
handle it directly, which would defeat the purpose of the page reference
count tracepoint. To prevent direct _count modification, this patch
renames it to _refcount and adds a warning message in the code. After
that, developers who need to handle the reference count will find that
the field should not be accessed directly.
[akpm@linux-foundation.org: fix comments, per Vlastimil]
[akpm@linux-foundation.org: Documentation/vm/transhuge.txt too]
[sfr@canb.auug.org.au: sync ethernet driver changes]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Sunil Goutham <sgoutham@cavium.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Manish Chopra <manish.chopra@qlogic.com>
Cc: Yuval Mintz <yuval.mintz@qlogic.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized, and it is unlikely that it ever will.
We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.
There are a few places in the code that coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation will
also be addressed in a separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The success of CMA allocation largely depends on the success of
migration, and a key factor in that is the page reference count. Until
now, the page reference has been manipulated by calling atomic functions
directly, so we cannot track who manipulates it and where. That makes it
hard to find the actual reason for a CMA allocation failure. CMA
allocation should be guaranteed to succeed, so finding the offending
place is really important.
In this patch, call sites where the page reference is manipulated are
converted to the newly introduced wrapper functions. This is a
preparation step for adding a tracepoint to each page reference
manipulation function. With this facility, we can easily find the reason
for a CMA allocation failure. There is no functional change in this
patch.
In addition, this patch also converts reference read sites. That will
help a second step that renames page._count to something else and
prevents later attempts to access it directly (suggested by Andrew).
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This reverts commit 34b48db66e.
That commit caused performance regressions for streaming I/O
workloads on a number of different storage devices, from
SATA disks to external RAID arrays. It also managed to
trip up some buggy firmware in at least one drive, causing
data corruption.
The next patch will bump the default max_sectors_kb value to
1280, which will accommodate a 10-data-disk stripe write
with chunk size 128k. In the testing I've done using iozone,
fio, and aio-stress, a value of 1280 does not show a big
performance difference from 512. This will hopefully still
help the software RAID setup that Christoph saw the original
performance gains with while still not regressing other
storage configurations.
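The sizing argument above is simple arithmetic: 10 data disks times a
128 KiB chunk is 1280 KiB. A tiny sketch, with hypothetical helper names:

```c
#include <assert.h>

/* Size in KiB of one full-stripe write across the data disks. */
static unsigned int full_stripe_kib(unsigned int data_disks,
				    unsigned int chunk_kib)
{
	return data_disks * chunk_kib;
}

/* Nonzero if an I/O of io_kib fits within the max_sectors_kb limit. */
static int fits_in_one_request(unsigned int io_kib,
			       unsigned int max_sectors_kb)
{
	return io_kib <= max_sectors_kb;
}
```

With these helpers, a 10 x 128 KiB stripe fits a limit of 1280 but not the
older default of 512.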
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently we have two different ways to signal an I/O error on a BIO:
(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback
The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not being persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario. Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.
So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.
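A reduced userspace sketch of why storing the error in the bio helps with
chaining. struct toy_bio and toy_bio_complete() are hypothetical, and
keeping the first error seen is a choice made for this sketch rather than
a statement about the block layer's behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for struct bio. */
struct toy_bio {
	int error;		/* 0 on success, negative errno on failure */
	struct toy_bio *parent;	/* set when this bio is chained to another */
};

/*
 * Complete a bio. The error lives in the bio itself, so it can be copied
 * to the parent of a chain instead of being lost with the callback.
 */
static void toy_bio_complete(struct toy_bio *bio)
{
	struct toy_bio *parent = bio->parent;

	if (parent && bio->error && !parent->error)
		parent->error = bio->error;
}
```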
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Set max_sectors to the value the driver provides as hardware limit by
default. Linux has had proper I/O throttling for a long time and doesn't
rely on an artificially small maximum I/O size anymore. By not limiting
the I/O size by default we remove an annoying tuning step required for
most Linux installations.
Note that both the user, and if absolutely required the driver can still
impose a limit for FS requests below max_hw_sectors_kb.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Commit bf6bddf192 ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page->first_page if PageTail(page).
This results in a very rare NULL pointer dereference on the
aforementioned page_count(page). Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.
This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation. This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling. The patch then adds a store memory barrier to
prep_compound_page() to ensure page->first_page is set.
This is the safest way to ensure we see the head page that we are
expecting, PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.
Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.
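The publish ordering described above can be approximated in userspace with
C11 atomics. This is an analogy under assumptions, not the kernel code:
struct toy_page is hypothetical, release/acquire ordering stands in for the
store and read barriers, and the sketch does not model a compound page
being split while a reader is looking at it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced page descriptor. */
struct toy_page {
	struct toy_page *_Atomic first_page;
	atomic_bool tail;
};

/* Writer: publish the head pointer before marking the page as a tail. */
static void toy_prep_tail(struct toy_page *p, struct toy_page *head)
{
	atomic_store_explicit(&p->first_page, head, memory_order_relaxed);
	/* Release ordering plays the role of the store barrier. */
	atomic_store_explicit(&p->tail, true, memory_order_release);
}

/* Reader: only trust first_page after observing the tail flag. */
static struct toy_page *toy_compound_head(struct toy_page *p)
{
	/* Acquire ordering plays the role of the read barrier. */
	if (atomic_load_explicit(&p->tail, memory_order_acquire))
		return atomic_load_explicit(&p->first_page,
					    memory_order_relaxed);
	return p;
}
```

A reader that sees the tail flag set is then guaranteed to see the head
pointer stored before it, so it never returns NULL for a published tail.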
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that we've got a mechanism for immutable biovecs -
bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
respect it instead of using the bvec array directly.
The aoe code no longer has to manually iterate over partial bvecs, so
some struct members go away - other struct members are effectively
renamed:
buf->resid -> buf->iter.bi_size
buf->sector -> buf->iter.bi_sector
f->bcnt -> f->iter.bi_size
f->lba -> f->iter.bi_sector
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
For immutable biovecs, we'll be introducing a new bio_iovec() that uses
our new bvec iterator to construct a biovec, taking into account
bvec_iter->bi_bvec_done - this patch updates existing users for the new
usage.
Some of the existing users really do need a pointer into the bvec array
- those uses are all going to be removed, but we'll need the
functionality from immutable to remove them - so for now rename the
existing bio_iovec() -> __bio_iovec(), and it'll be removed in a couple
patches.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
If the system has trouble allocating memory for the creation of the aoe
debugfs directory or of a file inside it, the debugfs member of an aoedev
can be NULL.
Do not treat a NULL debugfs pointer as a BUG on aoedev shutdown, avoiding
the user impact of an unnecessary panic.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the following compiler warnings:
drivers/block/aoe/aoecmd.c: In function `aoecmd_ata_rw':
drivers/block/aoe/aoecmd.c:383:17: warning: variable `t' set but not used [-Wunused-but-set-variable]
struct aoetgt *t;
^
drivers/block/aoe/aoecmd.c: In function `resend':
drivers/block/aoe/aoecmd.c:488:21: warning: variable `ah' set but not used [-Wunused-but-set-variable]
struct aoe_atahdr *ah;
^
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This information is presented in a compact format that has evolved for
easy routine scanning by expert humans, mostly developers and support
technicians helping to troubleshoot or test AoE-based systems.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This series adds the debugging information that the coraid.com-distributed
aoe driver exports via sysfs, but instead of sysfs, it uses debugfs.
With these patches applied, even without AoE targets on the network, KEDR
reports new possible memory leaks, but these are from callers outside the
aoe driver that have used aoe_devnode to get the name of the character
devices through the aoe_class->devnode callback, and I believe they're
responsible for freeing that memory.
This patch:
Create and destroy the debugfs directory.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a BUG which can trigger when direct-IO is used with AOE.
As discussed previously, the fact that some users of the block layer
provide bios that point to pages with a zero _count means that it is not
OK for the network layer to do a put_page on the skb frags during an
skb_linearize, so the aoe driver gets a reference to pages in bios and
puts the reference before ending the bio. And because it cannot use
get_page on a page with a zero _count, it manipulates the value
directly.
It is not OK to increment the _count of a compound page tail, though,
since the VM layer will VM_BUG_ON a non-zero _count. Block users that
do direct I/O can result in the aoe driver seeing compound page tails in
bios. In that case, the same logic works as long as the head of the
compound page is used instead of the tails. This patch handles compound
pages and does not BUG.
It relies on the block layer user leaving the relationship between the
page tail and its head alone for the duration between the submission of
the bio and its completion, whether successful or not.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some users have a large AoE target while others like to use many AoE
targets at the same time. In the latter case, there is an opportunity to
greatly improve aggregate throughput by allowing different threads to
complete the I/O associated with each target. For 36 targets, 4 KiB read
throughput roughly doubles, for example, with these changes in place.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Calling kthread_run with a single name parameter causes it to be handled
as a format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents.
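The same hazard exists with any printf-style interface. A userspace sketch
using snprintf(), with a hypothetical set_thread_name() helper: passing the
name as an argument to a constant "%s" format keeps any '%' in it literal.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy a caller-supplied name into buf. The name is an argument to a
 * constant "%s" format, so '%' characters in it are copied verbatim
 * instead of being interpreted as conversion specifiers.
 */
static int set_thread_name(char *buf, size_t len, const char *name)
{
	return snprintf(buf, len, "%s", name);
}
```

Calling snprintf(buf, len, name) instead would treat a name such as
"aoe_%d" as a format and read arguments that were never passed.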
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block driver updates from Jens Axboe:
"It might look big in volume, but when categorized, not a lot of
drivers are touched. The pull request contains:
- mtip32xx fixes from Micron.
- A slew of drbd updates, this time in a nicer series.
- bcache, a flash/ssd caching framework from Kent.
- Fixes for cciss"
* 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
bcache: Use bd_link_disk_holder()
bcache: Allocator cleanup/fixes
cciss: bug fix to prevent cciss from loading in kdump crash kernel
cciss: add cciss_allow_hpsa module parameter
drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
mtip32xx: Workaround for unaligned writes
bcache: Make sure blocksize isn't smaller than device blocksize
bcache: Fix merge_bvec_fn usage for when it modifies the bvm
bcache: Correctly check against BIO_MAX_PAGES
bcache: Hack around stuff that clones up to bi_max_vecs
bcache: Set ra_pages based on backing device's ra_pages
bcache: Take data offset from the bdev superblock.
mtip32xx: mtip32xx: Disable TRIM support
mtip32xx: fix a smatch warning
bcache: Disable broken btree fuzz tester
bcache: Fix a format string overflow
bcache: Fix a minor memory leak on device teardown
bcache: Documentation updates
bcache: Use WARN_ONCE() instead of __WARN()
bcache: Add missing #include <linux/prefetch.h>
...
Pull block core updates from Jens Axboe:
- Major bit is Kent's prep work for immutable bio vecs.
- Stable candidate fix for a scheduling-while-atomic in the queue
bypass operation.
- Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
discard bios.
- Tejun's changes to convert the writeback thread pool to the generic
workqueue mechanism.
- Runtime PM framework, SCSI patches exist on top of these in James'
tree.
- A few random fixes.
* 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
relay: move remove_buf_file inside relay_close_buf
partitions/efi.c: replace useless kzalloc's by kmalloc's
fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
block: fix max discard sectors limit
blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
Documentation: cfq-iosched: update documentation help for cfq tunables
writeback: expose the bdi_wq workqueue
writeback: replace custom worker pool implementation with unbound workqueue
writeback: remove unused bdi_pending_list
aoe: Fix unitialized var usage
bio-integrity: Add explicit field for owner of bip_buf
block: Add an explicit bio flag for bios that own their bvec
block: Add bio_alloc_pages()
block: Convert some code to bio_for_each_segment_all()
block: Add bio_for_each_segment_all()
bounce: Refactor __blk_queue_bounce to not use bi_io_vec
raid1: use bio_copy_data()
pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
pktcdvd: use bio_copy_data()
block: Add bio_copy_data()
...
The value passed is 0 in all but "it can never happen" cases (and those
only in a couple of drivers) *and* it would've been lost on the way
out anyway, even if something tried to pass something meaningful.
Just don't bother.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>