Changes in 5.15.75
Revert "fs: check FMODE_LSEEK to control internal pipe splicing"
ALSA: oss: Fix potential deadlock at unregistration
ALSA: rawmidi: Drop register_mutex in snd_rawmidi_free()
ALSA: usb-audio: Fix potential memory leaks
ALSA: usb-audio: Fix NULL dererence at error path
ALSA: hda/realtek: remove ALC289_FIXUP_DUAL_SPK for Dell 5530
ALSA: hda/realtek: Correct pin configs for ASUS G533Z
ALSA: hda/realtek: Add quirk for ASUS GV601R laptop
ALSA: hda/realtek: Add Intel Reference SSID to support headset keys
mtd: rawnand: atmel: Unmap streaming DMA mappings
io_uring/net: don't update msg_name if not provided
hv_netvsc: Fix race between VF offering and VF association message from host
cifs: destage dirty pages before re-reading them for cache=none
cifs: Fix the error length of VALIDATE_NEGOTIATE_INFO message
iio: dac: ad5593r: Fix i2c read protocol requirements
iio: ltc2497: Fix reading conversion results
iio: adc: ad7923: fix channel readings for some variants
iio: pressure: dps310: Refactor startup procedure
iio: pressure: dps310: Reset chip after timeout
xhci: dbc: Fix memory leak in xhci_alloc_dbc()
usb: add quirks for Lenovo OneLink+ Dock
can: kvaser_usb: Fix use of uninitialized completion
can: kvaser_usb_leaf: Fix overread with an invalid command
can: kvaser_usb_leaf: Fix TX queue out of sync after restart
can: kvaser_usb_leaf: Fix CAN state after restart
mmc: sdhci-sprd: Fix minimum clock limit
i2c: designware: Fix handling of real but unexpected device interrupts
fs: dlm: fix race between test_bit() and queue_work()
fs: dlm: handle -EBUSY first in lock arg validation
HID: multitouch: Add memory barriers
quota: Check next/prev free block number after reading from quota file
platform/chrome: cros_ec_proto: Update version on GET_NEXT_EVENT failure
ASoC: wcd9335: fix order of Slimbus unprepare/disable
ASoC: wcd934x: fix order of Slimbus unprepare/disable
hwmon: (gsc-hwmon) Call of_node_get() before of_find_xxx API
net: thunderbolt: Enable DMA paths only after rings are enabled
regulator: qcom_rpm: Fix circular deferral regression
arm64: topology: move store_cpu_topology() to shared code
riscv: topology: fix default topology reporting
RISC-V: Make port I/O string accessors actually work
parisc: fbdev/stifb: Align graphics memory size to 4MB
riscv: Allow PROT_WRITE-only mmap()
riscv: Make VM_WRITE imply VM_READ
riscv: always honor the CONFIG_CMDLINE_FORCE when parsing dtb
riscv: Pass -mno-relax only on lld < 15.0.0
UM: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
nvmem: core: Fix memleak in nvmem_register()
nvme-multipath: fix possible hang in live ns resize with ANA access
nvme-pci: set min_align_mask before calculating max_hw_sectors
Revert "drm/amdgpu: use dirty framebuffer helper"
dmaengine: mxs: use platform_driver_register
drm/virtio: Check whether transferred 2D BO is shmem
drm/virtio: Unlock reservations on virtio_gpu_object_shmem_init() error
drm/virtio: Use appropriate atomic state in virtio_gpu_plane_cleanup_fb()
drm/udl: Restore display mode on resume
arm64: errata: Add Cortex-A55 to the repeat tlbi list
mm/damon: validate if the pmd entry is present before accessing
mm/mmap: undo ->mmap() when arch_validate_flags() fails
xen/gntdev: Prevent leaking grants
xen/gntdev: Accommodate VMA splitting
PCI: Sanitise firmware BAR assignments behind a PCI-PCI bridge
serial: 8250: Let drivers request full 16550A feature probing
serial: 8250: Request full 16550A feature probing for OxSemi PCIe devices
NFSD: Protect against send buffer overflow in NFSv3 READDIR
NFSD: Protect against send buffer overflow in NFSv2 READ
NFSD: Protect against send buffer overflow in NFSv3 READ
powercap: intel_rapl: Use standard Energy Unit for SPR Dram RAPL domain
powerpc/boot: Explicitly disable usage of SPE instructions
slimbus: qcom-ngd: use correct error in message of pdr_add_lookup() failure
slimbus: qcom-ngd: cleanup in probe error path
scsi: qedf: Populate sysfs attributes for vport
gpio: rockchip: request GPIO mux to pinctrl when setting direction
pinctrl: rockchip: add pinmux_ops.gpio_set_direction callback
fbdev: smscufx: Fix use-after-free in ufx_ops_open()
ksmbd: fix endless loop when encryption for response fails
ksmbd: Fix wrong return value and message length check in smb2_ioctl()
ksmbd: Fix user namespace mapping
fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
btrfs: fix race between quota enable and quota rescan ioctl
btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer
f2fs: complete checkpoints during remount
f2fs: flush pending checkpoints when freezing super
f2fs: increase the limit for reserve_root
f2fs: fix to do sanity check on destination blkaddr during recovery
f2fs: fix to do sanity check on summary info
hardening: Avoid harmless Clang option under CONFIG_INIT_STACK_ALL_ZERO
hardening: Remove Clang's enable flag for -ftrivial-auto-var-init=zero
jbd2: wake up journal waiters in FIFO order, not LIFO
jbd2: fix potential buffer head reference count leak
jbd2: fix potential use-after-free in jbd2_fc_wait_bufs
jbd2: add miss release buffer head in fc_do_one_pass()
ext4: avoid crash when inline data creation follows DIO write
ext4: fix null-ptr-deref in ext4_write_info
ext4: make ext4_lazyinit_thread freezable
ext4: fix check for block being out of directory size
ext4: don't increase iversion counter for ea_inodes
ext4: ext4_read_bh_lock() should submit IO if the buffer isn't uptodate
ext4: place buffer head allocation before handle start
ext4: fix dir corruption when ext4_dx_add_entry() fails
ext4: fix miss release buffer head in ext4_fc_write_inode
ext4: fix potential memory leak in ext4_fc_record_modified_inode()
ext4: fix potential memory leak in ext4_fc_record_regions()
ext4: update 'state->fc_regions_size' after successful memory allocation
livepatch: fix race between fork and KLP transition
ftrace: Properly unset FTRACE_HASH_FL_MOD
ring-buffer: Allow splice to read previous partially read pages
ring-buffer: Have the shortest_full queue be the shortest not longest
ring-buffer: Check pending waiters when doing wake ups as well
ring-buffer: Add ring_buffer_wake_waiters()
ring-buffer: Fix race between reset page and reading page
tracing: Disable interrupt or preemption before acquiring arch_spinlock_t
tracing: Wake up ring buffer waiters on closing of the file
tracing: Wake up waiters when tracing is disabled
tracing: Add ioctl() to force ring buffer waiters to wake up
tracing: Move duplicate code of trace_kprobe/eprobe.c into header
tracing: Add "(fault)" name injection to kernel probes
tracing: Fix reading strings from synthetic events
thunderbolt: Explicitly enable lane adapter hotplug events at startup
efi: libstub: drop pointless get_memory_map() call
media: cedrus: Set the platform driver data earlier
media: cedrus: Fix endless loop in cedrus_h265_skip_bits()
blk-wbt: call rq_qos_add() after wb_normal is initialized
KVM: x86/emulator: Fix handing of POP SS to correctly set interruptibility
KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"
KVM: nVMX: Don't propagate vmcs12's PERF_GLOBAL_CTRL settings to vmcs02
KVM: VMX: Drop bits 31:16 when shoving exception error code into VMCS
staging: greybus: audio_helper: remove unused and wrong debugfs usage
drm/nouveau/kms/nv140-: Disable interlacing
drm/nouveau: fix a use-after-free in nouveau_gem_prime_import_sg_table()
drm/i915: Fix watermark calculations for gen12+ RC CCS modifier
drm/i915: Fix watermark calculations for gen12+ MC CCS modifier
drm/i915: Fix watermark calculations for gen12+ CCS+CC modifier
drm/amd/display: Fix vblank refcount in vrr transition
smb3: must initialize two ACL struct fields to zero
selinux: use "grep -E" instead of "egrep"
ima: fix blocking of security.ima xattrs of unsupported algorithms
userfaultfd: open userfaultfds with O_RDONLY
ntfs3: rework xattr handlers and switch to POSIX ACL VFS helpers
thermal: cpufreq_cooling: Check the policy first in cpufreq_cooling_register()
sh: machvec: Use char[] for section boundaries
MIPS: SGI-IP27: Free some unused memory
MIPS: SGI-IP27: Fix platform-device leak in bridge_platform_create()
ARM: 9244/1: dump: Fix wrong pg_level in walk_pmd()
ARM: 9247/1: mm: set readonly for MT_MEMORY_RO with ARM_LPAE
objtool: Preserve special st_shndx indexes in elf_update_symbol
nfsd: Fix a memory leak in an error handling path
SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation
SUNRPC: Fix svcxdr_init_encode's buflen calculation
NFSD: Protect against send buffer overflow in NFSv2 READDIR
NFSD: Fix handling of oversized NFSv4 COMPOUND requests
wifi: rtlwifi: 8192de: correct checking of IQK reload
wifi: ath10k: add peer map clean up for peer delete in ath10k_sta_state()
leds: lm3601x: Don't use mutex after it was destroyed
bpf: Fix reference state management for synchronous callbacks
wifi: mac80211: allow bw change during channel switch in mesh
bpftool: Fix a wrong type cast in btf_dumper_int
spi: mt7621: Fix an error message in mt7621_spi_probe()
x86/resctrl: Fix to restore to original value when re-enabling hardware prefetch register
xsk: Fix backpressure mechanism on Tx
bpf: Disable preemption when increasing per-cpu map_locked
bpf: Propagate error from htab_lock_bucket() to userspace
bpf: Use this_cpu_{inc|dec|inc_return} for bpf_task_storage_busy
Bluetooth: btusb: mediatek: fix WMT failure during runtime suspend
wifi: rtl8xxxu: tighten bounds checking in rtl8xxxu_read_efuse()
wifi: rtw88: add missing destroy_workqueue() on error path in rtw_core_init()
selftests/xsk: Avoid use-after-free on ctx
spi: qup: add missing clk_disable_unprepare on error in spi_qup_resume()
spi: qup: add missing clk_disable_unprepare on error in spi_qup_pm_resume_runtime()
wifi: rtl8xxxu: Fix skb misuse in TX queue selection
spi: meson-spicc: do not rely on busy flag in pow2 clk ops
bpf: btf: fix truncated last_member_type_id in btf_struct_resolve
wifi: rtl8xxxu: gen2: Fix mistake in path B IQ calibration
wifi: rtl8xxxu: Remove copy-paste leftover in gen2_update_rate_mask
wifi: mt76: sdio: fix transmitting packet hangs
wifi: mt76: mt7615: add mt7615_mutex_acquire/release in mt7615_sta_set_decap_offload
wifi: mt76: mt7915: do not check state before configuring implicit beamform
Bluetooth: RFCOMM: Fix possible deadlock on socket shutdown/release
net: fs_enet: Fix wrong check in do_pd_setup
bpf: Ensure correct locking around vulnerable function find_vpid()
Bluetooth: hci_{ldisc,serdev}: check percpu_init_rwsem() failure
netfilter: conntrack: fix the gc rescheduling delay
netfilter: conntrack: revisit the gc initial rescheduling bias
wifi: ath11k: fix number of VHT beamformee spatial streams
x86/microcode/AMD: Track patch allocation size explicitly
x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype
spi: dw: Fix PM disable depth imbalance in dw_spi_bt1_probe
spi/omap100k:Fix PM disable depth imbalance in omap1_spi100k_probe
skmsg: Schedule psock work if the cached skb exists on the psock
i2c: mlxbf: support lock mechanism
Bluetooth: hci_core: Fix not handling link timeouts propertly
xfrm: Reinject transport-mode packets through workqueue
netfilter: nft_fib: Fix for rpath check with VRF devices
spi: s3c64xx: Fix large transfers with DMA
wifi: rtl8xxxu: Fix AIFS written to REG_EDCA_*_PARAM
vhost/vsock: Use kvmalloc/kvfree for larger packets.
eth: alx: take rtnl_lock on resume
mISDN: fix use-after-free bugs in l1oip timer handlers
sctp: handle the error returned from sctp_auth_asoc_init_active_key
tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited
spi: Ensure that sg_table won't be used after being freed
hwmon: (pmbus/mp2888) Fix sensors readouts for MPS Multi-phase mp2888 controller
net: rds: don't hold sock lock when cancelling work from rds_tcp_reset_callbacks()
bnx2x: fix potential memory leak in bnx2x_tpa_stop()
net: wwan: iosm: Call mutex_init before locking it
net/ieee802154: reject zero-sized raw_sendmsg()
once: add DO_ONCE_SLOW() for sleepable contexts
net: mvpp2: fix mvpp2 debugfs leak
drm: bridge: adv7511: fix CEC power down control register offset
drm: bridge: adv7511: unregister cec i2c device after cec adapter
drm/bridge: Avoid uninitialized variable warning
drm/mipi-dsi: Detach devices when removing the host
drm/virtio: Correct drm_gem_shmem_get_sg_table() error handling
drm/bridge: parade-ps8640: Fix regulator supply order
drm/dp_mst: fix drm_dp_dpcd_read return value checks
drm:pl111: Add of_node_put() when breaking out of for_each_available_child_of_node()
ASoC: mt6359: fix tests for platform_get_irq() failure
platform/chrome: fix double-free in chromeos_laptop_prepare()
platform/chrome: fix memory corruption in ioctl
ASoC: tas2764: Allow mono streams
ASoC: tas2764: Drop conflicting set_bias_level power setting
ASoC: tas2764: Fix mute/unmute
platform/x86: msi-laptop: Fix old-ec check for backlight registering
platform/x86: msi-laptop: Fix resource cleanup
platform/chrome: cros_ec_typec: Correct alt mode index
drm/amdgpu: add missing pci_disable_device() in amdgpu_pmops_runtime_resume()
drm/bridge: megachips: Fix a null pointer dereference bug
ASoC: rsnd: Add check for rsnd_mod_power_on
ALSA: hda: beep: Simplify keep-power-at-enable behavior
drm/bochs: fix blanking
drm/omap: dss: Fix refcount leak bugs
drm/amdgpu: Fix memory leak in hpd_rx_irq_create_workqueue()
mmc: au1xmmc: Fix an error handling path in au1xmmc_probe()
ASoC: eureka-tlv320: Hold reference returned from of_find_xxx API
drm/msm/dpu: index dpu_kms->hw_vbif using vbif_idx
drm/msm/dp: correct 1.62G link rate at dp_catalog_ctrl_config_msa()
drm/vmwgfx: Fix memory leak in vmw_mksstat_add_ioctl()
ASoC: codecs: tx-macro: fix kcontrol put
ASoC: da7219: Fix an error handling path in da7219_register_dai_clks()
ALSA: dmaengine: increment buffer pointer atomically
mmc: wmt-sdmmc: Fix an error handling path in wmt_mci_probe()
ASoC: wm8997: Fix PM disable depth imbalance in wm8997_probe
ASoC: wm5110: Fix PM disable depth imbalance in wm5110_probe
ASoC: wm5102: Fix PM disable depth imbalance in wm5102_probe
ASoC: mt6660: Fix PM disable depth imbalance in mt6660_i2c_probe
ALSA: hda/hdmi: Don't skip notification handling during PM operation
memory: pl353-smc: Fix refcount leak bug in pl353_smc_probe()
memory: of: Fix refcount leak bug in of_get_ddr_timings()
memory: of: Fix refcount leak bug in of_lpddr3_get_ddr_timings()
locks: fix TOCTOU race when granting write lease
soc: qcom: smsm: Fix refcount leak bugs in qcom_smsm_probe()
soc: qcom: smem_state: Add refcounting for the 'state->of_node'
ARM: dts: imx6qdl-kontron-samx6i: hook up DDC i2c bus
ARM: dts: turris-omnia: Fix mpp26 pin name and comment
ARM: dts: kirkwood: lsxl: fix serial line
ARM: dts: kirkwood: lsxl: remove first ethernet port
ia64: export memory_add_physaddr_to_nid to fix cxl build error
soc/tegra: fuse: Drop Kconfig dependency on TEGRA20_APB_DMA
arm64: dts: ti: k3-j7200: fix main pinmux range
ARM: dts: exynos: correct s5k6a3 reset polarity on Midas family
ARM: Drop CMDLINE_* dependency on ATAGS
ext4: don't run ext4lazyinit for read-only filesystems
arm64: ftrace: fix module PLTs with mcount
ARM: dts: exynos: fix polarity of VBUS GPIO of Origen
iio: adc: at91-sama5d2_adc: fix AT91_SAMA5D2_MR_TRACKTIM_MAX
iio: adc: at91-sama5d2_adc: check return status for pressure and touch
iio: adc: at91-sama5d2_adc: lock around oversampling and sample freq
iio: adc: at91-sama5d2_adc: disable/prepare buffer on suspend/resume
iio: inkern: only release the device node when done with it
iio: inkern: fix return value in devm_of_iio_channel_get_by_name()
iio: ABI: Fix wrong format of differential capacitance channel ABI.
iio: magnetometer: yas530: Change data type of hard_offsets to signed
RDMA/mlx5: Don't compare mkey tags in DEVX indirect mkey
usb: common: debug: Check non-standard control requests
clk: meson: Hold reference returned by of_get_parent()
clk: oxnas: Hold reference returned by of_get_parent()
clk: qoriq: Hold reference returned by of_get_parent()
clk: berlin: Add of_node_put() for of_get_parent()
clk: sprd: Hold reference returned by of_get_parent()
clk: tegra: Fix refcount leak in tegra210_clock_init
clk: tegra: Fix refcount leak in tegra114_clock_init
clk: tegra20: Fix refcount leak in tegra20_clock_init
HSI: omap_ssi: Fix refcount leak in ssi_probe
HSI: omap_ssi_port: Fix dma_map_sg error check
media: exynos4-is: fimc-is: Add of_node_put() when breaking out of loop
tty: xilinx_uartps: Fix the ignore_status
media: meson: vdec: add missing clk_disable_unprepare on error in vdec_hevc_start()
media: uvcvideo: Fix memory leak in uvc_gpio_parse
media: uvcvideo: Use entity get_cur in uvc_ctrl_set
media: xilinx: vipp: Fix refcount leak in xvip_graph_dma_init
RDMA/rxe: Fix "kernel NULL pointer dereference" error
RDMA/rxe: Fix the error caused by qp->sk
misc: ocxl: fix possible refcount leak in afu_ioctl()
fpga: prevent integer overflow in dfl_feature_ioctl_set_irq()
dmaengine: hisilicon: Disable channels when unregister hisi_dma
dmaengine: hisilicon: Fix CQ head update
dmaengine: hisilicon: Add multi-thread support for a DMA channel
dyndbg: fix static_branch manipulation
dyndbg: fix module.dyndbg handling
dyndbg: let query-modname override actual module name
dyndbg: drop EXPORTed dynamic_debug_exec_queries
clk: qcom: sm6115: Select QCOM_GDSC
mtd: devices: docg3: check the return value of devm_ioremap() in the probe
phy: amlogic: phy-meson-axg-mipi-pcie-analog: Hold reference returned by of_get_parent()
phy: phy-mtk-tphy: fix the phy type setting issue
mtd: rawnand: intel: Read the chip-select line from the correct OF node
mtd: rawnand: intel: Remove undocumented compatible string
mtd: rawnand: fsl_elbc: Fix none ECC mode
RDMA/irdma: Align AE id codes to correct flush code and event
RDMA/srp: Fix srp_abort()
RDMA/siw: Always consume all skbuf data in sk_data_ready() upcall.
RDMA/siw: Fix QP destroy to wait for all references dropped.
ata: fix ata_id_sense_reporting_enabled() and ata_id_has_sense_reporting()
ata: fix ata_id_has_devslp()
ata: fix ata_id_has_ncq_autosense()
ata: fix ata_id_has_dipm()
mtd: rawnand: meson: fix bit map use in meson_nfc_ecc_correct()
md: Replace snprintf with scnprintf
md/raid5: Ensure stripe_fill happens on non-read IO with journal
md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk()
RDMA/cm: Use SLID in the work completion as the DLID in responder side
IB: Set IOVA/LENGTH on IB_MR in core/uverbs layers
xhci: Don't show warning for reinit on known broken suspend
usb: gadget: function: fix dangling pnp_string in f_printer.c
drivers: serial: jsm: fix some leaks in probe
serial: 8250: Toggle IER bits on only after irq has been set up
tty: serial: fsl_lpuart: disable dma rx/tx use flags in lpuart_dma_shutdown
phy: qualcomm: call clk_disable_unprepare in the error handling
staging: vt6655: fix some erroneous memory clean-up loops
slimbus: qcom-ngd-ctrl: allow compile testing without QCOM_RPROC_COMMON
firmware: google: Test spinlock on panic path to avoid lockups
serial: 8250: Fix restoring termios speed after suspend
scsi: libsas: Fix use-after-free bug in smp_execute_task_sg()
scsi: iscsi: Rename iscsi_conn_queue_work()
scsi: iscsi: Add recv workqueue helpers
scsi: iscsi: Run recv path from workqueue
scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()
clk: qcom: apss-ipq6018: mark apcs_alias0_core_clk as critical
clk: qcom: gcc-sm6115: Override default Alpha PLL regs
RDMA/rxe: Fix resize_finish() in rxe_queue.c
fsi: core: Check error number after calling ida_simple_get
mfd: intel_soc_pmic: Fix an error handling path in intel_soc_pmic_i2c_probe()
mfd: fsl-imx25: Fix an error handling path in mx25_tsadc_setup_irq()
mfd: lp8788: Fix an error handling path in lp8788_probe()
mfd: lp8788: Fix an error handling path in lp8788_irq_init() and lp8788_irq_init()
mfd: fsl-imx25: Fix check for platform_get_irq() errors
mfd: sm501: Add check for platform_driver_register()
clk: mediatek: mt8183: mfgcfg: Propagate rate changes to parent
dmaengine: ioat: stop mod_timer from resurrecting deleted timer in __cleanup()
usb: mtu3: fix failed runtime suspend in host only mode
spmi: pmic-arb: correct duplicate APID to PPID mapping logic
clk: vc5: Fix 5P49V6901 outputs disabling when enabling FOD
clk: baikal-t1: Fix invalid xGMAC PTP clock divider
clk: baikal-t1: Add shared xGMAC ref/ptp clocks internal parent
clk: baikal-t1: Add SATA internal ref clock buffer
clk: bcm2835: fix bcm2835_clock_rate_from_divisor declaration
clk: imx: scu: fix memleak on platform_device_add() fails
clk: ti: dra7-atl: Fix reference leak in of_dra7_atl_clk_probe
clk: ast2600: BCLK comes from EPLL
mailbox: mpfs: fix handling of the reg property
mailbox: mpfs: account for mbox offsets while sending
mailbox: bcm-ferxrm-mailbox: Fix error check for dma_map_sg
powerpc/configs: Properly enable PAPR_SCM in pseries_defconfig
powerpc/math_emu/efp: Include module.h
powerpc/sysdev/fsl_msi: Add missing of_node_put()
powerpc/pci_dn: Add missing of_node_put()
powerpc/powernv: add missing of_node_put() in opal_export_attrs()
powerpc: Fix fallocate and fadvise64_64 compat parameter combination
x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition
powerpc/64s: Fix GENERIC_CPU build flags for PPC970 / G5
powerpc: Fix SPE Power ISA properties for e500v1 platforms
powerpc/kprobes: Fix null pointer reference in arch_prepare_kprobe()
powerpc/pseries/vas: Pass hw_cpu_id to node associativity HCALL
crypto: sahara - don't sleep when in softirq
crypto: hisilicon/zip - fix mismatch in get/set sgl_sge_nr
hwrng: arm-smccc-trng - fix NO_ENTROPY handling
cgroup: Honor caller's cgroup NS when resolving path
hwrng: imx-rngc - Moving IRQ handler registering after imx_rngc_irq_mask_clear()
crypto: qat - fix default value of WDT timer
crypto: hisilicon/qm - fix missing put dfx access
cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset
iommu/omap: Fix buffer overflow in debugfs
crypto: akcipher - default implementation for setting a private key
crypto: ccp - Release dma channels before dmaengine unrgister
crypto: inside-secure - Change swab to swab32
crypto: qat - fix DMA transfer direction
cifs: return correct error in ->calc_signature()
iommu/iova: Fix module config properly
tracing: kprobe: Fix kprobe event gen test module on exit
tracing: kprobe: Make gen test module work in arm and riscv
tracing/osnoise: Fix possible recursive locking in stop_per_cpu_kthreads
kbuild: remove the target in signal traps when interrupted
kbuild: rpm-pkg: fix breakage when V=1 is used
crypto: marvell/octeontx - prevent integer overflows
crypto: cavium - prevent integer overflow loading firmware
thermal/drivers/qcom/tsens-v0_1: Fix MSM8939 fourth sensor hw_id
ACPI: APEI: do not add task_work to kernel thread to avoid memory leak
f2fs: fix race condition on setting FI_NO_EXTENT flag
f2fs: fix to account FS_CP_DATA_IO correctly
selftest: tpm2: Add Client.__del__() to close /dev/tpm* handle
fs: dlm: fix race in lowcomms
rcu: Avoid triggering strict-GP irq-work when RCU is idle
rcu: Back off upon fill_page_cache_func() allocation failure
rcu-tasks: Convert RCU_LOCKDEP_WARN() to WARN_ONCE()
ACPI: video: Add Toshiba Satellite/Portege Z830 quirk
ACPI: tables: FPDT: Don't call acpi_os_map_memory() on invalid phys address
cpufreq: intel_pstate: Add Tigerlake support in no-HWP mode
MIPS: BCM47XX: Cast memcmp() of function to (void *)
powercap: intel_rapl: fix UBSAN shift-out-of-bounds issue
thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash
ARM: decompressor: Include .data.rel.ro.local
ACPI: x86: Add a quirk for Dell Inspiron 14 2-in-1 for StorageD3Enable
x86/entry: Work around Clang __bdos() bug
NFSD: Return nfserr_serverfault if splice_ok but buf->pages have data
NFSD: fix use-after-free on source server when doing inter-server copy
wifi: brcmfmac: fix invalid address access when enabling SCAN log level
bpftool: Clear errno after libcap's checks
ice: set tx_tstamps when creating new Tx rings via ethtool
net: ethernet: ti: davinci_mdio: Add workaround for errata i2329
openvswitch: Fix double reporting of drops in dropwatch
openvswitch: Fix overreporting of drops in dropwatch
tcp: annotate data-race around tcp_md5sig_pool_populated
x86/mce: Retrieve poison range from hardware
wifi: ath9k: avoid uninit memory read in ath9k_htc_rx_msg()
thunderbolt: Add back Intel Falcon Ridge end-to-end flow control workaround
xfrm: Update ipcomp_scratches with NULL when freed
iavf: Fix race between iavf_close and iavf_reset_task
wifi: brcmfmac: fix use-after-free bug in brcmf_netdev_start_xmit()
Bluetooth: btintel: Mark Intel controller to support LE_STATES quirk
regulator: core: Prevent integer underflow
wifi: mt76: mt7921: reset msta->airtime_ac while clearing up hw value
Bluetooth: L2CAP: initialize delayed works at l2cap_chan_create()
Bluetooth: hci_sysfs: Fix attempting to call device_add multiple times
can: bcm: check the result of can_send() in bcm_can_tx()
wifi: rt2x00: don't run Rt5592 IQ calibration on MT7620
wifi: rt2x00: set correct TX_SW_CFG1 MAC register for MT7620
wifi: rt2x00: set VGC gain for both chains of MT7620
wifi: rt2x00: set SoC wmac clock register
wifi: rt2x00: correctly set BBP register 86 for MT7620
hwmon: (sht4x) do not overflow clamping operation on 32-bit platforms
net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memory
Bluetooth: L2CAP: Fix user-after-free
r8152: Rate limit overflow messages
drm/nouveau/nouveau_bo: fix potential memory leak in nouveau_bo_alloc()
drm: Use size_t type for len variable in drm_copy_field()
drm: Prevent drm_copy_field() to attempt copying a NULL pointer
drm/komeda: Fix handling of atomic commits in the atomic_commit_tail hook
gpu: lontium-lt9611: Fix NULL pointer dereference in lt9611_connector_init()
drm/amd/display: fix overflow on MIN_I64 definition
udmabuf: Set ubuf->sg = NULL if the creation of sg table fails
drm: bridge: dw_hdmi: only trigger hotplug event on link change
ALSA: usb-audio: Register card at the last interface
drm/vc4: vec: Fix timings for VEC modes
drm: panel-orientation-quirks: Add quirk for Anbernic Win600
platform/chrome: cros_ec: Notify the PM of wake events during resume
platform/x86: msi-laptop: Change DMI match / alias strings to fix module autoloading
ASoC: SOF: pci: Change DMI match info to support all Chrome platforms
drm/amdgpu: fix initial connector audio value
drm/meson: reorder driver deinit sequence to fix use-after-free bug
drm/meson: explicitly remove aggregate driver at module unload time
mmc: sdhci-msm: add compatible string check for sdm670
drm/dp: Don't rewrite link config when setting phy test pattern
drm/amd/display: Remove interface for periodic interrupt 1
ARM: dts: imx7d-sdb: config the max pressure for tsc2046
ARM: dts: imx6q: add missing properties for sram
ARM: dts: imx6dl: add missing properties for sram
ARM: dts: imx6qp: add missing properties for sram
ARM: dts: imx6sl: add missing properties for sram
ARM: dts: imx6sll: add missing properties for sram
ARM: dts: imx6sx: add missing properties for sram
kselftest/arm64: Fix validatation termination record after EXTRA_CONTEXT
arm64: dts: imx8mq-librem5: Add bq25895 as max17055's power supply
btrfs: dump extra info if one free space cache has more bitmaps than it should
btrfs: scrub: try to fix super block errors
btrfs: don't print information about space cache or tree every remount
ARM: 9242/1: kasan: Only map modules if CONFIG_KASAN_VMALLOC=n
clk: zynqmp: Fix stack-out-of-bounds in strncpy`
media: cx88: Fix a null-ptr-deref bug in buffer_prepare()
media: platform: fix some double free in meson-ge2d and mtk-jpeg and s5p-mfc
clk: zynqmp: pll: rectify rate rounding in zynqmp_pll_round_rate
usb: host: xhci-plat: suspend and resume clocks
usb: host: xhci-plat: suspend/resume clks for brcm
dmaengine: ti: k3-udma: Reset UDMA_CHAN_RT byte counters to prevent overflow
scsi: 3w-9xxx: Avoid disabling device if failing to enable it
nbd: Fix hung when signal interrupts nbd_start_device_ioctl()
iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity
power: supply: adp5061: fix out-of-bounds read in adp5061_get_chg_type()
staging: vt6655: fix potential memory leak
blk-throttle: prevent overflow while calculating wait time
ata: libahci_platform: Sanity check the DT child nodes number
bcache: fix set_at_max_writeback_rate() for multiple attached devices
soundwire: cadence: Don't overwrite msg->buf during write commands
soundwire: intel: fix error handling on dai registration issues
HID: roccat: Fix use-after-free in roccat_read()
eventfd: guard wake_up in eventfd fs calls as well
md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d
usb: host: xhci: Fix potential memory leak in xhci_alloc_stream_info()
usb: musb: Fix musb_gadget.c rxstate overflow bug
arm64: dts: imx8mp: Add snps,gfladj-refclk-lpm-sel quirk to USB nodes
usb: dwc3: core: Enable GUCTL1 bit 10 for fixing termination error after resume bug
Revert "usb: storage: Add quirk for Samsung Fit flash"
staging: rtl8723bs: fix potential memory leak in rtw_init_drv_sw()
staging: rtl8723bs: fix a potential memory leak in rtw_init_cmd_priv()
scsi: tracing: Fix compile error in trace_array calls when TRACING is disabled
ext2: Use kvmalloc() for group descriptor array
nvme: copy firmware_rev on each init
nvmet-tcp: add bounds check on Transfer Tag
usb: idmouse: fix an uninit-value in idmouse_open
clk: bcm2835: Make peripheral PLLC critical
clk: bcm2835: Round UART input clock up
perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
io_uring/af_unix: defer registered files gc to io_uring release
io_uring: correct pinned_vm accounting
io_uring/rw: fix short rw error handling
io_uring/rw: fix error'ed retry return values
io_uring/rw: fix unexpected link breakage
mm: hugetlb: fix UAF in hugetlb_handle_userfault
net: ieee802154: return -EINVAL for unknown addr type
ALSA: usb-audio: Fix last interface check for registration
blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init()
net: ethernet: ti: davinci_mdio: fix build for mdio bitbang uses
Revert "net/ieee802154: reject zero-sized raw_sendmsg()"
net/ieee802154: don't warn zero-sized raw_sendmsg()
drm/amd/display: Fix build breakage with CONFIG_DEBUG_FS=n
Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
ext4: continue to expand file system when the target size doesn't reach
thermal: intel_powerclamp: Use first online CPU as control_cpu
gcov: support GCC 12.1 and newer compilers
io-wq: Fix memory leak in worker creation
Linux 5.15.75
Change-Id: I5a3ef9688fb31003940d7e1828f863b9d50f1da9
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit d2d05b88035d2d51a5bb6c5afec88a0880c73df4 ]
Inside set_at_max_writeback_rate() the calculation in following if()
check is wrong,
if (atomic_inc_return(&c->idle_counter) <
atomic_read(&c->attached_dev_nr) * 6)
Because each attached backing device has its own writeback thread
running and increasing c->idle_counter, the counter increates much
faster than expected. The correct calculation should be,
(counter / dev_nr) < dev_nr * 6
which equals to,
counter < dev_nr * dev_nr * 6
This patch fixes the above mistake with correct calculation, and helper
routine idle_counter_exceeded() is added to make code be more clear.
Reported-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Coly Li <colyli@suse.de>
Acked-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Link: https://lore.kernel.org/r/20220919161647.81238-6-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
The header file include/uapi/linux/bcache.h is not really a user space
API heaer. This file defines the ondisk format of bcache internal meta
data but no one includes it from user space, bcache-tools has its own
copy of this header with minor modification.
Therefore, this patch moves include/uapi/linux/bcache.h to bcache code
directory as drivers/md/bcache/bcache_ondisk.h.
Suggested-by: Arnd Bergmann <arnd@kernel.org>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211029060930.119923-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit cf2197ca4b8c199d188593ca6800ea1827c42171)
Bug: 183899269
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Idb88d748a0cd11708a07ae7ad21ed76ede14b3c3
commit 7d6b902ea0e02b2a25c480edf471cbaa4ebe6b3c upstream.
The local variables check_state (in bch_btree_check()) and state (in
bch_sectors_dirty_init()) should be fully filled by 0, because before
allocating them on stack, they were dynamically allocated by kzalloc().
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 32feee36c30ea06e38ccb8ae6e5c44c6eec790a6 upstream.
The journal no-space deadlock was reported time to time. Such deadlock
can happen in the following situation.
When all journal buckets are fully filled by active jset with heavy
write I/O load, the cache set registration (after a reboot) will load
all active jsets and inserting them into the btree again (which is
called journal replay). If a journaled bkey is inserted into a btree
node and results btree node split, new journal request might be
triggered. For example, the btree grows one more level after the node
split, then the root node record in cache device super block will be
upgrade by bch_journal_meta() from bch_btree_set_root(). But there is no
space in journal buckets, the journal replay has to wait for new journal
bucket to be reclaimed after at least one journal bucket replayed. This
is one example that how the journal no-space deadlock happens.
The solution to avoid the deadlock is to reserve 1 journal bucket in
run time, and only permit the reserved journal bucket to be used during
cache set registration procedure for things like journal replay. Then
the journal space will never be fully filled, there is no chance for
journal no-space deadlock to happen anymore.
This patch adds a new member "bool do_reserve" in struct journal, it is
inititalized to 0 (false) when struct journal is allocated, and set to
1 (true) by bch_journal_space_reserve() when all initialization done in
run_cache_set(). In the run time when journal_reclaim() tries to
allocate a new journal bucket, free_journal_buckets() is called to check
whether there are enough free journal buckets to use. If there is only
1 free journal bucket and journal->do_reserve is 1 (true), the last
bucket is reserved and free_journal_buckets() will return 0 to indicate
no free journal bucket. Then journal_reclaim() will give up, and try
next time to see whetheer there is free journal bucket to allocate. By
this method, there is always 1 jouranl bucket reserved in run time.
During the cache set registration, journal->do_reserve is 0 (false), so
the reserved journal bucket can be used to avoid the no-space deadlock.
Reported-by: Nikhil Kshirsagar <nkshirsagar@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-5-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 80db4e4707e78cb22287da7d058d7274bd4cb370 upstream.
After making bch_sectors_dirty_init() being multithreaded, the existing
incremental dirty sector counting in bch_root_node_dirty_init() doesn't
release btree occupation after iterating 500000 (INIT_KEYS_EACH_TIME)
bkeys. Because a read lock is added on btree root node to prevent the
btree to be split during the dirty sectors counting, other I/O requester
has no chance to gain the write lock even restart bcache_btree().
That is to say, the incremental dirty sectors counting is incompatible
to the multhreaded bch_sectors_dirty_init(). We have to choose one and
drop another one.
In my testing, with 512 bytes random writes, I generate 1.2T dirty data
and a btree with 400K nodes. With single thread and incremental dirty
sectors counting, it takes 30+ minites to register the backing device.
And with multithreaded dirty sectors counting, the backing device
registration can be accomplished within 2 minutes.
The 30+ minutes V.S. 2- minutes difference makes me decide to keep
multithreaded bch_sectors_dirty_init() and drop the incremental dirty
sectors counting. This is what this patch does.
But INIT_KEYS_EACH_TIME is kept, in sectors_dirty_init_fn() the CPU
will be released by cond_resched() after every INIT_KEYS_EACH_TIME keys
iterated. This is to avoid the watchdog reports a bogus soft lockup
warning.
Fixes: b144e45fc5 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-4-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4dc34ae1b45fe26e772a44379f936c72623dd407 upstream.
Commit b144e45fc5 ("bcache: make bch_sectors_dirty_init() to be
multithreaded") makes bch_sectors_dirty_init() to be much faster
when counting dirty sectors by iterating all dirty keys in the btree.
But it isn't in ideal shape yet, still can be improved.
This patch does the following changes to improve current parallel dirty
keys iteration on the btree,
- Add read lock to root node when multiple threads iterating the btree,
to prevent the root node gets split by I/Os from other registered
bcache devices.
- Remove local variable "char name[32]" and generate kernel thread name
string directly when calling kthread_run().
- Allocate "struct bch_dirty_init_state state" directly on stack and
avoid the unnecessary dynamic memory allocation for it.
- Decrease BCH_DIRTY_INIT_THRD_MAX from 64 to 12 which is enough indeed.
- Increase &state->started to count created kernel thread after it
succeeds to create.
- When wait for all dirty key counting threads to finish, use
wait_event() to replace wait_event_interruptible().
With the above changes, the code is more clear, and some potential error
conditions are avoided.
Fixes: b144e45fc5 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 622536443b6731ec82c563aae7807165adbe9178 upstream.
Commit 8e7102273f ("bcache: make bch_btree_check() to be
multithreaded") makes bch_btree_check() to be much faster when checking
all btree nodes during cache device registration. But it isn't in ideal
shap yet, still can be improved.
This patch does the following thing to improve current parallel btree
nodes check by multiple threads in bch_btree_check(),
- Add read lock to root node while checking all the btree nodes with
multiple threads. Although currently it is not mandatory but it is
good to have a read lock in code logic.
- Remove local variable 'char name[32]', and generate kernel thread name
string directly when calling kthread_run().
- Allocate local variable "struct btree_check_state check_state" on the
stack and avoid unnecessary dynamic memory allocation for it.
- Reduce BCH_BTR_CHKTHREAD_MAX from 64 to 12 which is enough indeed.
- Increase check_state->started to count created kernel thread after it
succeeds to create.
- When wait for all checking kernel threads to finish, use wait_event()
to replace wait_event_interruptible().
With this change, the code is more clear, and some potential error
conditions are avoided.
Fixes: 8e7102273f ("bcache: make bch_btree_check() to be multithreaded")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 887554ab96588de2917b6c8c73e552da082e5368 upstream.
When multiple threads to check btree nodes in parallel, the main
thread wait for all threads to stop or CACHE_SET_IO_DISABLE flag:
wait_event_interruptible(check_state->wait,
atomic_read(&check_state->started) == 0 ||
test_bit(CACHE_SET_IO_DISABLE, &c->flags));
However, the bch_btree_node_read and bch_btree_node_read_done
maybe call bch_cache_set_error, then the CACHE_SET_IO_DISABLE
will be set. If the flag already set, the main thread return
error. At the same time, maybe some threads still running and
read NULL pointer, the kernel will crash.
This patch change the event wait condition, the main thread must
wait for all threads to stop.
Fixes: 8e7102273f ("bcache: make bch_btree_check() to be multithreaded")
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2878feaed543c35f9dbbe6d8ce36fb67ac803eef upstream.
This reverts commit 2fd3e5efe7.
The above commit replaces page_address(bv->bv_page) by bvec_virt(bv) to
avoid directly access to bv->bv_page, but in situation bv->bv_offset is
not zero and page_address(bv->bv_page) is not equal to bvec_virt(bv). In
such case a memory corruption may happen because memory in next page is
tainted by following line in do_btree_node_write(),
memcpy(bvec_virt(bv), addr, PAGE_SIZE);
This patch reverts the mentioned commit to avoid the memory corruption.
Fixes: 2fd3e5efe7 ("bcache: use bvec_virt")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211103151041.70516-1-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8468f45091d2866affed6f6a7aecc20779139173 upstream.
In bcache_device_free(), pointer disk is referenced still in
ida_simple_remove() after blk_cleanup_disk() gets called on this
pointer. This may cause a potential panic by use-after-free on the
disk pointer.
This patch fixes the problem by calling blk_cleanup_disk() after
ida_simple_remove().
Fixes: bc70852fd1 ("bcache: convert to blk_alloc_disk/blk_cleanup_disk")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: stable@vger.kernel.org # v5.14+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211103064917.67383-1-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move the block holder code into a separate file as it is not in any way
related to the other block_dev.c code, and add a new selectable config
option for it so that we don't have to build it without any remapped
drivers selected.
The Kconfig symbol contains a _DEPRECATED suffix to match the comments
added in commit 49731baa41
("block: restore multiple bd_link_disk_holder() support").
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull block fixes from Jens Axboe:
"A few fixes that should go into 5.13:
- Fix a regression deadlock introduced in this release between open
and remove of a bdev (Christoph)
- Fix an async_xor md regression in this release (Xiao)
- Fix bcache oversized read issue (Coly)"
* tag 'block-5.13-2021-06-12' of git://git.kernel.dk/linux-block:
block: loop: fix deadlock between open and remove
async_xor: check src_offs is not NULL before updating it
bcache: avoid oversized read request in cache missing code path
bcache: remove bcache device self-defined readahead
In the cache missing code path of cached device, if a proper location
from the internal B+ tree is matched for a cache miss range, function
cached_dev_cache_miss() will be called in cache_lookup_fn() in the
following code block,
[code block 1]
526 unsigned int sectors = KEY_INODE(k) == s->iop.inode
527 ? min_t(uint64_t, INT_MAX,
528 KEY_START(k) - bio->bi_iter.bi_sector)
529 : INT_MAX;
530 int ret = s->d->cache_miss(b, s, bio, sectors);
Here s->d->cache_miss() is the call backfunction pointer initialized as
cached_dev_cache_miss(), the last parameter 'sectors' is an important
hint to calculate the size of read request to backing device of the
missing cache data.
Current calculation in above code block may generate oversized value of
'sectors', which consequently may trigger 2 different potential kernel
panics by BUG() or BUG_ON() as listed below,
1) BUG_ON() inside bch_btree_insert_key(),
[code block 2]
886 BUG_ON(b->ops->is_extents && !KEY_SIZE(k));
2) BUG() inside biovec_slab(),
[code block 3]
51 default:
52 BUG();
53 return NULL;
All the above panics are original from cached_dev_cache_miss() by the
oversized parameter 'sectors'.
Inside cached_dev_cache_miss(), parameter 'sectors' is used to calculate
the size of data read from backing device for the cache missing. This
size is stored in s->insert_bio_sectors by the following lines of code,
[code block 4]
909 s->insert_bio_sectors = min(sectors, bio_sectors(bio) + reada);
Then the actual key inserting to the internal B+ tree is generated and
stored in s->iop.replace_key by the following lines of code,
[code block 5]
911 s->iop.replace_key = KEY(s->iop.inode,
912 bio->bi_iter.bi_sector + s->insert_bio_sectors,
913 s->insert_bio_sectors);
The oversized parameter 'sectors' may trigger panic 1) by BUG_ON() from
the above code block.
And the bio sending to backing device for the missing data is allocated
with hint from s->insert_bio_sectors by the following lines of code,
[code block 6]
926 cache_bio = bio_alloc_bioset(GFP_NOWAIT,
927 DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS),
928 &dc->disk.bio_split);
The oversized parameter 'sectors' may trigger panic 2) by BUG() from the
agove code block.
Now let me explain how the panics happen with the oversized 'sectors'.
In code block 5, replace_key is generated by macro KEY(). From the
definition of macro KEY(),
[code block 7]
71 #define KEY(inode, offset, size) \
72 ((struct bkey) { \
73 .high = (1ULL << 63) | ((__u64) (size) << 20) | (inode), \
74 .low = (offset) \
75 })
Here 'size' is 16bits width embedded in 64bits member 'high' of struct
bkey. But in code block 1, if "KEY_START(k) - bio->bi_iter.bi_sector" is
very probably to be larger than (1<<16) - 1, which makes the bkey size
calculation in code block 5 is overflowed. In one bug report the value
of parameter 'sectors' is 131072 (= 1 << 17), the overflowed 'sectors'
results the overflowed s->insert_bio_sectors in code block 4, then makes
size field of s->iop.replace_key to be 0 in code block 5. Then the 0-
sized s->iop.replace_key is inserted into the internal B+ tree as cache
missing check key (a special key to detect and avoid a racing between
normal write request and cache missing read request) as,
[code block 8]
915 ret = bch_btree_insert_check_key(b, &s->op, &s->iop.replace_key);
Then the 0-sized s->iop.replace_key as 3rd parameter triggers the bkey
size check BUG_ON() in code block 2, and causes the kernel panic 1).
Another kernel panic is from code block 6, is by the bvecs number
oversized value s->insert_bio_sectors from code block 4,
min(sectors, bio_sectors(bio) + reada)
There are two possibility for oversized reresult,
- bio_sectors(bio) is valid, but bio_sectors(bio) + reada is oversized.
- sectors < bio_sectors(bio) + reada, but sectors is oversized.
From a bug report the result of "DIV_ROUND_UP(s->insert_bio_sectors,
PAGE_SECTORS)" from code block 6 can be 344, 282, 946, 342 and many
other values which larther than BIO_MAX_VECS (a.k.a 256). When calling
bio_alloc_bioset() with such larger-than-256 value as the 2nd parameter,
this value will eventually be sent to biovec_slab() as parameter
'nr_vecs' in following code path,
bio_alloc_bioset() ==> bvec_alloc() ==> biovec_slab()
Because parameter 'nr_vecs' is larger-than-256 value, the panic by BUG()
in code block 3 is triggered inside biovec_slab().
From the above analysis, we know that the 4th parameter 'sector' sent
into cached_dev_cache_miss() may cause overflow in code block 5 and 6,
and finally cause kernel panic in code block 2 and 3. And if result of
bio_sectors(bio) + reada exceeds valid bvecs number, it may also trigger
kernel panic in code block 3 from code block 6.
Now the almost-useless readahead size for cache missing request back to
backing device is removed, this patch can fix the oversized issue with
more simpler method.
- add a local variable size_limit, set it by the minimum value from
the max bkey size and max bio bvecs number.
- set s->insert_bio_sectors by the minimum value from size_limit,
sectors, and the sectors size of bio.
- replace sectors by s->insert_bio_sectors to do bio_next_split.
By the above method with size_limit, s->insert_bio_sectors will never
result oversized replace_key size or bio bvecs number. And split bio
'miss' from bio_next_split() will always match the size of 'cache_bio',
that is the current maximum bio size we can sent to backing device for
fetching the cache missing data.
Current problmatic code can be partially found since Linux v3.13-rc1,
therefore all maintained stable kernels should try to apply this fix.
Reported-by: Alexander Ullrich <ealex1979@gmail.com>
Reported-by: Diego Ercolani <diego.ercolani@gmail.com>
Reported-by: Jan Szubiak <jan.szubiak@linuxpolska.pl>
Reported-by: Marco Rebhan <me@dblsaiko.net>
Reported-by: Matthias Ferdinand <bcache@mfedv.net>
Reported-by: Victor Westerhuis <victor@westerhu.is>
Reported-by: Vojtech Pavlik <vojtech@suse.cz>
Reported-and-tested-by: Rolf Fokkens <rolf@rolffokkens.nl>
Reported-and-tested-by: Thorsten Knabe <linux@thorsten-knabe.de>
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Takashi Iwai <tiwai@suse.com>
Link: https://lore.kernel.org/r/20210607125052.21277-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For read cache missing, bcache defines a readahead size for the read I/O
request to the backing device for the missing data. This readahead size
is initialized to 0, and almost no one uses it to avoid unnecessary read
amplifying onto backing device and write amplifying onto cache device.
Considering upper layer file system code has readahead logic allready
and works fine with readahead_cache_policy sysfile interface, we don't
have to keep bcache self-defined readahead anymore.
This patch removes the bcache self-defined readahead for cache missing
request for backing device, and the readahead sysfs file interfaces are
removed as well.
This is the preparation for next patch to fix potential kernel panic due
to oversized request in a simpler method.
Reported-by: Alexander Ullrich <ealex1979@gmail.com>
Reported-by: Diego Ercolani <diego.ercolani@gmail.com>
Reported-by: Jan Szubiak <jan.szubiak@linuxpolska.pl>
Reported-by: Marco Rebhan <me@dblsaiko.net>
Reported-by: Matthias Ferdinand <bcache@mfedv.net>
Reported-by: Victor Westerhuis <victor@westerhu.is>
Reported-by: Vojtech Pavlik <vojtech@suse.cz>
Reported-and-tested-by: Rolf Fokkens <rolf@rolffokkens.nl>
Reported-and-tested-by: Thorsten Knabe <linux@thorsten-knabe.de>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Takashi Iwai <tiwai@suse.com>
Link: https://lore.kernel.org/r/20210607125052.21277-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The patch "bcache: remove PTR_CACHE" introduces a compiling failure in
debug.c with following error message,
In file included from drivers/md/bcache/bcache.h:182:0,
from drivers/md/bcache/debug.c:9:
drivers/md/bcache/debug.c: In function 'bch_btree_verify':
drivers/md/bcache/debug.c:53:19: error: 'c' undeclared (first use in
this function)
bio_set_dev(bio, c->cache->bdev);
^
This patch fixes the regression by replacing c->cache->bdev by b->c->
cache->bdev.
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210411134316.80274-8-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Cast multiple variables to (int64_t) in order to give the compiler
complete information about the proper arithmetic to use. Notice that
these variables are being used in contexts that expect expressions of
type int64_t (64 bit, signed). And currently, such expressions are
being evaluated using 32-bit arithmetic.
Fixes: d0cf9503e9 ("octeontx2-pf: ethtool fec mode support")
Addresses-Coverity-ID: 1501724 ("Unintentional integer overflow")
Addresses-Coverity-ID: 1501725 ("Unintentional integer overflow")
Addresses-Coverity-ID: 1501726 ("Unintentional integer overflow")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-7-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
building with 'make W=1' shows a harmless warning for each user of the
EBUG_ON() macro:
drivers/md/bcache/bset.c: In function 'bch_btree_sort_partial':
drivers/md/bcache/util.h:30:55: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
30 | #define EBUG_ON(cond) do { if (cond); } while (0)
| ^
drivers/md/bcache/bset.c:1312:9: note: in expansion of macro 'EBUG_ON'
1312 | EBUG_ON(oldsize >= 0 && bch_count_data(b) != oldsize);
| ^~~~~~~
Reword the macro slightly to avoid the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-5-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In bch_cached_dev_run(), free(env[1])|free(env[2])|free(buf)
show up three times. This patch introduce out tag in
which free(env[1])|free(env[2])|free(buf) are only called
one time. If we need to call free() when errors occur,
we can set error code to ret, and then goto out tag directly.
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210411134316.80274-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull block driver updates from Jens Axboe:
- Remove the skd driver. It's been EOL for a long time (Damien)
- NVMe pull requests
- fix multipath handling of ->queue_rq errors (Chao Leng)
- nvmet cleanups (Chaitanya Kulkarni)
- add a quirk for buggy Amazon controller (Filippo Sironi)
- avoid devm allocations in nvme-hwmon that don't interact well
with fabrics (Hannes Reinecke)
- sysfs cleanups (Jiapeng Chong)
- fix nr_zones for multipath (Keith Busch)
- nvme-tcp crash fix for no-data commands (Sagi Grimberg)
- nvmet-tcp fixes (Sagi Grimberg)
- add a missing __rcu annotation (Christoph)
- failed reconnect fixes (Chao Leng)
- various tracing improvements (Michal Krakowiak, Johannes
Thumshirn)
- switch the nvmet-fc assoc_list to use RCU protection (Leonid
Ravich)
- resync the status codes with the latest spec (Max Gurtovoy)
- minor nvme-tcp improvements (Sagi Grimberg)
- various cleanups (Rikard Falkeborn, Minwoo Im, Chaitanya
Kulkarni, Israel Rukshin)
- Floppy O_NDELAY fix (Denis)
- MD pull request
- raid5 chunk_sectors fix (Guoqing)
- Use lore links (Kees)
- Use DEFINE_SHOW_ATTRIBUTE for nbd (Liao)
- loop lock scaling (Pavel)
- mtip32xx PCI fixes (Bjorn)
- bcache fixes (Kai, Dongdong)
- Misc fixes (Tian, Yang, Guoqing, Joe, Andy)
* tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-block: (64 commits)
lightnvm: pblk: Replace guid_copy() with export_guid()/import_guid()
lightnvm: fix unnecessary NULL check warnings
nvme-tcp: fix crash triggered with a dataless request submission
block: Replace lkml.org links with lore
nbd: Convert to DEFINE_SHOW_ATTRIBUTE
nvme: add 48-bit DMA address quirk for Amazon NVMe controllers
nvme-hwmon: rework to avoid devm allocation
nvmet: remove else at the end of the function
nvmet: add nvmet_req_subsys() helper
nvmet: use min of device_path and disk len
nvmet: use invalid cmd opcode helper
nvmet: use invalid cmd opcode helper
nvmet: add helper to report invalid opcode
nvmet: remove extra variable in id-ns handler
nvmet: make nvmet_find_namespace() req based
nvmet: return uniform error for invalid ns
nvmet: set status to 0 in case for invalid nsid
nvmet-fc: add a missing __rcu annotation to nvmet_fc_tgt_assoc.queues
nvme-multipath: set nr_zones for zoned namespaces
nvmet-tcp: fix potential race of tcp socket closing accept_work
...
Pull core block updates from Jens Axboe:
"Another nice round of removing more code than what is added, mostly
due to Christoph's relentless pursuit of tech debt removal/cleanups.
This pull request contains:
- Two series of BFQ improvements (Paolo, Jan, Jia)
- Block iov_iter improvements (Pavel)
- bsg error path fix (Pan)
- blk-mq scheduler improvements (Jan)
- -EBUSY discard fix (Jan)
- bvec allocation improvements (Ming, Christoph)
- bio allocation and init improvements (Christoph)
- Store bdev pointer in bio instead of gendisk + partno (Christoph)
- Block trace point cleanups (Christoph)
- hard read-only vs read-only split (Christoph)
- Block based swap cleanups (Christoph)
- Zoned write granularity support (Damien)
- Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)"
* tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits)
mm: simplify swapdev_block
sd_zbc: clear zone resources for non-zoned case
block: introduce blk_queue_clear_zone_settings()
zonefs: use zone write granularity as block size
block: introduce zone_write_granularity limit
block: use blk_queue_set_zoned in add_partition()
nullb: use blk_queue_set_zoned() to setup zoned devices
nvme: cleanup zone information initialization
block: document zone_append_max_bytes attribute
block: use bi_max_vecs to find the bvec pool
md/raid10: remove dead code in reshape_request
block: mark the bio as cloned in bio_iov_bvec_set
block: set BIO_NO_PAGE_REF in bio_iov_bvec_set
block: remove a layer of indentation in bio_iov_iter_get_pages
block: turn the nr_iovecs argument to bio_alloc* into an unsigned short
block: remove the 1 and 4 vec bvec_slabs entries
block: streamline bvec_alloc
block: factor out a bvec_alloc_gfp helper
block: move struct biovec_slab to bio.c
block: reuse BIO_INLINE_VECS for integrity bvecs
...
This is potentially long running and not latency sensitive, let's get
it out of the way of other latency sensitive events.
As observed in the previous commit, the `system_wq` comes easily
congested by bcache, and this fixes a few more stalls I was observing
every once in a while.
Let's not make this `WQ_MEM_RECLAIM` as it showed to reduce performance
of boot and file system operations in my tests. Also, without
`WQ_MEM_RECLAIM`, I no longer see desktop stalls. This matches the
previous behavior as `system_wq` also does no memory reclaim:
> // workqueue.c:
> system_wq = alloc_workqueue("events", 0, 0);
Cc: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Kai Krakow <kai@kaishome.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Before killing `btree_io_wq`, the queue was allocated using
`create_singlethread_workqueue()` which has `WQ_MEM_RECLAIM`. After
killing it, it no longer had this property but `system_wq` is not
single threaded.
Let's combine both worlds and make it multi threaded but able to
reclaim memory.
Cc: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Kai Krakow <kai@kaishome.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This reverts commit 56b30770b2.
With the btree using the `system_wq`, I seem to see a lot more desktop
latency than I should.
After some more investigation, it looks like the original assumption
of 56b3077 no longer is true, and bcache has a very high potential of
congesting the `system_wq`. In turn, this introduces laggy desktop
performance, IO stalls (at least with btrfs), and input events may be
delayed.
So let's revert this. It's important to note that the semantics of
using `system_wq` previously mean that `btree_io_wq` should be created
before and destroyed after other bcache wqs to keep the same
assumptions.
Cc: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Kai Krakow <kai@kaishome.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Current way to calculate the writeback rate only considered the
dirty sectors, this usually works fine when the fragmentation
is not high, but it will give us unreasonable small rate when
we are under a situation that very few dirty sectors consumed
a lot dirty buckets. In some case, the dirty bucekts can reached
to CUTOFF_WRITEBACK_SYNC while the dirty data(sectors) not even
reached the writeback_percent, the writeback rate will still
be the minimum value (4k), thus it will cause all the writes to be
stucked in a non-writeback mode because of the slow writeback.
We accelerate the rate in 3 stages with different aggressiveness,
the first stage starts when dirty buckets percent reach above
BCH_WRITEBACK_FRAGMENT_THRESHOLD_LOW (50), the second is
BCH_WRITEBACK_FRAGMENT_THRESHOLD_MID (57), the third is
BCH_WRITEBACK_FRAGMENT_THRESHOLD_HIGH (64). By default
the first stage tries to writeback the amount of dirty data
in one bucket (on average) in (1 / (dirty_buckets_percent - 50)) second,
the second stage tries to writeback the amount of dirty data in one bucket
in (1 / (dirty_buckets_percent - 57)) * 100 millisecond, the third
stage tries to writeback the amount of dirty data in one bucket in
(1 / (dirty_buckets_percent - 64)) millisecond.
the initial rate at each stage can be controlled by 3 configurable
parameters writeback_rate_fp_term_{low|mid|high}, they are by default
1, 10, 1000, the hint of IO throughput that these values are trying
to achieve is described by above paragraph, the reason that
I choose those value as default is based on the testing and the
production data, below is some details:
A. When it comes to the low stage, there is still a bit far from the 70
threshold, so we only want to give it a little bit push by setting the
term to 1, it means the initial rate will be 170 if the fragment is 6,
it is calculated by bucket_size/fragment, this rate is very small,
but still much reasonable than the minimum 8.
For a production bcache with unheavy workload, if the cache device
is bigger than 1 TB, it may take hours to consume 1% buckets,
so it is very possible to reclaim enough dirty buckets in this stage,
thus to avoid entering the next stage.
B. If the dirty buckets ratio didn't turn around during the first stage,
it comes to the mid stage, then it is necessary for mid stage
to be more aggressive than low stage, so i choose the initial rate
to be 10 times more than low stage, that means 1700 as the initial
rate if the fragment is 6. This is some normal rate
we usually see for a normal workload when writeback happens
because of writeback_percent.
C. If the dirty buckets ratio didn't turn around during the low and mid
stages, it comes to the third stage, and it is the last chance that
we can turn around to avoid the horrible cutoff writeback sync issue,
then we choose 100 times more aggressive than the mid stage, that
means 170000 as the initial rate if the fragment is 6. This is also
inferred from a production bcache, I've got one week's writeback rate
data from a production bcache which has quite heavy workloads,
again, the writeback is triggered by the writeback percent,
the highest rate area is around 100000 to 240000, so I believe this
kind aggressiveness at this stage is reasonable for production.
And it should be mostly enough because the hint is trying to reclaim
1000 bucket per second, and from that heavy production env,
it is consuming 50 bucket per second on average in one week's data.
Option writeback_consider_fragment is to control whether we want
this feature to be on or off, it's on by default.
Lastly, below is the performance data for all the testing result,
including the data from production env:
https://docs.google.com/document/d/1AmbIEa_2MhB9bqhC3rfga9tp7n9YX9PLn0jSUxscVW0/edit?usp=sharing
Signed-off-by: dongdong tao <dongdong.tao@canonical.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For super block version < BCACHE_SB_VERSION_CDEV_WITH_FEATURES, it
doesn't make sense to check the feature sets. This patch checks
super block version in bch_has_feature_* routines, if the version
doesn't have feature sets yet, returns 0 (false) to the caller.
Fixes: 5342fd4255 ("bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET")
Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Cc: stable@vger.kernel.org # 5.9+
Reported-and-tested-by: Bockholdt Arne <a.bockholdt@precitec-optronik.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Always use the bio_set_dev helper to assign ->bi_bdev to make sure
other state related to the device is uptodate.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rework the I/O accounting for bio based drivers to use ->bi_bdev. This
means all drivers can now simply use bio_start_io_acct to start
accounting, and it will take partitions into account automatically. To
end I/O account either bio_end_io_acct can be used if the driver never
remaps I/O to a different device, or bio_end_io_acct_remapped if the
driver did remap the I/O.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace the gendisk pointer in struct bio with a pointer to the newly
improved struct block device. From that the gendisk can be trivially
accessed with an extra indirection, but it also allows to directly
look up all information related to partition remapping.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET is set in incompat feature
set, it means the cache device is created with obsoleted layout with
obso_bucket_site_hi. Now bcache does not support this feature bit, a new
BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE incompat feature bit is added
for a better layout to support large bucket size.
For the legacy compatibility purpose, if a cache device created with
obsoleted BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit, all bcache
devices attached to this cache set should be set to read-only. Then the
dirty data can be written back to backing device before re-create the
cache device with BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE feature bit
by the latest bcache-tools.
This patch checks BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit
when running a cache set and attach a bcache device to the cache set. If
this bit is set,
- When run a cache set, print an error kernel message to indicate all
following attached bcache device will be read-only.
- When attach a bcache device, print an error kernel message to indicate
the attached bcache device will be read-only, and ask users to update
to latest bcache-tools.
Such change is only for cache device whose bucket size >= 32MB, this is
for the zoned SSD and almost nobody uses such large bucket size at this
moment. If you don't explicit set a large bucket size for a zoned SSD,
such change is totally transparent to your bcache device.
Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET
was introduced into the incompat feature set. It used bucket_size_hi
(which was added at the tail of struct cache_sb_disk) to extend current
16bit bucket size to 32bit with existing bucket_size in struct
cache_sb_disk.
This is not a good idea, there are two obvious problems,
- Bucket size is always value power of 2, if store log2(bucket size) in
existing bucket_size of struct cache_sb_disk, it is unnecessary to add
bucket_size_hi.
- Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in
struct cache_sb_disk, bucket_size_hi was added after d[] which makes
csum_set calculate an unexpected super block checksum.
To fix the above problems, this patch introduces a new incompat feature
bit BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE, when this bit is set, it
means bucket_size in struct cache_sb_disk stores the order of power-of-2
bucket size value. When user specifies a bucket size larger than 32768
sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE will be set to
incompat feature set, and bucket_size stores log2(bucket size) more
than store the real bucket size value.
The obsoleted BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore,
it is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and still only
recognized by kernel driver for legacy compatible purpose. The previous
bucket_size_hi is renmaed to obso_bucket_size_hi in struct cache_sb_disk
and not used in bcache-tools anymore.
For cache device created with BCH_FEATURE_INCOMPAT_LARGE_BUCKET feature,
bcache-tools and kernel driver still recognize the feature string and
display it as "obso_large_bucket".
With this change, the unnecessary extra space extend of bcache on-disk
super block can be avoided, and csum_set() may generate expected check
sum as well.
Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch adds the check for features which is incompatible for
current supported feature sets.
Now if the bcache device created by bcache-tools has features that
current kernel doesn't support, read_super() will fail with error
messoage. E.g. if an unsupported incompatible feature detected,
bcache register will fail with dmesg "bcache: register_bcache() error :
Unsupported incompatible feature found".
Fixes: d721a43ff6 ("bcache: increase super block version for cache device and backing device")
Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch fixes the following typos,
from BCH_FEATURE_COMPAT_SUUP to BCH_FEATURE_COMPAT_SUPP
from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_INCOMPAT_SUPP
from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_RO_COMPAT_SUPP
Fixes: d721a43ff6 ("bcache: increase super block version for cache device and backing device")
Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no need to reassign pdev_set_uuid in the second loop iteration,
so move it to the place before second loop.
Signed-off-by: Yi Li <yili@winhong.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>