Changes in 4.14.276
USB: serial: pl2303: add IBM device IDs
USB: serial: simple: add Nokia phone driver
netdevice: add the case if dev is NULL
virtio_console: break out of buf poll on remove
ethernet: sun: Free the coherent when failing in probing
spi: Fix invalid sgs value
spi: Fix erroneous sgs value with min_t()
af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register
fuse: fix pipe buffer lifetime for direct_io
tpm: fix reference counting for struct tpm_chip
block: Add a helper to validate the block size
virtio-blk: Use blk_validate_block_size() to validate block size
USB: usb-storage: Fix use of bitfields for hardware data in ene_ub6250.c
coresight: Fix TRCCONFIGR.QE sysfs interface
iio: inkern: apply consumer scale on IIO_VAL_INT cases
iio: inkern: apply consumer scale when no channel scale is available
iio: inkern: make a best effort on offset calculation
clk: uniphier: Fix fixed-rate initialization
ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
Documentation: add link to stable release candidate tree
Documentation: update stable tree link
SUNRPC: avoid race between mod_timer() and del_timer_sync()
NFSD: prevent underflow in nfssvc_decode_writeargs()
pinctrl: samsung: drop pin banks references on error paths
can: ems_usb: ems_usb_start_xmit(): fix double dev_kfree_skb() in error path
jffs2: fix use-after-free in jffs2_clear_xattr_subsystem
jffs2: fix memory leak in jffs2_do_mount_fs
jffs2: fix memory leak in jffs2_scan_medium
mm/pages_alloc.c: don't create ZONE_MOVABLE beyond the end of a node
mempolicy: mbind_range() set_policy() after vma_merge()
scsi: libsas: Fix sas_ata_qc_issue() handling of NCQ NON DATA commands
qed: display VF trust config
qed: validate and restrict untrusted VFs vlan promisc mode
Revert "Input: clear BTN_RIGHT/MIDDLE on buttonpads"
ALSA: cs4236: fix an incorrect NULL check on list iterator
drbd: fix potential silent data corruption
ACPI: properties: Consistently return -ENOENT if there are no more references
drivers: hamradio: 6pack: fix UAF bug caused by mod_timer()
video: fbdev: sm712fb: Fix crash in smtcfb_read()
video: fbdev: atari: Atari 2 bpp (STe) palette bugfix
ARM: dts: at91: sama5d2: Fix PMERRLOC resource size
ARM: dts: exynos: fix UART3 pins configuration in Exynos5250
ARM: dts: exynos: add missing HDMI supplies on SMDK5250
ARM: dts: exynos: add missing HDMI supplies on SMDK5420
carl9170: fix missing bit-wise or operator for tx_params
thermal: int340x: Increase bitmap size
lib/raid6/test: fix multiple definition linking error
DEC: Limit PMAX memory probing to R3k systems
media: davinci: vpif: fix unbalanced runtime PM get
brcmfmac: firmware: Allocate space for default boardrev in nvram
brcmfmac: pcie: Replace brcmf_pcie_copy_mem_todev with memcpy_toio
PCI: pciehp: Clear cmd_busy bit in polling mode
crypto: authenc - Fix sleep in atomic context in decrypt_tail
crypto: mxs-dcp - Fix scatterlist processing
spi: tegra114: Add missing IRQ check in tegra_spi_probe
selftests/x86: Add validity check and allow field splitting
spi: pxa2xx-pci: Balance reference count for PCI DMA device
hwmon: (pmbus) Add mutex to regulator ops
hwmon: (sch56xx-common) Replace WDOG_ACTIVE with WDOG_HW_RUNNING
PM: hibernate: fix __setup handler error handling
PM: suspend: fix return value of __setup handler
hwrng: atmel - disable trng on failure path
crypto: vmx - add missing dependencies
ACPI: APEI: fix return value of __setup handlers
crypto: ccp - ccp_dmaengine_unregister release dma channels
hwmon: (pmbus) Add Vin unit off handling
clocksource: acpi_pm: fix return value of __setup handler
sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa
perf/core: Fix address filter parser for multiple filters
perf/x86/intel/pt: Fix address filter config for 32-bit kernel
media: coda: Fix missing put_device() call in coda_get_vdoa_data
video: fbdev: smscufx: Fix null-ptr-deref in ufx_usb_probe()
video: fbdev: fbcvt.c: fix printing in fb_cvt_print_name()
ARM: dts: qcom: ipq4019: fix sleep clock
soc: ti: wkup_m3_ipc: Fix IRQ check in wkup_m3_ipc_probe
media: usb: go7007: s2250-board: fix leak in probe()
ASoC: ti: davinci-i2s: Add check for clk_enable()
ALSA: spi: Add check for clk_enable()
arm64: dts: ns2: Fix spi-cpol and spi-cpha property
arm64: dts: broadcom: Fix sata nodename
printk: fix return value of printk.devkmsg __setup handler
ASoC: mxs-saif: Handle errors for clk_enable
ASoC: atmel_ssc_dai: Handle errors for clk_enable
memory: emif: Add check for setup_interrupts
memory: emif: check the pointer temp in get_device_details()
ALSA: firewire-lib: fix uninitialized flag for AV/C deferred transaction
media: stk1160: If start stream fails, return buffers with VB2_BUF_STATE_QUEUED
ASoC: atmel: Add missing of_node_put() in at91sam9g20ek_audio_probe
ASoC: wm8350: Handle error for wm8350_register_irq
ASoC: fsi: Add check for clk_enable
video: fbdev: omapfb: Add missing of_node_put() in dvic_probe_of
ASoC: dmaengine: do not use a NULL prepare_slave_config() callback
ASoC: mxs: Fix error handling in mxs_sgtl5000_probe
ASoC: imx-es8328: Fix error return code in imx_es8328_probe()
ASoC: msm8916-wcd-digital: Fix missing clk_disable_unprepare() in msm8916_wcd_digital_probe
mtd: onenand: Check for error irq
drm/edid: Don't clear formats if using deep color
ath9k_htc: fix uninit value bugs
power: reset: gemini-poweroff: Fix IRQ check in gemini_poweroff_probe
ray_cs: Check ioremap return value
power: supply: ab8500: Fix memory leak in ab8500_fg_sysfs_init
HID: i2c-hid: fix GET/SET_REPORT for unnumbered reports
iwlwifi: Fix -EIO error code that is never returned
dm crypt: fix get_key_size compiler warning if !CONFIG_KEYS
scsi: pm8001: Fix command initialization in pm80XX_send_read_log()
scsi: pm8001: Fix command initialization in pm8001_chip_ssp_tm_req()
scsi: pm8001: Fix payload initialization in pm80xx_set_thermal_config()
scsi: pm8001: Fix abort all task initialization
TOMOYO: fix __setup handlers return values
ext2: correct max file size computing
drm/tegra: Fix reference leak in tegra_dsi_ganged_probe
power: supply: bq24190_charger: Fix bq24190_vbus_is_enabled() wrong false return
KVM: x86: Fix emulation in writing cr8
KVM: x86/emulator: Defer not-present segment check in __load_segment_descriptor()
i2c: xiic: Make bus names unique
power: supply: wm8350-power: Handle error for wm8350_register_irq
power: supply: wm8350-power: Add missing free in free_charger_irq
PCI: Reduce warnings on possible RW1C corruption
powerpc/sysdev: fix incorrect use to determine if list is empty
mfd: mc13xxx: Add check for mc13xxx_irq_request
vxcan: enable local echo for sent CAN frames
MIPS: RB532: fix return value of __setup handler
mtd: rawnand: atmel: fix refcount issue in atmel_nand_controller_init
USB: storage: ums-realtek: fix error code in rts51x_read_mem()
af_netlink: Fix shift out of bounds in group mask calculation
i2c: mux: demux-pinctrl: do not deactivate a master that is not active
tcp: ensure PMTU updates are processed during fastopen
mfd: asic3: Add missing iounmap() on error asic3_mfd_probe
mxser: fix xmit_buf leak in activate when LSR == 0xff
pwm: lpc18xx-sct: Initialize driver data and hardware before pwmchip_add()
staging:iio:adc:ad7280a: Fix handing of device address bit reversing.
serial: 8250_mid: Balance reference count for PCI DMA device
serial: 8250: Fix race condition in RTS-after-send handling
iio: adc: Add check for devm_request_threaded_irq
clk: qcom: clk-rcg2: Update the frac table for pixel clock
remoteproc: qcom_wcnss: Add missing of_node_put() in wcnss_alloc_memory_region
clk: loongson1: Terminate clk_div_table with sentinel element
clk: clps711x: Terminate clk_div_table with sentinel element
clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver
NFS: remove unneeded check in decode_devicenotify_args()
pinctrl: mediatek: Fix missing of_node_put() in mtk_pctrl_init
pinctrl: nomadik: Add missing of_node_put() in nmk_pinctrl_probe
pinctrl/rockchip: Add missing of_node_put() in rockchip_pinctrl_probe
tty: hvc: fix return value of __setup handler
kgdboc: fix return value of __setup handler
kgdbts: fix return value of __setup handler
jfs: fix divide error in dbNextAG
netfilter: nf_conntrack_tcp: preserve liberal flag in tcp options
xen: fix is_xen_pmu()
net: phy: broadcom: Fix brcm_fet_config_init()
qlcnic: dcb: default to returning -EOPNOTSUPP
net/x25: Fix null-ptr-deref caused by x25_disconnect
NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
lib/test: use after free in register_test_dev_kmod()
selinux: use correct type for context length
loop: use sysfs_emit() in the sysfs xxx show()
Fix incorrect type in assignment of ipv6 port for audit
irqchip/nvic: Release nvic_base upon failure
ACPICA: Avoid walking the ACPI Namespace if it is not there
ACPI/APEI: Limit printable size of BERT table data
PM: core: keep irq flags in device_pm_check_callbacks()
spi: tegra20: Use of_device_get_match_data()
ext4: don't BUG if someone dirty pages without asking ext4 first
ntfs: add sanity check on allocation size
video: fbdev: nvidiafb: Use strscpy() to prevent buffer overflow
video: fbdev: w100fb: Reset global state
video: fbdev: cirrusfb: check pixclock to avoid divide by zero
video: fbdev: omapfb: acx565akm: replace snprintf with sysfs_emit
ARM: dts: qcom: fix gic_irq_domain_translate warnings for msm8960
ARM: dts: bcm2837: Add the missing L1/L2 cache information
video: fbdev: omapfb: panel-dsi-cm: Use sysfs_emit() instead of snprintf()
video: fbdev: omapfb: panel-tpo-td043mtea1: Use sysfs_emit() instead of snprintf()
ASoC: soc-core: skip zero num_dai component in searching dai name
media: cx88-mpeg: clear interrupt status register before streaming video
ARM: tegra: tamonten: Fix I2C3 pad setting
ARM: mmp: Fix failure to remove sram device
video: fbdev: sm712fb: Fix crash in smtcfb_write()
media: hdpvr: initialize dev->worker at hdpvr_register_videodev
mmc: host: Return an error when ->enable_sdio_irq() ops is missing
powerpc/lib/sstep: Fix 'sthcx' instruction
powerpc/lib/sstep: Fix build errors with newer binutils
scsi: qla2xxx: Fix warning for missing error code
scsi: qla2xxx: Suppress a kernel complaint in qla_create_qpair()
KVM: Prevent module exit until all VMs are freed
ubifs: rename_whiteout: Fix double free for whiteout_ui->data
ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
ubifs: rename_whiteout: correct old_dir size computing
can: mcba_usb: mcba_usb_start_xmit(): fix double dev_kfree_skb in error path
can: mcba_usb: properly check endpoint type
gfs2: Make sure FITRIM minlen is rounded up to fs block size
pinctrl: pinconf-generic: Print arguments for bias-pull-*
ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl
ACPI: CPPC: Avoid out of bounds access when parsing _CPC data
mm/mmap: return 1 from stack_guard_gap __setup() handler
mm/memcontrol: return 1 from cgroup.memory __setup() handler
ubi: fastmap: Return error code if memory allocation fails in add_aeb()
ASoC: topology: Allow TLV control to be either read or write
ARM: dts: spear1340: Update serial node properties
ARM: dts: spear13xx: Update SPI dma properties
openvswitch: Fixed nd target mask field in the flow dump.
KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated
ubifs: Rectify space amount budget for mkdir/tmpfile operations
rtc: wm8350: Handle error for wm8350_register_irq
ARM: 9187/1: JIVE: fix return value of __setup handler
KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
ath5k: fix OOB in ath5k_eeprom_read_pcal_info_5111
ptp: replace snprintf with sysfs_emit
powerpc: dts: t104xrdb: fix phy type for FMAN 4/5
scsi: mvsas: Replace snprintf() with sysfs_emit()
scsi: bfa: Replace snprintf() with sysfs_emit()
power: supply: axp20x_battery: properly report current when discharging
powerpc: Set crashkernel offset to mid of RMA region
PCI: aardvark: Fix support for MSI interrupts
iommu/arm-smmu-v3: fix event handling soft lockup
dm ioctl: prevent potential spectre v1 gadget
scsi: pm8001: Fix pm8001_mpi_task_abort_resp()
scsi: aha152x: Fix aha152x_setup() __setup handler return value
net/smc: correct settings of RMB window update limit
macvtap: advertise link netns via netlink
bnxt_en: Eliminate unintended link toggle during FW reset
MIPS: fix fortify panic when copying asm exception handlers
scsi: libfc: Fix use after free in fc_exch_abts_resp()
usb: dwc3: omap: fix "unbalanced disables for smps10_out1" on omap5evm
xtensa: fix DTC warning unit_address_format
Bluetooth: Fix use after free in hci_send_acl
init/main.c: return 1 from handled __setup() functions
w1: w1_therm: fixes w1_seq for ds28ea00 sensors
SUNRPC/call_alloc: async tasks mustn't block waiting for memory
NFS: swap IO handling is slightly different for O_DIRECT IO
NFS: swap-out must always use STABLE writes.
serial: samsung_tty: do not unlock port->lock for uart_write_wakeup()
virtio_console: eliminate anonymous module_init & module_exit
jfs: prevent NULL deref in diFree
parisc: Fix CPU affinity for Lasi, WAX and Dino chips
ipv6: add missing tx timestamping on IPPROTO_RAW
net: add missing SOF_TIMESTAMPING_OPT_ID support
mm: fix race between MADV_FREE reclaim and blkdev direct IO read
drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire()
scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one()
net: stmmac: Fix unset max_speed difference between DT and non-DT platforms
drm/imx: Fix memory leak in imx_pd_connector_get_modes
drbd: Fix five use after free bugs in get_initial_state
Revert "mmc: sdhci-xenon: fix annoying 1.8V regulator warning"
mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0)
mm/mempolicy: fix mpol_new leak in shared_policy_replace
x86/pm: Save the MSR validity status at context setup
x86/speculation: Restore speculation related MSRs during S3 resume
btrfs: fix qgroup reserve overflow the qgroup limit
arm64: patch_text: Fixup last cpu should be master
perf: qcom_l2_pmu: fix an incorrect NULL check on list iterator
tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts
dmaengine: Revert "dmaengine: shdma: Fix runtime PM imbalance on error"
mm: don't skip swap entry even if zap_details specified
arm64: module: remove (NOLOAD) from linker script
mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning
cgroup: Use open-time credentials for process migraton perm checks
cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv
cgroup: Use open-time cgroup namespace for process migration perm checks
xfrm: policy: match with both mark and mask on user interfaces
memory: atmel-ebi: Fix missing of_node_put in atmel_ebi_probe
veth: Ensure eth header is in skb's linear part
gpiolib: acpi: use correct format characters
mlxsw: i2c: Fix initialization error flow
net: ethernet: stmmac: fix altr_tse_pcs function when using a fixed-link
nfc: nci: add flush_workqueue to prevent uaf
cifs: potential buffer overflow in handling symlinks
drm/amd: Add USBC connector ID
drm/amdkfd: Check for potential null return of kmalloc_array()
Drivers: hv: vmbus: Prevent load re-ordering when reading ring buffer
scsi: target: tcmu: Fix possible page UAF
scsi: ibmvscsis: Increase INITIAL_SRP_LIMIT to 1024
net: micrel: fix KS8851_MLL Kconfig
ata: libata-core: Disable READ LOG DMA EXT for Samsung 840 EVOs
gpu: ipu-v3: Fix dev_dbg frequency output
scsi: mvsas: Add PCI ID of RocketRaid 2640
drivers: net: slip: fix NPD bug in sl_tx_timeout()
mm, page_alloc: fix build_zonerefs_node()
mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
gcc-plugins: latent_entropy: use /dev/urandom
ALSA: pcm: Test for "silence" field in struct "pcm_format_data"
ARM: davinci: da850-evm: Avoid NULL pointer dereference
smp: Fix offline cpu check in flush_smp_call_function_queue()
i2c: pasemi: Wait for write xfers to finish
Linux 4.14.276
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I45d8292ce654c0236758030a89b4618cf3a3d87b
Sync the UFS driver to the latest version
1. Support Turbo Write & HPB driver.
2. Fix clock gating worker dead lock issue.
3. Fix hang when send unit ready in suspend.
4. Use default timeout 30 seconds in scsi layer.
MTK-Commit-Id: bba00ec06261e774b5956245d3e36dc74cc65132
Change-Id: I30eaeffdd58b710aef028a2841ed90857ba47649
Signed-off-by: Andy Teng <andy.teng@mediatek.com>
CR-Id: ALPS04948683
Feature: UFS Booting
1.Add cqhci support for MT6768
2.Add set partition as read only according to wp status
MTK-Commit-Id: bae24bb074be5fc853a70136d380b664be8f5874
Change-Id: I55758439eb9a8b3758bd6d6a1451051eb45c43f5
Signed-off-by: Huan Tang <huan.tang@mediatek.com>
CR-Id: ALPS04308321
Feature: eMMC Boot Up
Now HIE is enabled for a request if "any" storage device driver
registered its HIE device regardless of checking if requests
backing block device has inline encryption capability specficially.
Therefore, we introduce "inline crypt" attribute on request queue
and connect it with HIE and storage device driver, like UFS and MMC.
HIE will be enabled for those requests which block request queue
specifically supports inline encryption.
In addition, a new sysfs node "inline_crypt" on request queue is
provided for examination.
MTK-Commit-Id: 63b635cd9bf533c16534b6ac74e22e13d7551678
Change-Id: I73f1ece1afffbc75d8769d873080c14aa65cc9bb
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
CR-Id: ALPS04249345
Feature: Multi-Storage
Changes in 4.14.197
HID: core: Correctly handle ReportSize being zero
HID: core: Sanitize event code and type when mapping input
perf record/stat: Explicitly call out event modifiers in the documentation
drm/msm: add shutdown support for display platform_driver
hwmon: (applesmc) check status earlier.
nvmet: Disable keep-alive timer when kato is cleared to 0h
ceph: don't allow setlease on cephfs
cpuidle: Fixup IRQ state
s390: don't trace preemption in percpu macros
xen/xenbus: Fix granting of vmalloc'd memory
dmaengine: of-dma: Fix of_dma_router_xlate's of_dma_xlate handling
batman-adv: Avoid uninitialized chaddr when handling DHCP
batman-adv: Fix own OGM check in aggregated OGMs
batman-adv: bla: use netif_rx_ni when not in interrupt context
dmaengine: at_hdmac: check return value of of_find_device_by_node() in at_dma_xlate()
MIPS: mm: BMIPS5000 has inclusive physical caches
MIPS: BMIPS: Also call bmips_cpu_setup() for secondary cores
netfilter: nf_tables: add NFTA_SET_USERDATA if not null
netfilter: nf_tables: incorrect enum nft_list_attributes definition
netfilter: nf_tables: fix destination register zeroing
net: hns: Fix memleak in hns_nic_dev_probe
net: systemport: Fix memleak in bcm_sysport_probe
ravb: Fixed to be able to unload modules
net: arc_emac: Fix memleak in arc_mdio_probe
dmaengine: pl330: Fix burst length if burst size is smaller than bus width
gtp: add GTPA_LINK info to msg sent to userspace
bnxt_en: Check for zero dir entries in NVRAM.
bnxt_en: Fix PCI AER error recovery flow
nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'
perf tools: Correct SNOOPX field offset
net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
fix regression in "epoll: Keep a reference on files added to the check list"
tg3: Fix soft lockup when tg3_reset_task() fails.
iommu/vt-d: Serialize IOMMU GCMD register modifications
thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430
include/linux/log2.h: add missing () around n in roundup_pow_of_two()
btrfs: drop path before adding new uuid tree entry
btrfs: Remove redundant extent_buffer_get in get_old_root
btrfs: Remove extraneous extent_buffer_get from tree_mod_log_rewind
btrfs: set the lockdep class for log tree extent buffers
uaccess: Add non-pagefault user-space read functions
uaccess: Add non-pagefault user-space write function
btrfs: fix potential deadlock in the search ioctl
net: usb: qmi_wwan: add Telit 0x1050 composition
usb: qmi_wwan: add D-Link DWM-222 A2 device ID
ALSA: ca0106: fix error code handling
ALSA: pcm: oss: Remove superfluous WARN_ON() for mulaw sanity check
ALSA: hda/hdmi: always check pin power status in i915 pin fixup
ALSA: firewire-digi00x: exclude Avid Adrenaline from detection
affs: fix basic permission bits to actually work
block: allow for_each_bvec to support zero len bvec
block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h>
libata: implement ATA_HORKAGE_MAX_TRIM_128M and apply to Sandisks
dm cache metadata: Avoid returning cmd->bm wild pointer on error
dm thin metadata: Avoid returning cmd->bm wild pointer on error
mm: slub: fix conversion of freelist_corrupted()
KVM: arm64: Add kvm_extable for vaxorcism code
KVM: arm64: Defer guest entry when an asynchronous exception is pending
KVM: arm64: Survive synchronous exceptions caused by AT instructions
KVM: arm64: Set HCR_EL2.PTW to prevent AT taking synchronous exception
checkpatch: fix the usage of capture group ( ... )
mm/hugetlb: fix a race between hugetlb sysctl handlers
cfg80211: regulatory: reject invalid hints
net: usb: Fix uninit-was-stored issue in asix_read_phy_addr()
Linux 4.14.197
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I59da148f114d7f8fd84fc76c8178081ddaf26a49
commit 233bde21aa43516baa013ef7ac33f3427056db3e upstream.
It happens often while I'm preparing a patch for a block driver that
I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
available for this driver? Do I have to introduce definitions of these
constants before I can use these constants? To avoid this confusion,
move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
<linux/blkdev.h> header file such that these become available for all
block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
header file conditional to avoid that including that header file after
<linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE
redefinition.
Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
not been removed from uapi header files nor from NAND drivers in
which these constants are used for another purpose than converting
block layer offsets and sizes into a number of sectors.
Cc: David S. Miller <davem@davemloft.net>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Changes in 4.14.181
USB: serial: qcserial: Add DW5816e support
dp83640: reverse arguments to list_add_tail
fq_codel: fix TCA_FQ_CODEL_DROP_BATCH_SIZE sanity checks
net: macsec: preserve ingress frame ordering
net/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc()
net: usb: qmi_wwan: add support for DW5816e
sch_choke: avoid potential panic in choke_reset()
sch_sfq: validate silly quantum values
bnxt_en: Fix VLAN acceleration handling in bnxt_fix_features().
net/mlx5: Fix forced completion access non initialized command entry
net/mlx5: Fix command entry leak in Internal Error State
bnxt_en: Improve AER slot reset.
bnxt_en: Fix VF anti-spoof filter setup.
net: stricter validation of untrusted gso packets
ipv6: fix cleanup ordering for ip6_mr failure
HID: wacom: Read HID_DG_CONTACTMAX directly for non-generic devices
geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6
HID: usbhid: Fix race between usbhid_close() and usbhid_stop()
USB: uas: add quirk for LaCie 2Big Quadra
USB: serial: garmin_gps: add sanity checking for data length
tracing: Add a vmalloc_sync_mappings() for safe measure
KVM: arm: vgic: Fix limit condition when writing to GICD_I[CS]ACTIVER
mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
coredump: fix crash when umh is disabled
batman-adv: fix batadv_nc_random_weight_tq
batman-adv: Fix refcnt leak in batadv_show_throughput_override
batman-adv: Fix refcnt leak in batadv_store_throughput_override
batman-adv: Fix refcnt leak in batadv_v_ogm_process
x86/entry/64: Fix unwind hints in kernel exit path
x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
x86/unwind/orc: Don't skip the first frame for inactive tasks
x86/unwind/orc: Prevent unwinding before ORC initialization
x86/unwind/orc: Fix error path for bad ORC entry type
netfilter: nat: never update the UDP checksum when it's 0
objtool: Fix stack offset tracking for indirect CFAs
scripts/decodecode: fix trapping instruction formatting
net: ipv6: add net argument to ip6_dst_lookup_flow
net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup
blktrace: fix unlocked access to init/start-stop/teardown
blktrace: fix trace mutex deadlock
blktrace: Protect q->blk_trace with RCU
blktrace: fix dereference after null check
f2fs: introduce read_inline_xattr
f2fs: introduce read_xattr_block
f2fs: sanity check of xattr entry size
f2fs: fix to avoid accessing xattr across the boundary
f2fs: fix to avoid memory leakage in f2fs_listxattr
net: stmmac: Use mutex instead of spinlock
shmem: fix possible deadlocks on shmlock_user_lock
net/sonic: Fix a resource leak in an error handling path in 'jazz_sonic_probe()'
net: moxa: Fix a potential double 'free_irq()'
drop_monitor: work around gcc-10 stringop-overflow warning
virtio-blk: handle block_device_operations callbacks after hot unplug
scsi: sg: add sg_remove_request in sg_write
dmaengine: pch_dma.c: Avoid data race between probe and irq handler
dmaengine: mmp_tdma: Reset channel error on release
cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once
ALSA: hda/hdmi: fix race in monitor detection during probe
drm/qxl: lost qxl_bo_kunmap_atomic_page in qxl_image_init_helper()
ipc/util.c: sysvipc_find_ipc() incorrectly updates position index
ALSA: hda/realtek - Fix S3 pop noise on Dell Wyse
x86/entry/64: Fix unwind hints in register clearing code
ipmi: Fix NULL pointer dereference in ssif_probe
pinctrl: baytrail: Enable pin configuration setting for GPIO chip
pinctrl: cherryview: Add missing spinlock usage in chv_gpio_irq_handler
i40iw: Fix error handling in i40iw_manage_arp_cache()
netfilter: conntrack: avoid gcc-10 zero-length-bounds warning
IB/mlx4: Test return value of calls to ib_get_cached_pkey
hwmon: (da9052) Synchronize access with mfd
pnp: Use list_for_each_entry() instead of open coding
gcc-10 warnings: fix low-hanging fruit
kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
Stop the ad-hoc games with -Wno-maybe-initialized
gcc-10: disable 'zero-length-bounds' warning for now
gcc-10: disable 'array-bounds' warning for now
gcc-10: disable 'stringop-overflow' warning for now
gcc-10: disable 'restrict' warning for now
gcc-10: avoid shadowing standard library 'free()' in crypto
x86/asm: Add instruction suffixes to bitops
net: phy: micrel: Use strlcpy() for ethtool::get_strings
net: fix a potential recursive NETDEV_FEAT_CHANGE
netlabel: cope with NULL catmap
net: phy: fix aneg restart in phy_ethtool_set_eee
Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu"
hinic: fix a bug of ndo_stop
net: dsa: loop: Add module soft dependency
net: ipv4: really enforce backoff for redirects
netprio_cgroup: Fix unlimited memory leak of v2 cgroups
net: tcp: fix rx timestamp behavior for tcp_recvmsg
ALSA: hda/realtek - Limit int mic boost for Thinkpad T530
ALSA: rawmidi: Initialize allocated buffers
ALSA: rawmidi: Fix racy buffer resize under concurrent accesses
ARM: dts: dra7: Fix bus_dma_limit for PCIe
ARM: dts: imx27-phytec-phycard-s-rdk: Fix the I2C1 pinctrl entries
x86: Fix early boot crash on gcc-10, third try
ALSA: usb-audio: Add control message quirk delay for Kingston HyperX headset
usb: core: hub: limit HUB_QUIRK_DISABLE_AUTOSUSPEND to USB5534B
usb: host: xhci-plat: keep runtime active when removing host
USB: gadget: fix illegal array access in binding with UDC
usb: xhci: Fix NULL pointer dereference when enqueuing trbs from urb sg list
x86/unwind/orc: Fix error handling in __unwind_start()
exec: Move would_dump into flush_old_exec
clk: rockchip: fix incorrect configuration of rk3228 aclk_gpu* clocks
usb: gadget: net2272: Fix a memory leak in an error handling path in 'net2272_plat_probe()'
usb: gadget: audio: Fix a missing error return value in audio_bind()
usb: gadget: legacy: fix error return code in gncm_bind()
usb: gadget: legacy: fix error return code in cdc_bind()
Revert "ALSA: hda/realtek: Fix pop noise on ALC225"
arm64: dts: rockchip: Replace RK805 PMIC node name with "pmic" on rk3328 boards
arm64: dts: rockchip: Rename dwc3 device nodes on rk3399 to make dtc happy
ARM: dts: r8a73a4: Add missing CMT1 interrupts
ARM: dts: r8a7740: Add missing extal2 to CPG node
KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce
Makefile: disallow data races on gcc-10 as well
Linux 4.14.181
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie1fb614d727dc6aad472bea0234073076eae8c8b
commit c780e86dd48ef6467a1146cf7d0fe1e05a635039 upstream.
KASAN is reporting that __blk_add_trace() has a use-after-free issue
when accessing q->blk_trace. Indeed the switching of block tracing (and
thus eventual freeing of q->blk_trace) is completely unsynchronized with
the currently running tracing and thus it can happen that the blk_trace
structure is being freed just while __blk_add_trace() works on it.
Protect accesses to q->blk_trace by RCU during tracing and make sure we
wait for the end of RCU grace period when shutting down tracing. Luckily
that is rare enough event that we can afford that. Note that postponing
the freeing of blk_trace to an RCU callback should better be avoided as
it could have unexpected user visible side-effects as debugfs files
would be still existing for a short while block tracing has been shut
down.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711
CC: stable@vger.kernel.org
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: Tristan Madani <tristmd@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[bwh: Backported to 4.14: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Inline Encryption hardware allows software to specify an encryption context
(an encryption key, crypto algorithm, data unit num, data unit size, etc.)
along with a data transfer request to a storage device, and the inline
encryption hardware will use that context to en/decrypt the data. The
inline encryption hardware is part of the storage device, and it
conceptually sits on the data path between system memory and the storage
device.
Inline Encryption hardware implementations often function around the
concept of "keyslots". These implementations often have a limited number
of "keyslots", each of which can hold an encryption context (we say that
an encryption context can be "programmed" into a keyslot). Requests made
to the storage device may have a keyslot associated with them, and the
inline encryption hardware will en/decrypt the data in the requests using
the encryption context programmed into that associated keyslot. As
keyslots are limited, and programming keys may be expensive in many
implementations, and multiple requests may use exactly the same encryption
contexts, we introduce a Keyslot Manager to efficiently manage keyslots.
The keyslot manager also functions as the interface that upper layers will
use to program keys into inline encryption hardware. For more information
on the Keyslot Manager, refer to documentation found in
block/keyslot-manager.c and linux/keyslot-manager.h.
Bug: 137270441
Test: tested as series; see Ie1b77f7615d6a7a60fdc9105c7ab2200d17636a8
Change-Id: Iea1ee5a7eec46cb50d33cf1e2d20dfb7335af4ed
Signed-off-by: Satya Tangirala <satyat@google.com>
Link: https://lore.kernel.org/linux-fscrypt/20191028072032.6911-2-satyat@google.com/
commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15 upstream.
Logical block size has type unsigned short. That means that it can be at
most 32768. However, there are architectures that can run with 64k pages
(for example arm64) and on these architectures, it may be possible to
create block devices with 64k block size.
For exmaple (run this on an architecture with 64k pages):
Mount will fail with this error because it tries to read the superblock using 2-sector
access:
device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
EXT4-fs (dm-0): unable to read superblock
This patch changes the logical block size from unsigned short to unsigned
int to avoid the overflow.
Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 14cb0dc6479dc5ebc63b3a459a5d89a2f1b39fed upstream.
Commit a8821f3f3("block: Improvements to bounce-buffer handling") tries
to make sure that the bio to .make_request_fn won't exceed BIO_MAX_PAGES,
but ignores that passthrough I/O can use blk_queue_bounce() too.
Especially, passthrough IO may not be sector-aligned, and the check
of 'sectors < bio_sectors(*bio_orig)' inside __blk_queue_bounce() may
become true even though the max bvec number doesn't exceed BIO_MAX_PAGES,
then cause the bio splitted, and the original passthrough bio is submited
to generic_make_request().
This patch fixes this issue by checking if the bio is passthrough IO,
and use bio_kmalloc() to allocate the cloned passthrough bio.
Cc: NeilBrown <neilb@suse.com>
Fixes: a8821f3f3("block: Improvements to bounce-buffer handling")
Tested-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0abc2a10389f0c9070f76ca906c7382788036b93 upstream.
Commit caa4b02476e3(blk-map: call blk_queue_bounce from blk_rq_append_bio)
moves blk_queue_bounce() into blk_rq_append_bio(), but don't consider
the fact that the bounced bio becomes invisible to caller since the
parameter type is 'struct bio *'. Make it a pointer to a pointer to
a bio, so the caller sees the right bio also after a bounce.
Fixes: caa4b02476 ("blk-map: call blk_queue_bounce from blk_rq_append_bio")
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Michele Ballabio <barra_cuda@katamail.com>
(handling failure of blk_rq_append_bio(), only call bio_get() after
blk_rq_append_bio() returns OK)
Tested-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4ccafe032005e9b96acbef2e389a4de5b1254add upstream.
A previous change blindly added massive alignment to the
call_single_data structure in struct request. This ballooned it in size
from 296 to 320 bytes on my setup, for no valid reason at all.
Use the unaligned struct __call_single_data variant instead.
Fixes: 966a967116 ("smp: Avoid using two cache lines for struct call_single_data")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The lockdep code had reported the following unsafe locking scenario:
CPU0 CPU1
---- ----
lock(s_active#228);
lock(&bdev->bd_mutex/1);
lock(s_active#228);
lock(&bdev->bd_mutex);
*** DEADLOCK ***
The deadlock may happen when one task (CPU1) is trying to delete a
partition in a block device and another task (CPU0) is accessing
tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that
partition.
The s_active isn't an actual lock. It is a reference count (kn->count)
on the sysfs (kernfs) file. Removal of a sysfs file, however, require
a wait until all the references are gone. The reference count is
treated like a rwsem using lockdep instrumentation code.
The fact that a thread is in the sysfs callback method or in the
ioctl call means there is a reference to the opended sysfs or device
file. That should prevent the underlying block structure from being
removed.
Instead of using bd_mutex in the block_device structure, a new
blk_trace_mutex is now added to the request_queue structure to protect
access to the blk_trace structure.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fix typo in patch subject line, and prune a comment detailing how
the code used to work.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:
- Fix for a registration race in loop, from Anton Volkov.
- Overflow complaint fix from Arnd for DAC960.
- Series of drbd changes from the usual suspects.
- Conversion of the stec/skd driver to blk-mq. From Bart.
- A few BFQ improvements/fixes from Paolo.
- CFQ improvement from Ritesh, allowing idling for group idle.
- A few fixes found by Dan's smatch, courtesy of Dan.
- A warning fixup for a race between changing the IO scheduler and
device remova. From David Jeffery.
- A few nbd fixes from Josef.
- Support for cgroup info in blktrace, from Shaohua.
- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.
- Various corner cases and error handling fixes from Weiping Zhang.
- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.
- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.
- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"
* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
...
struct call_single_data is used in IPIs to transfer information between
CPUs. Its size is bigger than sizeof(unsigned long) and less than
cache line size. Currently it is not allocated with any explicit alignment
requirements. This makes it possible for allocated call_single_data to
cross two cache lines, which results in double the number of the cache lines
that need to be transferred among CPUs.
This can be fixed by requiring call_single_data to be aligned with the
size of call_single_data. Currently the size of call_single_data is the
power of 2. If we add new fields to call_single_data, we may need to
add padding to make sure the size of new definition is the power of 2
as well.
Fortunately, this is enforced by GCC, which will report bad sizes.
To set alignment requirements of call_single_data to the size of
call_single_data, a struct definition and a typedef is used.
To test the effect of the patch, I used the vm-scalability multiple
thread swap test case (swap-w-seq-mt). The test will create multiple
threads and each thread will eat memory until all RAM and part of swap
is used, so that huge number of IPIs are triggered when unmapping
memory. In the test, the throughput of memory writing improves ~5%
compared with misaligned call_single_data, because of faster IPIs.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Huang, Ying <ying.huang@intel.com>
[ Add call_single_data_t and align with size of call_single_data. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since we split the scsi_request out of struct request bsg fails to
provide a reply-buffer for the drivers. This was done via the pointer
for sense-data, that is not preallocated anymore.
Failing to allocate/assign it results in illegal dereferences because
LLDs use this pointer unquestioned.
An example panic on s390x, using the zFCP driver, looks like this (I had
debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
s390x):
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
Fault in home space mode while using kernel ASCE.
AS:0000000001590007 R3:0000000000000024
Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: <Long List>
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 0000000065cb0100 task.stack: 0000000065cb4000
Krnl PSW : 0704e00180000000 000003ff801e4156 (zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000001 000000005fa9d0d0 000000005fa9d078 0000000000e16866
000003ff00000290 6b6b6b6b6b6b6b6b 0000000059f78f00 000000000000000f
00000000593a0958 00000000593a0958 0000000060d88800 000000005ddd4c38
0000000058b50100 07000000659cba08 000003ff801e8556 00000000659cb9a8
Krnl Code: 000003ff801e4146: e31020500004 lg %r1,80(%r2)
000003ff801e414c: 58402040 l %r4,64(%r2)
#000003ff801e4150: e35020200004 lg %r5,32(%r2)
>000003ff801e4156: 50405004 st %r4,4(%r5)
000003ff801e415a: e54c50080000 mvhi 8(%r5),0
000003ff801e4160: e33010280012 lt %r3,40(%r1)
000003ff801e4166: a718fffb lhi %r1,-5
000003ff801e416a: 1803 lr %r0,%r3
Call Trace:
([<000003ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
[<000003ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
[<000003ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
[<00000000009b91b6>] qdio_kick_handler+0x2ae/0x2c8
[<00000000009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
[<00000000001684c2>] tasklet_action+0x15a/0x1d8
[<0000000000bd28ec>] __do_softirq+0x3ec/0x848
[<00000000001675a4>] irq_exit+0x74/0xf8
[<000000000010dd6a>] do_IRQ+0xba/0xf0
[<0000000000bd19e8>] io_int_handler+0x104/0x2d4
[<00000000001033b6>] enabled_wait+0xb6/0x188
([<000000000010339e>] enabled_wait+0x9e/0x188)
[<000000000010396a>] arch_cpu_idle+0x32/0x50
[<0000000000bd0112>] default_idle_call+0x52/0x68
[<00000000001cd0fa>] do_idle+0x102/0x188
[<00000000001cd41e>] cpu_startup_entry+0x3e/0x48
[<0000000000118c64>] smp_start_secondary+0x11c/0x130
[<0000000000bd2016>] restart_int_handler+0x62/0x78
[<0000000000000000>] (null)
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<000003ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
This patch moves bsg-lib to allocate and setup struct bsg_job ahead of
time, including the allocation of a buffer for the reply-data.
This means, struct bsg_job is not allocated separately anymore, but as part
of struct request allocation - similar to struct scsi_cmd. Reflect this in
the function names that used to handle creation/destruction of struct
bsg_job.
Reported-by: Steffen Maier <maier@linux.vnet.ibm.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Fixes: 82ed4db499 ("block: split scsi_request out of struct request")
Cc: <stable@vger.kernel.org> #4.11+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We haven't used these in years, but somehow the definitions still
remained. Kill them, and renumber the QUEUE_FLAG_ space. We had
a hole in the beginning of the space, too.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Only used inside the bounce code, and opencoding it makes it more obvious
what is going on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Useful to verify that things are working the way they should.
Reading the file will return number of kb written with each
write hint. Writing the file will reset the statistics. No care
is taken to ensure that we don't race on updates.
Drivers will write to q->write_hints[] if they handle a given
write hint.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
No functional changes in this patch, we just use up some holes
in the bio and request structures to define a write hint that
we psas down the stack.
Ensure that we don't merge requests that have different life time
hints assigned to them, and that we inherit the write hint when
cloning a bio.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull in the fix for shared tags, as it conflicts with the pending
changes in for-4.13/block. We already pulled in v4.12-rc5 to solve
other conflicts or get fixes that went into 4.12, so not a lot
of changes in this merge.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If we have shared tags enabled, then every IO completion will trigger
a full loop of every queue belonging to a tag set, and every hardware
queue for each of those queues, even if nothing needs to be done.
This causes a massive performance regression if you have a lot of
shared devices.
Instead of doing this huge full scan on every IO, add an atomic
counter to the main queue that tracks how many hardware queues have
been marked as needing a restart. With that, we can avoid looking for
restartable queues, if we don't have to.
Max reports that this restores performance. Before this patch, 4K
IOPS was limited to 22-23K IOPS. With the patch, we are running at
950-970K IOPS.
Fixes: 6d8c6c0f97 ("blk-mq: Restart a single queue if tag sets are shared")
Reported-by: Max Gurtovoy <maxg@mellanox.com>
Tested-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Several block drivers need to initialize the driver-private request
data after having called blk_get_request() and before .prep_rq_fn()
is called, e.g. when submitting a REQ_OP_SCSI_* request. Avoid that
that initialization code has to be repeated after every
blk_get_request() call by adding new callback functions to struct
request_queue and to struct blk_mq_ops.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of declaring the second argument of blk_*_get_request()
as int and passing it to functions that expect an unsigned int,
declare that second argument as unsigned int. Also because of
consistency, rename that second argument from 'rw' into 'op'.
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
While the Write Same page currently always is in low-level it is just
as easy and safer to just compare the page and offset directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It is required that no dispatch can happen any more once
blk_mq_quiesce_queue() returns, and we don't have such requirement
on APIs of stopping queue.
But blk_mq_quiesce_queue() still may not block/drain dispatch in the
the case of BLK_MQ_S_START_ON_RUN, so use the new introduced flag of
QUEUE_FLAG_QUIESCED and evaluate it inside RCU read-side critical
sections for fixing this issue.
Also blk_mq_quiesce_queue() is implemented via stopping queue, which
limits its uses, and easy to cause race, because any queue restart in
other paths may break blk_mq_quiesce_queue(). With the introduced
flag of QUEUE_FLAG_QUIESCED, we don't need to depend on stopping queue
for quiescing any more.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_queue_split() is always called with the last arg being q->bio_split,
where 'q' is the first arg.
Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses
q->bio_split.
This is inconsistent and unnecessary. Remove the last arg and always use
q->bio_split inside blk_queue_split()
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Credit-to: Javier González <jg@lightnvm.io> (Noticed that lightnvm was missed)
Reviewed-by: Javier González <javier@cnexlabs.com>
Tested-by: Javier González <javier@cnexlabs.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Avoid that the following complaint is reported:
BUG: sleeping function called from invalid context at kernel/workqueue.c:2790
in_atomic(): 1, irqs_disabled(): 0, pid: 41, name: rcuop/3
1 lock held by rcuop/3/41:
#0: (rcu_callback){......}, at: [<ffffffff8111f9a2>] rcu_nocb_kthread+0x282/0x500
Call Trace:
dump_stack+0x86/0xcf
___might_sleep+0x174/0x260
__might_sleep+0x4a/0x80
flush_work+0x7e/0x2e0
__cancel_work_timer+0x143/0x1c0
cancel_work_sync+0x10/0x20
blk_throtl_exit+0x25/0x60
blkcg_exit_queue+0x35/0x40
blk_release_queue+0x42/0x130
kobject_put+0xa9/0x190
This happens since we invoke callbacks that need to block from the
queue release handler. Fix this by pushing the final release to
a workqueue.
Reported-by: Ross Zwisler <zwisler@gmail.com>
Fixes: commit b425e50492 ("block: Avoid that blk_exit_rl() triggers a use-after-free")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Updated changelog
Signed-off-by: Jens Axboe <axboe@fb.com>
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently we use nornal Linux errno values in the block layer, and while
we accept any error a few have overloaded magic meanings. This patch
instead introduces a new blk_status_t value that holds block layer specific
status codes and explicitly explains their meaning. Helpers to convert from
and to the previous special meanings are provided for now, but I suspect
we want to get rid of them in the long run - those drivers that have a
errno input (e.g. networking) usually get errnos that don't know about
the special block layer overloads, and similarly returning them to userspace
will usually return somethings that strictly speaking isn't correct
for file system operations, but that's left as an exercise for later.
For now the set of errors is a very limited set that closely corresponds
to the previous overloaded errno values, but there is some low hanging
fruite to improve it.
blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
typechecking, so that we can easily catch places passing the wrong values.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
From the context where a SCSI command is submitted it is not always
possible to figure out whether or not the queue the command is
submitted to has struct scsi_request as the first member of its
private data. Hence introduce the flag QUEUE_FLAG_SCSI_PASSTHROUGH.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Don Brace <don.brace@microsemi.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull libnvdimm fixes from Dan Williams:
"Incremental fixes and a small feature addition on top of the main
libnvdimm 4.12 pull request:
- Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
The size regression is fixed by moving all dax helpers into the
dax-core and only specifying "select DAX" for FS_DAX and
dax-capable drivers. He also asked for clarification of the
NR_DEV_DAX config option which, on closer look, does not need to be
a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
for good measure.
- Ben's attention to detail on -stable patch submissions caught a
case where the recent fixes to arch_copy_from_iter_pmem() missed a
condition where we strand dirty data in the cache. This is tagged
for -stable and will also be included in the rework of the pmem api
to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.
- Vishal adds a feature that missed the initial pull due to pending
review feedback. It allows the kernel to clear media errors when
initializing a BTT (atomic sector update driver) instance on a pmem
namespace.
- Ross noticed that the dax_device + dax_operations conversion broke
__dax_zero_page_range(). The nvdimm unit tests fail to check this
path, but xfstests immediately trips over it. No excuse for missing
this before submitting the 4.12 pull request.
These all pass the nvdimm unit tests and an xfstests spot check. The
set has received a build success notification from the kbuild robot"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
filesystem-dax: fix broken __dax_zero_page_range() conversion
libnvdimm, btt: ensure that initializing metadata clears poison
libnvdimm: add an atomic vs process context flag to rw_bytes
x86, pmem: Fix cache flushing for iovec write < 8 bytes
device-dax: kill NR_DEV_DAX
block, dax: move "select DAX" from BLOCK to FS_DAX
device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
For configurations that do not enable DAX filesystems or drivers, do not
require the DAX core to be built.
Given that the 'direct_access' method has been removed from
'block_device_operations', we can also go ahead and remove the
block-related dax helper functions from fs/block_dev.c to
drivers/dax/super.c. This keeps dax details out of the block layer and
lets the DAX core be built as a module in the FS_DAX=n case.
Filesystems need to include dax.h to call bdev_dax_supported().
Cc: linux-xfs@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull block fixes and updates from Jens Axboe:
"Some fixes and followup features/changes that should go in, in this
merge window. This contains:
- Two fixes for lightnvm from Javier, fixing problems in the new code
merge previously in this merge window.
- A fix from Jan for the backing device changes, fixing an issue in
NFS that causes a failure to mount on certain setups.
- A change from Christoph, cleaning up the blk-mq init and exit
request paths.
- Remove elevator_change(), which is now unused. From Bart.
- A fix for queue operation invocation on a dead queue, from Bart.
- A series fixing up mtip32xx for blk-mq scheduling, removing a
bandaid we previously had in place for this. From me.
- A regression fix for this series, fixing a case where we wait on
workqueue flushing from an invalid (non-blocking) context. From me.
- A fix/optimization from Ming, ensuring that we don't both quiesce
and freeze a queue at the same time.
- A fix from Peter on lock ordering for CPU hotplug. Not a real
problem right now, but will be once the CPU hotplug rework goes in.
- A series from Omar, cleaning up out blk-mq debugfs support, and
adding support for exporting info from schedulers in debugfs as
well. This is really useful in debugging stalls or livelocks. From
Omar"
* 'for-linus' of git://git.kernel.dk/linux-block: (28 commits)
mq-deadline: add debugfs attributes
kyber: add debugfs attributes
blk-mq-debugfs: allow schedulers to register debugfs attributes
blk-mq: untangle debugfs and sysfs
blk-mq: move debugfs declarations to a separate header file
blk-mq: Do not invoke queue operations on a dead queue
blk-mq-debugfs: get rid of a bunch of boilerplate
blk-mq-debugfs: rename hw queue directories from <n> to hctx<n>
blk-mq-debugfs: don't open code strstrip()
blk-mq-debugfs: error on long write to queue "state" file
blk-mq-debugfs: clean up flag definitions
blk-mq-debugfs: separate flags with |
nfs: Fix bdi handling for cloned superblocks
block/mq: Cure cpu hotplug lock inversion
lightnvm: fix bad back free on error path
lightnvm: create cmd before allocating request
blk-mq: don't use sync workqueue flushing from drivers
mtip32xx: convert internal commands to regular block infrastructure
mtip32xx: cleanup internal tag assumptions
block: don't call blk_mq_quiesce_queue() after queue is frozen
...
Pull libnvdimm updates from Dan Williams:
"The bulk of this has been in multiple -next releases. There were a few
late breaking fixes and small features that got added in the last
couple days, but the whole set has received a build success
notification from the kbuild robot.
Change summary:
- Region media error reporting: A libnvdimm region device is the
parent to one or more namespaces. To date, media errors have been
reported via the "badblocks" attribute attached to pmem block
devices for namespaces in "raw" or "memory" mode. Given that
namespaces can be in "device-dax" or "btt-sector" mode this new
interface reports media errors generically, i.e. independent of
namespace modes or state.
This subsequently allows userspace tooling to craft "ACPI 6.1
Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
requests and submit them via the ioctl path for NVDIMM root bus
devices.
- Introduce 'struct dax_device' and 'struct dax_operations': Prompted
by a request from Linus and feedback from Christoph this allows for
dax capable drivers to publish their own custom dax operations.
This fixes the broken assumption that all dax operations are
related to a persistent memory device, and makes it easier for
other architectures and platforms to add customized persistent
memory support.
- 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
available for storage appliance applications to manually trigger
memory controllers to drain write-pending buffers that would
otherwise be flushed automatically by the platform ADR
(asynchronous-DRAM-refresh) mechanism at a power loss event.
Support for "locked" DIMMs is included to prevent namespaces from
surfacing when the namespace label data area is locked. Finally,
fixes for various reported deadlocks and crashes, also tagged for
-stable.
- ACPI / nfit driver updates: General updates of the nfit driver to
add DSM command overrides, ACPI 6.1 health state flags support, DSM
payload debug available by default, and various fixes.
Acknowledgements that came after the branch was pushed:
- commmit 565851c972 "device-dax: fix sysfs attribute deadlock":
Tested-by: Yi Zhang <yizhan@redhat.com>
- commit 23f4984483 "libnvdimm: rework region badblocks clearing"
Tested-by: Toshi Kani <toshi.kani@hpe.com>"
* tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
libnvdimm, pfn: fix 'npfns' vs section alignment
libnvdimm: handle locked label storage areas
libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
brd: fix uninitialized use of brd->dax_dev
block, dax: use correct format string in bdev_dax_supported
device-dax: fix sysfs attribute deadlock
libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
libnvdimm: rework region badblocks clearing
acpi, nfit: kill ACPI_NFIT_DEBUG
libnvdimm: fix clear length of nvdimm_forget_poison()
libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
libnvdimm, region: sysfs trigger for nvdimm_flush()
libnvdimm: fix phys_addr for nvdimm_clear_poison
x86, dax, pmem: remove indirection around memcpy_from_pmem()
block: remove block_device_operations ->direct_access()
block, dax: convert bdev_dax_supported() to dax_direct_access()
filesystem-dax: convert to dax_direct_access()
Revert "block: use DAX for partition table reads"
ext2, ext4, xfs: retrieve dax_device for iomap operations
...
This provides the infrastructure for schedulers to expose their internal
state through debugfs. We add a list of queue attributes and a list of
hctx attributes to struct elevator_type and wire them up when switching
schedulers.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Add missing seq_file.h header in blk-mq-debugfs.h
Signed-off-by: Jens Axboe <axboe@fb.com>
Originally, I tied debugfs registration/unregistration together with
sysfs. There's no reason to do this, and it's getting in the way of
letting schedulers define their own debugfs attributes. Instead, tie the
debugfs registration to the lifetime of the structures themselves.
The saner lifetimes mean we can also get rid of the extra mq directory
and move everything one level up. I.e., nvme0n1/mq/hctx0/tags is now
just nvme0n1/hctx0/tags.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull block layer updates from Jens Axboe:
- Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
was initially a fork of CFQ, but subsequently changed to implement
fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
to be used on desktop type single drives, providing good fairness.
From Paolo.
- Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
using a scalable token based algorithm that throttles IO based on
live completion IO stats, similary to blk-wbt. From Omar.
- A series from Jan, moving users to separately allocated backing
devices. This continues the work of separating backing device life
times, solving various problems with hot removal.
- A series of updates for lightnvm, mostly from Javier. Includes a
'pblk' target that exposes an open channel SSD as a physical block
device.
- A series of fixes and improvements for nbd from Josef.
- A series from Omar, removing queue sharing between devices on mostly
legacy drivers. This helps us clean up other bits, if we know that a
queue only has a single device backing. This has been overdue for
more than a decade.
- Fixes for the blk-stats, and improvements to unify the stats and user
windows. This both improves blk-wbt, and enables other users to
register a need to receive IO stats for a device. From Omar.
- blk-throttle improvements from Shaohua. This provides a scalable
framework for implementing scalable priotization - particularly for
blk-mq, but applicable to any type of block device. The interface is
marked experimental for now.
- Bucketized IO stats for IO polling from Stephen Bates. This improves
efficiency of polled workloads in the presence of mixed block size
IO.
- A few fixes for opal, from Scott.
- A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
From a variety of folks, mostly Sagi and James Smart.
- A series from Bart, improving our exposed info and capabilities from
the blk-mq debugfs support.
- A series from Christoph, cleaning up how handle WRITE_ZEROES.
- A series from Christoph, cleaning up the block layer handling of how
we track errors in a request. On top of being a nice cleanup, it also
shrinks the size of struct request a bit.
- Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
never used by platforms, and the latter has outlived it's usefulness.
- Various little bug fixes and cleanups from a wide variety of folks.
* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
block: hide badblocks attribute by default
blk-mq: unify hctx delay_work and run_work
block: add kblock_mod_delayed_work_on()
blk-mq: unify hctx delayed_run_work and run_work
nbd: fix use after free on module unload
MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
blk-mq-sched: alloate reserved tags out of normal pool
mtip32xx: use runtime tag to initialize command header
scsi: Implement blk_mq_ops.show_rq()
blk-mq: Add blk_mq_ops.show_rq()
blk-mq: Show operation, cmd_flags and rq_flags names
blk-mq: Make blk_flags_show() callers append a newline character
blk-mq: Move the "state" debugfs attribute one level down
blk-mq: Unregister debugfs attributes earlier
blk-mq: Only unregister hctxs for which registration succeeded
blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
blk-mq: Let blk_mq_debugfs_register() look up the queue name
blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
ide-pm: always pass 0 error to __blk_end_request_all
..
This modifies (or adds, if not currently pending) an existing
delayed work item.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>