Update the device-mapper core to support exposing the inline crypto
support of the underlying device(s) through the device-mapper device.
This works by creating a "passthrough keyslot manager" for the dm
device, which declares support for the set of (crypto_mode,
data_unit_size) combos which all the underlying devices support. When a
supported combo is used, the bio cloning code handles cloning the crypto
context to the bios for all the underlying devices. When an unsupported
combo is used, the blk-crypto fallback is used as usual.
Crypto support on each underlying device is ignored unless the
corresponding dm target opts into exposing it. This is needed because
for inline crypto to semantically operate on the original bio, the data
must not be transformed by the dm target. Thus, targets like dm-linear
can expose crypto support of the underlying device, but targets like
dm-crypt can't. (dm-crypt could use inline crypto itself, though.)
When a key is evicted from the dm device, it is evicted from all
underlying devices.
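The capability computation described above can be sketched in plain C. This is a minimal userspace model, not the kernel code: `struct dev_caps`, `intersect_caps()`, `combo_supported()` and `NUM_MODES` are hypothetical names, and the encoding (one bitmask of data unit sizes per crypto mode, with the bit value equal to the size in bytes) is an assumption made for illustration. A device whose dm target does not opt in is modeled as contributing no support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of crypto modes; not taken from the kernel. */
#define NUM_MODES 4

/* Hypothetical per-device capability record. */
struct dev_caps {
	int target_opts_in;           /* dm target exposes this device's crypto */
	uint32_t dus_mask[NUM_MODES]; /* per mode: OR of supported data unit sizes */
};

/* Compute the combos that every underlying device supports. */
void intersect_caps(const struct dev_caps *devs, size_t n,
		    uint32_t out[NUM_MODES])
{
	size_t i, m;

	assert(devs != NULL || n == 0);

	/* Start from "everything" and narrow down; no devices means no support. */
	for (m = 0; m < NUM_MODES; m++)
		out[m] = n ? UINT32_MAX : 0;

	for (i = 0; i < n; i++)
		for (m = 0; m < NUM_MODES; m++)
			out[m] &= devs[i].target_opts_in ? devs[i].dus_mask[m] : 0;
}

/* Nonzero if (mode, data_unit_size) can be passed through to the devices. */
int combo_supported(const uint32_t caps[NUM_MODES], unsigned int mode,
		    uint32_t data_unit_size)
{
	if (mode >= NUM_MODES)
		return 0;
	return (caps[mode] & data_unit_size) != 0;
}
```

In this model, a zero result from `combo_supported()` corresponds to the case where the blk-crypto fallback would be used.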
Bug: 137270441
Bug: 147814592
Change-Id: If28b574f2e28268db5eb9f325d4cf8f96cb63e3f
Signed-off-by: Eric Biggers <ebiggers@google.com>
(cherry picked from commit 44e1174c18)
* refs/heads/tmp-8feec99:
Documentation: devicetree: Remove Exynos bindings from kernel
Documentation: devicetree: Remove Armadeus bindings from kernel
Revert "dt-bindings: mmc: Add supports-cqe property"
Revert "dt-bindings: mmc: Add disable-cqe-dcmd property."
Linux 4.19.73
vhost: make sure log_num < in_num
powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts
powerpc/tm: Remove msr_tm_active()
PCI: Reset both NVIDIA GPU and HDA in ThinkPad P50 workaround
ext4: unsigned int compared against zero
ext4: fix block validity checks for journal inodes using indirect blocks
ext4: don't perform block validity checks on the journal inode
drm/atomic_helper: Allow DPMS On<->Off changes for unregistered connectors
virtio/s390: fix race on airq_areas[]
drm/i915: Make sure cdclk is high enough for DP audio on VLV/CHV
bcache: fix race in btree_flush_write()
bcache: add comments for mutex_lock(&b->write_lock)
bcache: only clear BTREE_NODE_dirty bit when it is set
NFSv4: Fix delegation state recovery
iio: adc: gyroadc: fix uninitialized return code
mm/migrate.c: initialize pud_entry in migrate_vma()
i2c: at91: fix clk_offset for sama5d2
i2c: at91: disable TXRDY interrupt after sending data
gpio: don't WARN() on NULL descs if gpiolib is disabled
iommu/iova: Remove stale cached32_node
powerpc/mm: Limit rma_size to 1TB when running without HV mode
ALSA: hda - Fix intermittent CORB/RIRB stall on Intel chips
drm/panel: Add support for Armadeus ST0700 Adapt
dm thin metadata: check if in fail_io mode when setting needs_check
pstore: Fix double-free in pstore_mkfile() failure path
resource: fix locking in find_next_iomem_res()
resource: Fix find_next_iomem_res() iteration issue
resource: Include resource end in walk_*() interfaces
btrfs: correctly validate compression type
RDMA/srp: Accept again source addresses that do not have a port number
RDMA/srp: Document srp_parse_in() arguments
ARM: dts: gemini: Set DIR-685 SPI CS as active low
KVM: PPC: Book3S HV: Fix CR0 setting in TM emulation
KVM: PPC: Use ccr field in pt_regs struct embedded in vcpu struct
KVM: VMX: check CPUID before allowing read/write of IA32_XSS
KVM: VMX: Fix handling of #MC that occurs during VM-Entry
KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with bad value
KVM: x86: optimize check for valid PAT value
ceph: use ceph_evict_inode to cleanup inode's resource
ALSA: hda - Don't resume forcibly i915 HDMI/DP codec
cifs: Properly handle auto disabling of serverino option
scsi: zfcp: fix request object use-after-free in send path causing wrong traces
staging: wilc1000: fix error path cleanup in wilc_wlan_initialize()
scsi: target/iblock: Fix overrun in WRITE SAME emulation
scsi: target/core: Use the SECTOR_SHIFT constant
apparmor: reset pos on failure to unpack for various functions
IB/hfi1: Avoid hardlockup with flushlist_lock
clk: tegra210: Fix default rates for HDA clocks
clk: tegra: Fix maximum audio sync clock for Tegra124/210
cifs: add spinlock for the openFileList to cifsInodeInfo
Btrfs: fix race between block group removal and block group allocation
drm/amdgpu/{uvd,vcn}: fetch ring's read_ptr after alloc
drm/amdgpu: fix ring test failure issue during s3 in vce 3.0 (V2)
kvm: Check irqchip mode before assign irqfd
drm/amdkfd: Add missing Polaris10 ID
ARC: mm: SIGSEGV userspace trying to access kernel virtual memory
ARC: mm: fix uninitialised signal code in do_page_fault
signal/arc: Use force_sig_fault where appropriate
dm crypt: move detailed message into debug level
cifs: smbd: take an array of reqeusts when sending upper layer data
PCI: dwc: Use devm_pci_alloc_host_bridge() to simplify code
mmc: sdhci-pci: Add support for Intel CML
blk-mq: free hw queue's resource in hctx's release handler
dm mpath: fix missing call of path selector type->end_io
PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessary
PCI: Add macro for Switchtec quirk declarations
dt-bindings: mmc: Add disable-cqe-dcmd property.
dt-bindings: mmc: Add supports-cqe property
ARM: dts: qcom: ipq4019: enlarge PCIe BAR range
ARM: dts: qcom: ipq4019: Fix MSI IRQ type
ARM: dts: qcom: ipq4019: fix PCI range
ext4: protect journal inode's blocks using block_validity
media: i2c: tda1997x: select V4L2_FWNODE
cifs: Fix lease buffer length error
KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels
x86/kvm: move kvm_load/put_guest_xcr0 into atomic context
kvm: mmu: Fix overflow on kvm mmu page limit calculation
IB/mlx5: Reset access mask when looping inside page fault handler
arm64: dts: stratix10: add the sysmgr-syscon property from the gmac's
usb: typec: tcpm: Try PD-2.0 if sink does not respond to 3.0 source-caps
drm/i915: Sanity check mmap length against object size
drm/i915: Handle vm_mmap error during I915_GEM_MMAP ioctl with WC set
CIFS: Fix leaking locked VFS cache pages in writeback retry
CIFS: Fix error paths in writeback code
drm: add __user attribute to ptr_to_compat()
PCI: qcom: Don't deassert reset GPIO during probe
PCI: qcom: Fix error handling in runtime PM support
btrfs: init csum_list before possible free
btrfs: scrub: fix circular locking dependency warning
btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex
btrfs: scrub: pass fs_info to scrub_setup_ctx
mmc: renesas_sdhi: Fix card initialization failure in high speed mode
powerpc/kvm: Save and restore host AMR/IAMR/UAMOR
spi: spi-gpio: fix SPI_CS_HIGH capability
x86/kvmclock: set offset for kvm unstable clock
iwlwifi: add new card for 9260 series
iwlwifi: fix devices with PCI Device ID 0x34F0 and 11ac RF modules
drm/nouveau: Don't WARN_ON VCPI allocation failures
mt76: fix corrupted software generated tx CCMP PN
iio: adc: exynos-adc: Use proper number of channels for Exynos4x12
dt-bindings: iio: adc: exynos-adc: Add S5PV210 variant
iio: adc: exynos-adc: Add S5PV210 variant
KVM: VMX: Compare only a single byte for VMCS' "launched" in vCPU-run
bcache: treat stale && dirty keys as bad keys
bcache: replace hard coded number with BUCKET_GC_GEN_MAX
tpm: Fix some name collisions with drivers/char/tpm.h
mfd: Kconfig: Fix I2C_DESIGNWARE_PLATFORM dependencies
drm/i915/ilk: Fix warning when reading emon_status with no output
drm/vblank: Allow dynamic per-crtc max_vblank_count
crypto: ccree - add missing inline qualifier
crypto: ccree - fix resume race condition on init
IB/uverbs: Fix OOPs upon device disassociation
ARC: mm: do_page_fault fixes#1: relinquish mmap_sem if signal arrives while handle_mm_fault
ARC: show_regs: lockdep: re-enable preemption
media: vim2m: only cancel work if it is for right context
btrfs: Use real device structure to verify dev extent
btrfs: volumes: Make sure no dev extent is beyond device boundary
powerpc/pkeys: Fix handling of pkey state across fork()
scsi: megaraid_sas: Use 63-bit DMA addressing
scsi: megaraid_sas: Add check for reset adapter bit
scsi: megaraid_sas: Fix combined reply queue mode detection
btrfs: Fix error handling in btrfs_cleanup_ordered_extents
btrfs: Remove extent_io_ops::fill_delalloc
Btrfs: fix deadlock with memory reclaim during scrub
Btrfs: clean up scrub is_dev_replace parameter
KVM: PPC: Book3S HV: Fix race between kvm_unmap_hva_range and MMU mode switch
drm/i915: Cleanup gt powerstate from gem
drm/i915: Restore sane defaults for KMS on GEM error load
media: vim2m: use cancel_delayed_work_sync instead of flush_schedule_work
media: vim2m: use workqueue
s390/zcrypt: reinit ap queue state machine during device probe
ARM: davinci: dm644x: define gpio interrupts as separate resources
ARM: davinci: dm355: define gpio interrupts as separate resources
ARM: davinci: dm646x: define gpio interrupts as separate resources
ARM: davinci: dm365: define gpio interrupts as separate resources
ARM: davinci: da8xx: define gpio interrupts as separate resources
drm/amd/dm: Understand why attaching path/tile properties are needed
drm/amd/pp: Fix truncated clock value when set watermark
powerplay: Respect units on max dcfclk watermark
Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up
Drivers: hv: kvp: Fix the indentation of some "break" statements
drm/atomic_helper: Disallow new modesets on unregistered connectors
drm/i915/gen9+: Fix initial readout for Y tiled framebuffers
drm/i915: Rename PLANE_CTL_DECOMPRESSION_ENABLE
drm/i915: Fix intel_dp_mst_best_encoder()
x86/kvm/lapic: preserve gfn_to_hva_cache len on cache reinit
KVM: hyperv: define VP assist page helpers
KVM: x86: hyperv: keep track of mismatched VP indexes
KVM: x86: hyperv: consistently use 'hv_vcpu' for 'struct kvm_vcpu_hv' variables
KVM: x86: hyperv: enforce vp_index < KVM_MAX_VCPUS
drm/amdgpu: Update gc_9_0 golden settings.
drm/amdgpu/gfx9: Update gfx9 golden settings.
remoteproc: qcom: q6v5-mss: add SCM probe dependency
x86, hibernate: Fix nosave_regions setup for hibernation
Drivers: hv: kvp: Fix two "this statement may fall through" warnings
keys: Fix the use of the C++ keyword "private" in uapi/linux/keyctl.h
scsi: qla2xxx: Move log messages before issuing command to firmware
media: cec: remove cec-edid.c
media: cec/v4l2: move V4L2 specific CEC functions to V4L2
drm/i915: Re-apply "Perform link quality check, unconditionally during long pulse"
kernel/module: Fix mem leak in module_add_modinfo_attrs
modules: always page-align module section allocations
remoteproc: qcom: q6v5: shore up resource probe handling
clk: s2mps11: Add used attribute to s2mps11_dt_match
nvme-fc: use separate work queue to avoid warning
riscv: remove unused variable in ftrace
scripts/decode_stacktrace: match basepath using shell prefix operator, not regex
arm64: dts: rockchip: enable usb-host regulators at boot on rk3328-rock64
media: stm32-dcmi: fix irq = 0 case
powerpc/64: mark start_here_multiplatform as __ref
x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ftrace_call_replace()
selftests: fib_rule_tests: use pre-defined DEV_ADDR
timekeeping: Use proper ktime_add when adding nsecs in coarse offset
{nl,mac}80211: fix interface combinations on crypto controlled devices
blk-iolatency: fix STS_AGAIN handling
Blk-iolatency: warn on negative inflight IO counter
hv_sock: Fix hang when a connection is closed
batman-adv: Only read OGM tvlv_len after buffer len check
batman-adv: fix uninit-value in batadv_netlink_get_ifindex()
powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction
vhost/test: fix build for vhost test - again
vhost/test: fix build for vhost test
drm/vmwgfx: Fix double free in vmw_recv_msg()
sched/fair: Don't assign runtime for throttled cfs_rq
ALSA: hda/realtek - Fix the problem of two front mics on a ThinkCentre
ALSA: hda/realtek - Enable internal speaker & headset mic of ASUS UX431FL
ALSA: hda/realtek - Add quirk for HP Pavilion 15
ALSA: hda/realtek - Fix overridden device-specific initialization
ALSA: hda - Fix potential endless loop at applying quirks
ANDROID: regression introduced override_creds=off
Change-Id: I74d7d497d9ec70b589ac400f62e266858ee7893d
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
[ Upstream commit 5de719e3d01b4abe0de0d7b857148a880ff2a90b ]
After commit 396eaf21ee ("blk-mq: improve DM's blk-mq IO merging via
blk_insert_cloned_request feedback"), map_request() will requeue the tio
when the issued clone request returns BLK_STS_RESOURCE or
BLK_STS_DEV_RESOURCE. Thus, if the device driver keeps reporting an error
status, a tio may be requeued multiple times until the return value is no
longer DM_MAPIO_REQUEUE. That means type->start_io may be called multiple
times, while type->end_io is only called once, when the IO completes.
In fact, even without commit 396eaf21ee, a setup_clone() failure can
also cause a tio requeue and an associated missed call to type->end_io.
The service-time path selector selects a path based on in_flight_size,
which is increased by st_start_io() and decreased by st_end_io().
Missed calls to st_end_io() leave in_flight_size too high, which
causes the selector to make the wrong choice. The queue-length path
selector is affected in the same way.
To fix the problem, call type->end_io in ->release_clone_rq before the
tio is requeued. map_info is passed to ->release_clone_rq() for the
map_request() error paths that result in a requeue.
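The accounting imbalance described above can be modeled in a few lines of userspace C. This is only a sketch: `struct path_stats` and the `sketch_*` functions are hypothetical stand-ins for the path selector's start_io/end_io hooks, and the `release_calls_end_io` flag is an assumed switch between the old and the fixed behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-path counter, modeled on in_flight_size. */
struct path_stats {
	size_t in_flight_size;
};

/* Stand-in for the selector's start_io hook. */
void sketch_start_io(struct path_stats *p, size_t nr_bytes)
{
	p->in_flight_size += nr_bytes;
}

/* Stand-in for the selector's end_io hook. */
void sketch_end_io(struct path_stats *p, size_t nr_bytes)
{
	assert(p->in_flight_size >= nr_bytes);
	p->in_flight_size -= nr_bytes;
}

/*
 * One mapping attempt. Returns 1 if the clone was dispatched, 0 if the
 * request has to be requeued. With release_calls_end_io set, the requeue
 * path undoes the start_io accounting, as the fix does.
 */
int sketch_map_attempt(struct path_stats *p, size_t nr_bytes,
		       int dispatch_ok, int release_calls_end_io)
{
	sketch_start_io(p, nr_bytes);
	if (dispatch_ok)
		return 1;
	if (release_calls_end_io)
		sketch_end_io(p, nr_bytes);
	return 0;
}
```

With three failed attempts followed by one successful attempt and a completion, the model without the release-time call ends with three requests' worth of bytes still counted as in flight, while the fixed variant returns to zero.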
Fixes: 396eaf21ee ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Cc: stable@vger.kernel.org
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* refs/heads/tmp-bb418a1:
Linux 4.19.31
s390/setup: fix boot crash for machine without EDAT-1
bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata
KVM: nVMX: Ignore limit checks on VMX instructions using flat segments
KVM: nVMX: Apply addr size mask to effective address for VMX instructions
KVM: nVMX: Sign extend displacements of VMX instr's mem operands
KVM: x86/mmu: Do not cache MMIO accesses while memslots are in flux
KVM: x86/mmu: Detect MMIO generation wrap in any address space
KVM: Call kvm_arch_memslots_updated() before updating memslots
drm/amd/display: don't call dm_pp_ function from an fpu block
drm/amd/powerplay: correct power reading on fiji
drm/radeon/evergreen_cs: fix missing break in switch statement
drm/fb-helper: generic: Fix drm_fbdev_client_restore()
media: imx: csi: Stop upstream before disabling IDMA channel
media: imx: csi: Disable CSI immediately after last EOF
media: vimc: Add vimc-streamer for stream control
media: uvcvideo: Avoid NULL pointer dereference at the end of streaming
media: lgdt330x: fix lock status reporting
media: imx: prpencvf: Stop upstream before disabling IDMA channel
rcu: Do RCU GP kthread self-wakeup from softirq and interrupt
tpm: Unify the send callback behaviour
tpm/tpm_crb: Avoid unaligned reads in crb_recv()
md: Fix failed allocation of md_register_thread
perf intel-pt: Fix divide by zero when TSC is not available
perf/x86/intel/uncore: Fix client IMC events return huge result
perf intel-pt: Fix overlap calculation for padding
perf auxtrace: Define auxtrace record alignment
perf tools: Fix split_kallsyms_for_kcore() for trampoline symbols
perf intel-pt: Fix CYC timestamp calculation after OVF
x86/unwind/orc: Fix ORC unwind table alignment
vt: perform safe console erase in the right order
stable-kernel-rules.rst: add link to networking patch queue
bcache: never writeback a discard operation
PM / wakeup: Rework wakeup source timer cancellation
svcrpc: fix UDP on servers with lots of threads
NFSv4.1: Reinitialise sequence results before retransmitting a request
nfsd: fix wrong check in write_v4_end_grace()
nfsd: fix memory corruption caused by readdir
nfsd: fix performance-limiting session calculation
NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()
NFS: Fix an I/O request leakage in nfs_do_recoalesce
NFS: Fix I/O request leakages
cpcap-charger: generate events for userspace
mfd: sm501: Fix potential NULL pointer dereference
dm integrity: limit the rate of error messages
dm: fix to_sector() for 32bit
ipmi_si: fix use-after-free of resource->name
arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
arm64: debug: Ensure debug handlers check triggering exception level
arm64: Fix HCR.TGE status for NMI contexts
ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify
powerpc/traps: Fix the message printed when stack overflows
powerpc/traps: fix recoverability of machine check handling on book3s/32
powerpc/hugetlb: Don't do runtime allocation of 16G pages in LPAR configuration
powerpc/ptrace: Simplify vr_get/set() to avoid GCC warning
powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest
powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit
powerpc/83xx: Also save/restore SPRG4-7 during suspend
powerpc/powernv: Make opal log only readable by root
powerpc/wii: properly disable use of BATs when requested.
powerpc/32: Clear on-stack exception marker upon exception return
security/selinux: fix SECURITY_LSM_NATIVE_LABELS on reused superblock
selinux: add the missing walk_size + len check in selinux_sctp_bind_connect
jbd2: fix compile warning when using JBUFFER_TRACE
jbd2: clear dirty flag when revoking a buffer from an older transaction
serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup()
serial: 8250_pci: Fix number of ports for ACCES serial cards
serial: 8250_of: assume reg-shift of 2 for mrvl,mmp-uart
serial: uartps: Fix stuck ISR if RX disabled with non-empty FIFO
bpf: only test gso type on gso packets
drm/i915: Relax mmap VMA check
can: flexcan: FLEXCAN_IFLAG_MB: add () around macro argument
gpio: pca953x: Fix dereference of irq data in shutdown
media: i2c: ov5640: Fix post-reset delay
i2c: tegra: fix maximum transfer size
parport_pc: fix find_superio io compare code, should use equal test.
intel_th: Don't reference unassigned outputs
device property: Fix the length used in PROPERTY_ENTRY_STRING()
kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv
mm/memory.c: do_fault: avoid usage of stale vm_area_struct
mm/vmalloc: fix size check for remap_vmalloc_range_partial()
mm: hwpoison: fix thp split handing in soft_offline_in_use_page()
dmaengine: usb-dmac: Make DMAC system sleep callbacks explicit
usb: typec: tps6598x: handle block writes separately with plain-I2C adapters
usb: chipidea: tegra: Fix missed ci_hdrc_remove_device()
clk: ingenic: Fix doc of ingenic_cgu_div_info
clk: ingenic: Fix round_rate misbehaving with non-integer dividers
clk: samsung: exynos5: Fix kfree() of const memory on setting driver_override
clk: samsung: exynos5: Fix possible NULL pointer exception on platform_device_alloc() failure
clk: clk-twl6040: Fix imprecise external abort for pdmclk
clk: uniphier: Fix update register for CPU-gear
ext2: Fix underflow in ext2_max_size()
cxl: Wrap iterations over afu slices inside 'afu_list_lock'
IB/hfi1: Close race condition on user context disable and close
PCI: dwc: skip MSI init if MSIs have been explicitly disabled
PCI/DPC: Fix print AER status in DPC event handling
PCI/ASPM: Use LTR if already enabled by platform
ext4: fix crash during online resizing
ext4: add mask of ext4 flags to swap
ext4: update quota information while swapping boot loader inode
ext4: cleanup pagecache before swap i_data
ext4: fix check of inode in swap_inode_boot_loader
cpufreq: pxa2xx: remove incorrect __init annotation
cpufreq: tegra124: add missing of_node_put()
cpufreq: kryo: Release OPP tables on module removal
x86/kprobes: Prohibit probing on optprobe template code
irqchip/brcmstb-l2: Use _irqsave locking variants in non-interrupt code
irqchip/gic-v3-its: Avoid parsing _indirect_ twice for Device table
libertas_tf: don't set URB_ZERO_PACKET on IN USB transfer
soc: qcom: rpmh: Avoid accessing freed memory from batch API
Btrfs: fix corruption reading shared and compressed extents after hole punching
btrfs: ensure that a DUP or RAID1 block group has exactly two stripes
Btrfs: setup a nofs context for memory allocation at __btrfs_set_acl
Btrfs: setup a nofs context for memory allocation at btrfs_create_tree()
m68k: Add -ffreestanding to CFLAGS
ovl: Do not lose security.capability xattr over metadata file copy-up
ovl: During copy up, first copy up data and then xattrs
splice: don't merge into linked buffers
fs/devpts: always delete dcache dentry-s in dput()
scsi: qla2xxx: Fix LUN discovery if loop id is not assigned yet by firmware
scsi: target/iscsi: Avoid iscsit_release_commands_from_conn() deadlock
scsi: sd: Optimal I/O size should be a multiple of physical block size
scsi: aacraid: Fix performance issue on logical drives
scsi: virtio_scsi: don't send sc payload with tmfs
s390/virtio: handle find on invalid queue gracefully
s390/setup: fix early warning messages
clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability
clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown
clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR
regulator: s2mpa01: Fix step values for some LDOs
regulator: max77620: Initialize values for DT properties
regulator: s2mps11: Fix steps for buck7, buck8 and LDO35
spi: pxa2xx: Setup maximum supported DMA transfer length
spi: ti-qspi: Fix mmap read when more than one CS in use
netfilter: ipt_CLUSTERIP: fix warning unused variable cn
mmc:fix a bug when max_discard is 0
mmc: sdhci-esdhc-imx: fix HS400 timing issue
ACPI / device_sysfs: Avoid OF modalias creation for removed device
xen: fix dom0 boot on huge systems
tracing/perf: Use strndup_user() instead of buggy open-coded version
tracing: Do not free iter->trace in fail path of tracing_open_pipe()
tracing: Use strncpy instead of memcpy for string keys in hist triggers
CIFS: Fix read after write for files with read caching
CIFS: Do not skip SMB2 message IDs on send failures
CIFS: Do not reset lease state to NONE on lease break
crypto: arm64/aes-ccm - fix bugs in non-NEON fallback routine
crypto: arm64/aes-ccm - fix logical bug in AAD MAC handling
crypto: x86/morus - fix handling chunked inputs and MAY_SLEEP
crypto: x86/aesni-gcm - fix crash on empty plaintext
crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEP
crypto: testmgr - skip crc32c context test for ahash algorithms
crypto: skcipher - set CRYPTO_TFM_NEED_KEY if ->setkey() fails
crypto: pcbc - remove bogus memcpy()s with src == dest
crypto: morus - fix handling chunked inputs
crypto: hash - set CRYPTO_TFM_NEED_KEY if ->setkey() fails
crypto: arm64/crct10dif - revert to C code for short inputs
crypto: arm64/aes-neonbs - fix returning final keystream block
crypto: arm/crct10dif - revert to C code for short inputs
crypto: aegis - fix handling chunked inputs
crypto: aead - set CRYPTO_TFM_NEED_KEY if ->setkey() fails
fix cgroup_do_mount() handling of failure exits
libnvdimm: Fix altmap reservation size calculation
libnvdimm/pmem: Honor force_raw for legacy pmem regions
libnvdimm, pfn: Fix over-trim in trim_pfn_device()
libnvdimm/label: Clear 'updating' flag after label-set update
nfit/ars: Attempt short-ARS even in the no_init_ars case
nfit/ars: Attempt a short-ARS whenever the ARS state is idle at boot
acpi/nfit: Fix bus command validation
nfit: acpi_nfit_ctl(): Check out_obj->type in the right place
stm class: Prevent division by zero
tmpfs: fix uninitialized return value in shmem_link
selftests: fib_tests: sleep after changing carrier. again.
net: set static variable an initial value in atl2_probe()
bnxt_en: Wait longer for the firmware message response to complete.
bnxt_en: Fix typo in firmware message timeout logic.
nfp: bpf: fix ALU32 high bits clearance bug
nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
net: thunderx: add nicvf_send_msg_to_pf result check for set_rx_mode_task
net: thunderx: make CFG_DONE message to run through generic send-ack sequence
bpf, lpm: fix lookup bug in map_delete_elem
mac80211_hwsim: propagate genlmsg_reply return code
phonet: fix building with clang
ARCv2: don't assume core 0x54 has dual issue
ARCv2: support manual regfile save on interrupts
ARC: uacces: remove lp_start, lp_end from clobber list
ARCv2: lib: memcpy: fix doing prefetchw outside of buffer
ixgbe: fix older devices that do not support IXGBE_MRQC_L3L4TXSWEN
tmpfs: fix link accounting when a tmpfile is linked in
mm: handle lru_add_drain_all for UP properly
net: marvell: mvneta: fix DMA debug warning
ARM: tegra: Restore DT ABI on Tegra124 Chromebooks
arm64: Relax GIC version check during early boot
ARM: dts: armada-xp: fix Armada XP boards NAND description
qed: Fix iWARP syn packet mac address validation.
qed: Fix iWARP buffer size provided for syn packet processing.
ASoC: topology: free created components in tplg load error
mailbox: bcm-flexrm-mailbox: Fix FlexRM ring flush timeout issue
xfrm: Fix inbound traffic via XFRM interfaces across network namespaces
net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
qmi_wwan: apply SET_DTR quirk to Sierra WP7607
pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins
net: dsa: bcm_sf2: Do not assume DSA master supports WoL
net: systemport: Fix reception of BPDUs
scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
keys: Fix dependency loop between construction record and auth key
assoc_array: Fix shortcut creation
ARM: 8835/1: dma-mapping: Clear DMA ops on teardown
af_key: unconditionally clone on broadcast
bpf: fix lockdep false positive in stackmap
bpf: only adjust gso_size on bytestream protocols
ARM: 8824/1: fix a migrating irq bug when hotplug cpu
esp: Skip TX bytes accounting when sending from a request socket
clk: sunxi: A31: Fix wrong AHB gate number
kallsyms: Handle too long symbols in kallsyms.c
clk: sunxi-ng: v3s: Fix TCON reset de-assert bit
Input: st-keyscan - fix potential zalloc NULL dereference
auxdisplay: ht16k33: fix potential user-after-free on module unload
i2c: bcm2835: Clear current buffer pointers and counts after a transfer
i2c: cadence: Fix the hold bit setting
net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
x86/CPU: Add Icelake model number
net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()
scsi: qla2xxx: Fix panic from use after free in qla2x00_async_tm_cmd
Revert "mm: use early_pfn_to_nid in page_ext_init"
mm/gup: fix gup_pmd_range() for dax
NFS: Don't use page_file_mapping after removing the page
xprtrdma: Make sure Send CQ is allocated on an existing compvec
floppy: check_events callback should not return a negative number
ipvs: fix dependency on nf_defrag_ipv6
blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
netfilter: compat: initialize all fields in xt_init
mac80211: Fix Tx aggregation session tear down with ITXQs
mac80211: call drv_ibss_join() on restart
Input: matrix_keypad - use flush_delayed_work()
Input: ps2-gpio - flush TX work when closing port
Input: cap11xx - switch to using set_brightness_blocking()
ARM: OMAP2+: fix lack of timer interrupts on CPU1 after hotplug
ASoC: samsung: Prevent clk_get_rate() calls in atomic context
KVM: arm64: Forbid kprobing of the VHE world-switch code
KVM: arm/arm64: vgic: Always initialize the group of private IRQs
arm/arm64: KVM: Don't panic on failure to properly reset system registers
arm/arm64: KVM: Allow a VCPU to fully reset itself
KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded
ASoC: rsnd: fixup rsnd_ssi_master_clk_start() user count check
ASoC: dapm: fix out-of-bounds accesses to DAPM lookup tables
ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized
ARM: dts: Configure clock parent for pwm vibra
Input: pwm-vibra - stop regulator after disabling pwm, not before
Input: pwm-vibra - prevent unbalanced regulator
s390/dasd: fix using offset into zero size array error
arm64: dts: rockchip: fix graph_port warning on rk3399 bob kevin and excavator
KVM: arm/arm64: vgic: Make vgic_dist->lpi_list_lock a raw_spinlock
clocksource: timer-ti-dm: Fix pwm dmtimer usage of fck reparenting
ASoC: rt5682: Correct the setting while select ASRC clk for AD/DA filter
gpu: ipu-v3: Fix CSI offsets for imx53
drm/imx: imx-ldb: add missing of_node_puts
gpu: ipu-v3: Fix i.MX51 CSI control registers offset
drm/imx: ignore plane updates on disabled crtcs
crypto: rockchip - update new iv to device in multiple operations
crypto: rockchip - fix scatterlist nents error
crypto: ahash - fix another early termination in hash walk
crypto: cfb - remove bogus memcpy() with src == dest
crypto: cfb - add missing 'chunksize' property
crypto: ccree - don't copy zero size ciphertext
crypto: ccree - unmap buffer before copying IV
crypto: ccree - fix free of unallocated mlli buffer
crypto: caam - fix DMA mapping of stack memory
crypto: caam - fixed handling of sg list
crypto: ccree - fix missing break in switch statement
crypto: caam - fix hash context DMA unmap size
stm class: Fix an endless loop in channel allocation
mei: bus: move hw module get/put to probe/release
mei: hbm: clean the feature flags on link reset
iio: adc: exynos-adc: Fix NULL pointer exception on unbind
ASoC: codecs: pcm186x: Fix energysense SLEEP bit
ASoC: codecs: pcm186x: fix wrong usage of DECLARE_TLV_DB_SCALE()
ASoC: fsl_esai: fix register setting issue in RIGHT_J mode
9p/net: fix memory leak in p9_client_create
9p: use inode->i_lock to protect i_size_write() under 32-bit
media: videobuf2-v4l2: drop WARN_ON in vb2_warn_zero_bytesused()
ANDROID: cuttlefish_defconfig: Enable CONFIG_INPUT_MOUSEDEV
FROMLIST: psi: introduce psi monitor
FROMLIST: refactor header includes to allow kthread.h inclusion in psi_types.h
FROMLIST: psi: track changed states
FROMLIST: psi: split update_stats into parts
FROMLIST: psi: rename psi fields in preparation for psi trigger addition
FROMLIST: psi: make psi_enable static
FROMLIST: psi: introduce state_mask to represent stalled psi states
ANDROID: cuttlefish_defconfig: Enable CONFIG_PSI
UPSTREAM: kernel: cgroup: add poll file operation
UPSTREAM: fs: kernfs: add poll file operation
UPSTREAM: psi: avoid divide-by-zero crash inside virtual machines
UPSTREAM: psi: clarify the Kconfig text for the default-disable option
UPSTREAM: psi: fix aggregation idle shut-off
UPSTREAM: psi: fix reference to kernel commandline enable
UPSTREAM: psi: make disabling/enabling easier for vendor kernels
UPSTREAM: kernel/sched/psi.c: simplify cgroup_move_task()
UPSTREAM: psi: cgroup support
UPSTREAM: psi: pressure stall information for CPU, memory, and IO
UPSTREAM: sched: introduce this_rq_lock_irq()
UPSTREAM: sched: sched.h: make rq locking and clock functions available in stats.h
UPSTREAM: sched: loadavg: make calc_load_n() public
BACKPORT: sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
UPSTREAM: delayacct: track delays from thrashing cache pages
UPSTREAM: mm: workingset: tell cache transitions from workingset thrashing
Conflicts:
arch/arm/kernel/irq.c
drivers/scsi/sd.c
include/linux/sched.h
init/Kconfig
kernel/sched/Makefile
kernel/sched/sched.h
kernel/workqueue.c
sound/soc/soc-dapm.c
Change-Id: Ia2dcc01c712134c57037ca6788d51172f66bcd93
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
commit 0bdb50c531f7377a9da80d3ce2d61f389c84cb30 upstream.
A dm-raid array with devices larger than 4GB won't assemble on
a 32-bit host since _check_data_dev_sectors() was added in 4.16.
This is because to_sector() treats its argument as an "unsigned long",
which is 32 bits wide on a 32-bit host and so cannot hold a byte count
of 4GB or more. Using "unsigned long long" is more correct.
Kernels as early as 4.2 can have other problems due to to_sector()
being used on the size of a device.
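The truncation can be reproduced in userspace. The sketch below models a 32-bit "unsigned long" with `uint32_t`; the function names are hypothetical and the 9-bit sector shift (512-byte sectors) is the only value taken as given.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/* Models the old helper on a 32-bit host, where unsigned long is 32 bits. */
uint32_t to_sector_ulong32(uint32_t n)
{
	return n >> SECTOR_SHIFT;
}

/* Models the fixed helper, which takes a 64-bit byte count. */
unsigned long long to_sector_ull(unsigned long long n)
{
	return n >> SECTOR_SHIFT;
}

/* Walks a 5 GiB device size through both variants. */
void demo_to_sector(void)
{
	unsigned long long bytes = 5ULL << 30;

	/* Narrowing keeps only the low 32 bits, i.e. 1 GiB of the 5 GiB. */
	assert(to_sector_ulong32((uint32_t)bytes) == 2097152u);
	/* The 64-bit variant reports the full size in sectors. */
	assert(to_sector_ull(bytes) == 10485760ULL);
}
```

A size of exactly 4 GiB is the worst case in this model: the narrowed value is zero, so the device appears to have no sectors at all.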
Fixes: 0cf4503174 ("dm raid: add support for the MD RAID0 personality")
cc: stable@vger.kernel.org (v4.2+)
Reported-and-tested-by: Guillaume Perréal <gperreal@free.fr>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is needed for AVB.
Bug: None
Test: Compiles.
Change-Id: I45b5d435652ab66ec07420ab17f2c7889f7e4d95
Signed-off-by: David Zeuthen <zeuthen@google.com>
This is a wrap-up of three patches pending upstream approval.
I'm bundling them because they are interdependent, and it'll be
easier to drop it on rebase later.
1. dm: allow a dm-fs-style device to be shared via dm-ioctl
Integrates feedback from Alasdair, Mike, and Kiyoshi.
Two main changes occur here:
- One function is added which allows for a programmatically created
mapped device to be inserted into the dm-ioctl hash table. This binds
the device to a name and, optionally, a uuid, which udev needs and which
allows for userspace management of the mapped device.
- dm_table_complete() was extended to handle all of the final
functional changes required for the table to be operational once
called.
2. init: boot to device-mapper targets without an initr*
Add a dm= kernel parameter modeled after the md= parameter from
do_mounts_md. It allows for device-mapper targets to be configured at
boot time for use early in the boot process (as the root device or
otherwise). It also replaces /dev/XXX calls with major:minor opportunistically.
The format is dm="name uuid ro,table line 1,table line 2,...". The
parser expects the comma to be safe to use as a newline substitute but,
otherwise, uses the normal separator of space. Some attempt has been
made to make it forgiving of additional spaces (using skip_spaces()).
A mapped device created during boot will be assigned a minor of 0 and
may be accessed via /dev/dm-0.
An example dm-linear root with no uuid may look like:
root=/dev/dm-0 dm="lroot none ro, 0 4096 linear /dev/ubdb 0, 4096 4096 linear /dev/ubdc 0"
Once udev is started, /dev/dm-0 will become /dev/mapper/lroot.
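The comma-as-newline convention can be illustrated with a small userspace sketch. This is not the parser from the patch: dm_split_fields() is a hypothetical helper, and it only assumes what the text states, namely that fields are separated by commas and that extra leading spaces are tolerated.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split a dm= value in place on commas.
 * Each field has its leading spaces skipped. Pointers to the fields are
 * stored in fields[], and the number of fields found (at most max) is
 * returned. The first field holds "name uuid ro"; the rest are table lines.
 */
static size_t dm_split_fields(char *arg, char **fields, size_t max)
{
	size_t n = 0;

	while (arg && n < max) {
		char *comma = strchr(arg, ',');

		if (comma)
			*comma = '\0';
		while (*arg == ' ')
			arg++;
		fields[n++] = arg;
		arg = comma ? comma + 1 : NULL;
	}
	return n;
}
```

Each returned field can then be split on spaces in the usual way.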
Older upstream threads:
http://marc.info/?l=dm-devel&m=127429492521964&w=2
http://marc.info/?l=dm-devel&m=127429499422096&w=2
http://marc.info/?l=dm-devel&m=127429493922000&w=2
Latest upstream threads:
https://patchwork.kernel.org/patch/104859/
https://patchwork.kernel.org/patch/104860/
https://patchwork.kernel.org/patch/104861/
Bug: 27175947
Signed-off-by: Will Drewry <wad@chromium.org>
Review URL: http://codereview.chromium.org/2020011
Change-Id: I92bd53432a11241228d2e5ac89a3b20d19b05a31
[AmitP: Refactored the original changes based on upstream changes,
commit e52347bd66 ("Documentation/admin-guide: split the kernel parameter list to a separate file")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Similar to the ->copy_from_iter() operation, a platform may want to
deploy an architecture or device specific routine for handling reads
from a dax_device like /dev/pmemX. On x86 this routine will point to a
machine check safe version of copy_to_iter(). For now, add the plumbing
to device-mapper and the dax core.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull device mapper updates from Mike Snitzer:
- DM core passthrough ioctl fix to retain reference to DM table, and
that table's block devices, while issuing the ioctl to one of those
block devices.
- DM core passthrough ioctl fix to _not_ override the fmode_t used to
issue the ioctl. Overriding by using the fmode_t that the block
device was originally opened with during DM table load is a liability.
- Add DM core support for secure erase forwarding and update the DM
linear and DM striped targets to support them.
- A DM core 4.16 stable fix to allow abnormal IO (e.g. discard, write
same, write zeroes) for targets that make use of the non-splitting IO
variant (as is done for multipath or thinp when layered directly on
NVMe).
- Allow DM targets to return a payload in response to a DM message that
they are sent. This is useful for DM targets that would like to
provide statistics data in response to DM messages.
- Update DM bufio to support non-power-of-2 block sizes. Numerous other
related changes prepare the DM bufio code for this support.
- Fix DM crypt to use a bounded amount of memory across the entire
system. This is to avoid OOM that can otherwise occur in response to
certain pathological IO workloads (e.g. discarding a large DM crypt
device).
- Add a 'check_at_most_once' feature to the DM verity target to allow
verity to be used on mobile devices that have very limited resources.
- Fix the DM integrity target to fail early if a keyed algorithm (e.g.
HMAC) is to be used but the key isn't set.
- Add non-power-of-2 support to the DM unstripe target.
- Eliminate the use of a Variable Length Array in the DM stripe target.
- Update the DM log-writes target to record metadata (REQ_META flag).
- DM raid fixes for its nosync status and some variable range issues.
* tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits)
dm: remove fmode_t argument from .prepare_ioctl hook
dm: hold DM table for duration of ioctl rather than use blkdev_get
dm raid: fix parse_raid_params() variable range issue
dm verity: make verity_for_io_block static
dm verity: add 'check_at_most_once' option to only validate hashes once
dm bufio: don't embed a bio in the dm_buffer structure
dm bufio: support non-power-of-two block sizes
dm bufio: use slab cache for dm_buffer structure allocations
dm bufio: reorder fields in dm_buffer structure
dm bufio: relax alignment constraint on slab cache
dm bufio: remove code that merges slab caches
dm bufio: get rid of slab cache name allocations
dm bufio: move dm-bufio.h to include/linux/
dm bufio: delete outdated comment
dm: add support for secure erase forwarding
dm: backfill abnormal IO support to non-splitting IO submission
dm raid: fix nosync status
dm mpath: use DM_MAPIO_SUBMITTED instead of magic number 0 in process_queued_bios()
dm stripe: get rid of a Variable Length Array (VLA)
dm log writes: record metadata flag for better flags record
...
Use the fmode_t that is passed to dm_blk_ioctl() rather than
inconsistently (varies across targets) drop it on the floor by
overriding it with the fmode_t stored in 'struct dm_dev'.
None of the persistent reservation functions used the fmode_t they
got back from .prepare_ioctl, so remove that argument.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Set QUEUE_FLAG_SECERASE in DM device's queue_flags if a DM table's
data devices support secure erase.
Also, add support for secure erase to both the linear and striped
targets.
Signed-off-by: Denis Semakin <d.semakin@omprussia.ru>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
It could be useful for a target to return stats or other information.
If a target does DMEMIT() anything to @result from its .message method
then it must return 1 to the caller.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
It happens often while I'm preparing a patch for a block driver that
I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
available for this driver? Do I have to introduce definitions of these
constants before I can use these constants? To avoid this confusion,
move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
<linux/blkdev.h> header file such that these become available for all
block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
header file conditional so that including that header file after
<linux/blkdev.h> does not cause the compiler to complain about a
SECTOR_SIZE redefinition.
Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
not been removed from uapi header files nor from NAND drivers in
which these constants are used for another purpose than converting
block layer offsets and sizes into a number of sectors.
Cc: David S. Miller <davem@davemloft.net>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add DM_ENDIO_DELAY_REQUEUE to allow request-based multipath's
multipath_end_io() to instruct dm-rq.c:dm_done() to delay a requeue.
This is beneficial to do if BLK_STS_RESOURCE is returned from the target
(because target is busy).
Relative to blk-mq: kick the hw queues via blk_mq_requeue_work(),
indirectly from dm-rq.c:__dm_mq_kick_requeue_list(), after a delay.
For old .request_fn: use blk_delay_queue().
bio-based multipath doesn't have feature parity with request-based for
retryable error requeues; that is something that'll need fixing in the
future.
Suggested-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Bart Van Assche <bart.vanassche@wdc.com>
[as interpreted from Bart's "... patch looks fine to me."]
If anyone is going to use dm_table_create(), they probably should be
able to use dm_table_destroy() too. Move the dm_table_destroy()
definition outside the private header, near dm_table_create().
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If dm_table_determine_type() establishes DM_TYPE_NVME_BIO_BASED then
all devices in the DM table do not support partial completions. Also,
the table has a single immutable target that doesn't require DM core to
split bios.
This will enable adding NVMe optimizations to bio-based DM.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Eliminates need for a separate mempool to allocate 'struct dm_io'
objects from. As such, it saves an extra mempool allocation for each
original bio that DM core is issued.
This complicates the per-bio-data accessor functions by needing to
conditionally add extra padding to get to a target's per-bio-data. But
in the end this provides a decent performance improvement for all
bio-based DM devices.
On an NVMe-loop based testbed to a ramdisk (~3100 MB/s): bio-based
DM linear performance improved by 2% (went from 2665 to 2777 MB/s).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
No DM target provides num_write_bios and none has since dm-cache's
brief use in 2013.
Having the possibility of num_write_bios > 1 complicates bio
allocation. So remove the interface and assume there is only one bio
needed.
If a target ever needs more, it must provide a suitable bioset and
allocate itself based on its particular needs.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Commit abebfbe2f7 ("dm: add ->flush() dax operation support") is
buggy. A DM device may be composed of multiple underlying devices and
all of them need to be flushed. That commit just routes the flush
request to the first device and ignores the other devices.
It could be fixed by adding more complex logic to the device mapper. But
there is only one implementation of the method pmem_dax_ops->flush - that
is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
don't need the pmem_dax_ops->flush abstraction at all, we can call
arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
can't ever reach anything different from arch_wb_cache_pmem().
It should also be pointed out that some uses of persistent memory need
to flush only a very small amount of data (such as 1 cacheline), and it
would be overkill to go through the device mapper machinery for a single
flushed cache line.
Fix this by removing the pmem_dax_ops->flush abstraction and call
arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
mapper code that forwards the flushes.
Fixes: abebfbe2f7 ("dm: add ->flush() dax operation support")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The arrays of 'struct dm_arg' are never modified by the device-mapper
core, so constify them so that they are placed in .rodata.
(Exception: the args array in dm-raid cannot be constified because it is
allocated on the stack and modified.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Using the same rate limiting state for different kinds of messages
is wrong because this can cause a high frequency message to suppress
a report of a low frequency message. Hence use a unique rate limiting
state per message type.
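The suppression effect can be shown with a toy model. This sketch is an assumption-laden simplification: struct toy_ratelimit is not the kernel's ratelimit state, interval expiry is omitted, and all names are hypothetical.

```c
#include <stdbool.h>

/*
 * Toy stand-in for a rate limiting state: allows at most `burst`
 * messages per interval. Interval expiry is left out of the sketch.
 */
struct toy_ratelimit {
	int burst;
	int printed;
};

static bool toy_ratelimit_ok(struct toy_ratelimit *rs)
{
	if (rs->printed >= rs->burst)
		return false;
	rs->printed++;
	return true;
}

/*
 * Emits noisy_count frequent messages against noisy_rs, then one rare
 * message against rare_rs. Returns whether the rare message got through.
 */
static bool toy_rare_message_survives(struct toy_ratelimit *noisy_rs,
				      struct toy_ratelimit *rare_rs,
				      int noisy_count)
{
	int i;

	for (i = 0; i < noisy_count; i++)
		toy_ratelimit_ok(noisy_rs);
	return toy_ratelimit_ok(rare_rs);
}
```

Passing the same state for both arguments models the shared case, where the rare message is dropped once the noisy one has used up the burst. Passing two separate states models one state per message type.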
Fixes: 71a16736a1 ("dm: use local printk ratelimit")
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull libnvdimm updates from Dan Williams:
"libnvdimm updates for the latest ACPI and UEFI specifications. This
pull request also includes new 'struct dax_operations' enabling, which
undoes the abuse of copy_user_nocache() for copy operations to pmem.
The dax work originally missed 4.12 to address concerns raised by Al.
Summary:
- Introduce the _flushcache() family of memory copy helpers and use
them for persistent memory write operations on x86. The
_flushcache() semantic indicates that the cache is either bypassed
for the copy operation (movnt) or any lines dirtied by the copy
operation are written back (clwb, clflushopt, or clflush).
- Extend dax_operations with ->copy_from_iter() and ->flush()
operations. These operations and other infrastructure updates allow
all persistent memory specific dax functionality to be pushed into
libnvdimm and the pmem driver directly. It also allows dax-specific
sysfs attributes to be linked to a host device, for example:
/sys/block/pmem0/dax/write_cache
- Add support for the new NVDIMM platform/firmware mechanisms
introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
namespace label format, extensions to the address-range-scrub
command set, new error injection commands, and a new BTT
(block-translation-table) layout. These updates support inter-OS
and pre-OS compatibility.
- Fix a longstanding memory corruption bug in nfit_test.
- Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
capable.
- Miscellaneous fixes and small updates across libnvdimm and the nfit
driver.
Acknowledgements that came after the branch was pushed: commit
6aa734a2f3 ("libnvdimm, region, pmem: fix 'badblocks'
sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
<toshi.kani@hpe.com>"
* tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
libnvdimm, namespace: record 'lbasize' for pmem namespaces
acpi/nfit: Issue Start ARS to retrieve existing records
libnvdimm: New ACPI 6.2 DSM functions
acpi, nfit: Show bus_dsm_mask in sysfs
libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
acpi, nfit: Enable DSM pass thru for root functions.
libnvdimm: passthru functions clear to send
libnvdimm, btt: convert some info messages to warn/err
libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
libnvdimm: fix the clear-error check in nsio_rw_bytes
libnvdimm, btt: fix btt_rw_page not returning errors
acpi, nfit: quiet invalid block-aperture-region warnings
libnvdimm, btt: BTT updates for UEFI 2.7 format
acpi, nfit: constify *_attribute_group
libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
libnvdimm, pmem, dax: export a cache control attribute
dax: convert to bitmask for flags
dax: remove default copy_from_iter fallback
libnvdimm, nfit: enable support for volatile ranges
libnvdimm, pmem: fix persistence warning
...
A target driver that supports zoned block devices and exposes them as such
may receive REQ_OP_ZONE_REPORT requests, which let the user determine the
mapped device zone configuration. To process such requests properly, the
target driver may need to remap the zone descriptors provided in the report
reply. The helper function dm_remap_zone_report() does this generically
using only the target start offset and length and the start offset
within the target device.
dm_remap_zone_report() will remap the start sector of all zones
reported. If the report includes sequential zones, the write pointer
position of these zones will also be remapped.
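The remapping amounts to a constant shift of each reported position. The sketch below is a simplified userspace illustration, not dm_remap_zone_report() itself: struct toy_zone and its fields are hypothetical, and it assumes every reported zone starts at or after dev_start.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified zone descriptor; the field names are illustrative only. */
struct toy_zone {
	uint64_t start;		/* first sector of the zone */
	uint64_t wp;		/* write pointer, meaningful for sequential zones */
	bool sequential;
};

/*
 * Translate zone positions from underlying-device sectors to
 * mapped-device sectors.
 *   dev_start: start offset within the target device
 *   ti_begin:  start offset of the target within the mapped device
 */
static void toy_remap_zones(struct toy_zone *zones, size_t nr,
			    uint64_t dev_start, uint64_t ti_begin)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		zones[i].start = zones[i].start - dev_start + ti_begin;
		if (zones[i].sequential)
			zones[i].wp = zones[i].wp - dev_start + ti_begin;
	}
}
```

Only sequential zones carry a write pointer worth translating, so conventional zones keep their wp field untouched in this sketch.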
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
1) Introduce DM_TARGET_ZONED_HM feature flag:
The target drivers currently available will not operate correctly if a
table target maps onto a host-managed zoned block device.
To avoid problems, introduce the new feature flag DM_TARGET_ZONED_HM to
allow a target to explicitly state that it supports host-managed zoned
block devices. This feature is checked for all targets in a table if
any of the table's block devices are host-managed.
Note that as host-aware zoned block devices are backward compatible with
regular block devices, they can be used by any of the current target
types. This new feature is thus restricted to host-managed zoned block
devices.
2) Check device area zone alignment:
If a target maps to a zoned block device, check that the device area is
aligned on zone boundaries to avoid problems with REQ_OP_ZONE_RESET
operations (resetting a partially mapped sequential zone would not be
possible). This also facilitates the processing of zone report with
REQ_OP_ZONE_REPORT bios.
3) Check block devices zone model compatibility
When setting the DM device's queue limits, several possibilities exist
for zoned block devices:
1) The DM target driver may want to expose a different zone model
(e.g. host-managed device emulation or regular block device on top of
host-managed zoned block devices)
2) Expose the underlying zone model of the devices as-is
To allow both cases, the underlying block device zone model must be set
in the target limits in dm_set_device_limits() and the compatibility of
all devices checked similarly to the logical block size alignment. For
this last check, introduce validate_hardware_zoned_model() to check that
all targets of a table have the same zone model and that the zone sizes
of the target devices are equal.
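The two checks described in 2) and 3) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the code added by this patch: the types, enum values, and function names are hypothetical, and zone_sectors is assumed to be non-zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative zone models; not the kernel's enum. */
enum toy_zone_model { TOY_ZONED_NONE, TOY_ZONED_HA, TOY_ZONED_HM };

struct toy_dev {
	enum toy_zone_model model;
	uint64_t zone_sectors;	/* zone size in sectors */
};

/* True when [start, start + len) begins and ends on a zone boundary. */
static bool toy_area_zone_aligned(uint64_t start, uint64_t len,
				  uint64_t zone_sectors)
{
	return (start % zone_sectors) == 0 && (len % zone_sectors) == 0;
}

/* True when all devices share one zone model and one zone size. */
static bool toy_zone_models_compatible(const struct toy_dev *devs, size_t nr)
{
	size_t i;

	for (i = 1; i < nr; i++) {
		if (devs[i].model != devs[0].model ||
		    devs[i].zone_sectors != devs[0].zone_sectors)
			return false;
	}
	return true;
}
```

A table would be rejected when either helper returns false for any of its device areas.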
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
[Mike Snitzer refactored Damien's original work to simplify the code]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Using pr_<level> is the more common logging style.
Standardize style and use new macro DM_FMT.
Use no_printk in DMDEBUG macros when CONFIG_DM_DEBUG is not #defined.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Allow device-mapper to route flush operations to the
per-target implementation. In order for the device stacking to work we
need a dax_dev and a pgoff relative to that device. This gives each
layer of the stack the information it needs to look up the operation
pointer for the next level.
This conceptually allows for an array of mixed device drivers with
varying flush implementations.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Allow device-mapper to route copy_from_iter operations to the
per-target implementation. In order for the device stacking to work we
need a dax_dev and a pgoff relative to that device. This gives each
layer of the stack the information it needs to look up the operation
pointer for the next level.
This conceptually allows for an array of mixed device drivers with
varying copy_from_iter implementations.
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep around at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently we use normal Linux errno values in the block layer, and while
we accept any error a few have overloaded magic meanings. This patch
instead introduces a new blk_status_t value that holds block layer specific
status codes and explicitly explains their meaning. Helpers to convert from
and to the previous special meanings are provided for now, but I suspect
we want to get rid of them in the long run - those drivers that have a
an errno input (e.g. networking) usually get errnos that don't know about
the special block layer overloads, and similarly returning them to userspace
will usually return something that strictly speaking isn't correct
for file system operations, but that's left as an exercise for later.
For now the set of errors is a very limited set that closely corresponds
to the previous overloaded errno values, but there is some low hanging
fruit to improve it.
blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
typechecking, so that we can easily catch places passing the wrong values.
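The idea of a dedicated status type with conversion helpers can be sketched in plain userspace C. Everything below is hypothetical: the toy_* names, the chosen codes, and their values are not the kernel's, and an enum is used here because the sparse __bitwise annotation is not available outside the kernel build.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative status codes; names and values are not the kernel's. */
typedef enum {
	TOY_STS_OK = 0,
	TOY_STS_NOTSUPP,
	TOY_STS_NOSPC,
	TOY_STS_IOERR
} toy_status_t;

/* One table drives both conversion directions. */
static const struct {
	toy_status_t status;
	int err;
} toy_status_map[] = {
	{ TOY_STS_OK,      0 },
	{ TOY_STS_NOTSUPP, -EOPNOTSUPP },
	{ TOY_STS_NOSPC,   -ENOSPC },
	{ TOY_STS_IOERR,   -EIO },
};

#define TOY_MAP_LEN (sizeof(toy_status_map) / sizeof(toy_status_map[0]))

static toy_status_t toy_errno_to_status(int err)
{
	size_t i;

	for (i = 0; i < TOY_MAP_LEN; i++)
		if (toy_status_map[i].err == err)
			return toy_status_map[i].status;
	return TOY_STS_IOERR;	/* unknown errnos collapse to a generic I/O error */
}

static int toy_status_to_errno(toy_status_t status)
{
	size_t i;

	for (i = 0; i < TOY_MAP_LEN; i++)
		if (toy_status_map[i].status == status)
			return toy_status_map[i].err;
	return -EIO;
}
```

Keeping the set small means errnos without a block layer meaning all map to the generic I/O error rather than leaking through unchanged.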
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Turn the error parameter into a pointer so that target drivers can change
the value, and make sure only DM_ENDIO_* values are returned from the
methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull libnvdimm updates from Dan Williams:
"The bulk of this has been in multiple -next releases. There were a few
late breaking fixes and small features that got added in the last
couple days, but the whole set has received a build success
notification from the kbuild robot.
Change summary:
- Region media error reporting: A libnvdimm region device is the
parent to one or more namespaces. To date, media errors have been
reported via the "badblocks" attribute attached to pmem block
devices for namespaces in "raw" or "memory" mode. Given that
namespaces can be in "device-dax" or "btt-sector" mode this new
interface reports media errors generically, i.e. independent of
namespace modes or state.
This subsequently allows userspace tooling to craft "ACPI 6.1
Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
requests and submit them via the ioctl path for NVDIMM root bus
devices.
- Introduce 'struct dax_device' and 'struct dax_operations': Prompted
by a request from Linus and feedback from Christoph this allows for
dax capable drivers to publish their own custom dax operations.
This fixes the broken assumption that all dax operations are
related to a persistent memory device, and makes it easier for
other architectures and platforms to add customized persistent
memory support.
- 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
available for storage appliance applications to manually trigger
memory controllers to drain write-pending buffers that would
otherwise be flushed automatically by the platform ADR
(asynchronous-DRAM-refresh) mechanism at a power loss event.
Support for "locked" DIMMs is included to prevent namespaces from
surfacing when the namespace label data area is locked. Finally,
fixes for various reported deadlocks and crashes, also tagged for
-stable.
- ACPI / nfit driver updates: General updates of the nfit driver to
add DSM command overrides, ACPI 6.1 health state flags support, DSM
payload debug available by default, and various fixes.
Acknowledgements that came after the branch was pushed:
- commit 565851c972 "device-dax: fix sysfs attribute deadlock":
Tested-by: Yi Zhang <yizhan@redhat.com>
- commit 23f4984483 "libnvdimm: rework region badblocks clearing"
Tested-by: Toshi Kani <toshi.kani@hpe.com>"
* tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
libnvdimm, pfn: fix 'npfns' vs section alignment
libnvdimm: handle locked label storage areas
libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
brd: fix uninitialized use of brd->dax_dev
block, dax: use correct format string in bdev_dax_supported
device-dax: fix sysfs attribute deadlock
libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
libnvdimm: rework region badblocks clearing
acpi, nfit: kill ACPI_NFIT_DEBUG
libnvdimm: fix clear length of nvdimm_forget_poison()
libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
libnvdimm, region: sysfs trigger for nvdimm_flush()
libnvdimm: fix phys_addr for nvdimm_clear_poison
x86, dax, pmem: remove indirection around memcpy_from_pmem()
block: remove block_device_operations ->direct_access()
block, dax: convert bdev_dax_supported() to dax_direct_access()
filesystem-dax: convert to dax_direct_access()
Revert "block: use DAX for partition table reads"
ext2, ext4, xfs: retrieve dax_device for iomap operations
...
This untangles the DM_MAPIO_* values returned from ->clone_and_map_rq
from the error codes used by the block layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Instead of returning either a DM_ENDIO_* constant or an error code, add
a new DM_ENDIO_DONE value that means keep errno as is. This allows us
to easily keep the existing error code in cases where we can't push back,
and it also prepares for the new block level status codes with strict
type checking.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Introduce an enumeration type for the queue mode. This patch does
not change any functionality but makes the DM code easier to read.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Arrange for dm to look up the dax services available from member devices.
Update the dax-capable targets, linear and stripe, to route dax
operations to the underlying device. Changes the target-internal
->direct_access() method to more closely align with the dax_operations
->direct_access() calling convention.
Cc: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A dm-crypt on dm-integrity device incorrectly advertises an integrity
profile on the DM crypt device. It can be seen in the files
"/sys/block/dm-*/integrity/*" that both dm-integrity and dm-crypt target
advertise the integrity profile. That is incorrect; only the
dm-integrity target should advertise the integrity profile.
A general problem in DM is that if we have a DM device that depends on
another device with an integrity profile, the upper device will always
advertise the integrity profile, even when the target driver doesn't
support handling integrity data.
Most targets don't support integrity data, so we provide a whitelist of
targets that support it (linear, delay and striped). The targets that
support passing integrity data to the lower device are marked with the
flag DM_TARGET_PASSES_INTEGRITY. The DM core will now advertise
integrity data on a DM device only if all the targets support the
integrity data.
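The "all targets must opt in" rule can be sketched as a simple scan over the table. This is a userspace illustration only: struct toy_target, the flag's bit value, and the function name are assumptions, not the DM core code.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_TARGET_PASSES_INTEGRITY 0x1u	/* illustrative bit value */

struct toy_target {
	unsigned int features;
};

/*
 * Advertise integrity on the stacked device only if every target in the
 * table passes integrity data through to the lower device. An empty
 * table advertises nothing.
 */
static bool toy_table_passes_integrity(const struct toy_target *targets,
				       size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!(targets[i].features & TOY_TARGET_PASSES_INTEGRITY))
			return false;
	return nr > 0;
}
```

A single target without the flag, such as a crypt-style target in this model, is enough to stop the upper device from advertising the profile.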
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Allocate a dax_device to represent the capacity of a device-mapper
instance. Provide a ->direct_access() method via the new dax_operations
indirection that mirrors the functionality of the current direct_access
support via block_device_operations. Once fs/dax.c has been converted
to use dax_operations the old dm_blk_direct_access() will be removed.
A new helper dm_dax_get_live_target() is introduced to separate some of
the dm-specifics from the direct_access implementation.
This enabling is only for the top-level dm representation to upper
layers. Converting target direct_access implementations is deferred to a
separate patch.
Cc: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Copy & paste from the REQ_OP_WRITE_SAME code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Add DM_TARGET_INTEGRITY flag that specifies bio integrity metadata is
not inherited but implemented in the target itself.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
DM already calls blk_mq_alloc_request on the request_queue of the
underlying device if it is a blk-mq device. But now that we allow drivers
to allocate additional data and initialize it ahead of time we need to do
the same for all drivers. Doing so and using the new cmd_size
infrastructure in the block layer greatly simplifies the dm-rq and mpath
code, and should also make arbitrary combinations of SQ and MQ devices
with SQ or MQ device mapper tables easily possible as a further step.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Otherwise blk-mq will immediately dispatch requests that are requeued
via a BLK_MQ_RQ_QUEUE_BUSY return from blk_mq_ops .queue_rq.
Delayed requeue is implemented using blk_mq_delay_kick_requeue_list()
with a delay of 5 secs. In the context of DM multipath (all paths down)
it doesn't make any sense to requeue more quickly.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull libnvdimm updates from Dan Williams:
- Replace pcommit with ADR / directed-flushing.
The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the requirement is that platforms implement
either ADR, or provide one or more flush addresses per nvdimm.
ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
to the memory controller on a power-fail event.
Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
A flush hint is an mmio address that when written and fenced assures
that all previous posted writes targeting a given dimm have been
flushed to media.
- On-demand ARS (address range scrub).
Linux uses the results of the ACPI ARS commands to track bad blocks
in pmem devices. When latent errors are detected we re-scrub the
media to refresh the bad block list; userspace can also request a
re-scrub at any time.
- Support for the Microsoft DSM (device specific method) command
format.
- Support for EDK2/OVMF virtual disk device memory ranges.
- Various fixes and cleanups across the subsystem.
* tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
nfit: do an ARS scrub on hitting a latent media error
nfit: move to nfit/ sub-directory
nfit, libnvdimm: allow an ARS scrub to be triggered on demand
libnvdimm: register nvdimm_bus devices with an nd_bus driver
pmem: clarify a debug print in pmem_clear_poison
x86/insn: remove pcommit
Revert "KVM: x86: add pcommit support"
nfit, tools/testing/nvdimm/: unify shutdown paths
libnvdimm: move ->module to struct nvdimm_bus_descriptor
nfit: cleanup acpi_nfit_init calling convention
nfit: fix _FIT evaluation memory leak + use after free
tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
tools/testing/nvdimm: add virtual ramdisk range
acpi, nfit: treat virtual ramdisk SPA as pmem region
pmem: kill __pmem address space
pmem: kill wmb_pmem()
libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
fs/dax: remove wmb_pmem()
libnvdimm, pmem: flush posted-write queues on shutdown
...
Change mapped device to implement direct_access function,
dm_blk_direct_access(), which calls a target direct_access function.
'struct target_type' is extended with a direct_access interface for
targets. dm_blk_direct_access() limits the directly accessible size to
the dm_target's limit with max_io_len().
Add dm_table_supports_dax() to iterate all targets and associated block
devices to check for DAX support. To add DAX support to a DM target, the
target only needs to implement the direct_access function.
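A minimal sketch of the two ideas above, using simplified stand-ins
rather than the kernel's structures. All names prefixed with sketch_
are hypothetical: one helper clamps the accessible size to the
target's limit, and the other reports DAX support only if every
target provides direct_access and its devices support DAX.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a dm target. */
struct sketch_target {
    /* Non-NULL when the target implements direct access. */
    long (*direct_access)(struct sketch_target *ti, size_t len);
    size_t max_io_len;          /* the target's size limit */
    bool devices_support_dax;   /* all underlying devices support DAX */
};

/* Clamp the directly accessible size to the target's limit. */
long sketch_linear_direct_access(struct sketch_target *ti, size_t len)
{
    return (long)(len < ti->max_io_len ? len : ti->max_io_len);
}

/* True only if every target in the table can do direct access. */
bool sketch_table_supports_dax(const struct sketch_target *targets, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!targets[i].direct_access || !targets[i].devices_support_dax)
            return false;
    }
    return true;
}
```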
Add a new dm type, DM_TYPE_DAX_BIO_BASED, which indicates that the mapped
device supports DAX and is bio based. This new type is used to ensure
that all target devices have DAX support and remain that way after
QUEUE_FLAG_DAX is set on the mapped device.
At initial table load, QUEUE_FLAG_DAX is set on the mapped device when
its type is set to DM_TYPE_DAX_BIO_BASED. Any subsequent table load to
the mapped device must have the same type, or else it fails per the
check in table_load().
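The type check can be sketched as below. This is an assumption-laden
illustration, not the code in table_load(): the enum, struct and
function names are hypothetical, and the error return is simplified
to -1.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the dm type constants. */
enum sketch_dm_type {
    SKETCH_TYPE_NONE,
    SKETCH_TYPE_BIO_BASED,
    SKETCH_TYPE_DAX_BIO_BASED
};

struct sketch_mapped_device {
    enum sketch_dm_type type;
    bool queue_flag_dax;
};

/* Returns 0 on success, -1 if the new table's type does not match. */
int sketch_table_load(struct sketch_mapped_device *md,
                      enum sketch_dm_type table_type)
{
    if (md->type == SKETCH_TYPE_NONE) {
        /* Initial load: record the type and set the DAX flag with it. */
        md->type = table_type;
        md->queue_flag_dax = (table_type == SKETCH_TYPE_DAX_BIO_BASED);
        return 0;
    }
    /* Subsequent loads must keep the same type. */
    return md->type == table_type ? 0 : -1;
}
```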
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Allow a user to specify an optional feature 'queue_mode <mode>' where
<mode> may be "bio", "rq" or "mq" -- which correspond to bio-based,
request_fn rq-based, and blk-mq rq-based respectively.
If the queue_mode feature isn't specified the default for the
"multipath" target is still "rq" but if dm_mod.use_blk_mq is set to Y
it'll default to mode "mq".
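The selection rule above can be sketched as follows. The enum and
function names are hypothetical and this is not the multipath
target's actual argument parser. A NULL argument models the case
where the queue_mode feature is not specified.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical names for illustration only. */
enum sketch_queue_mode {
    SKETCH_QUEUE_MODE_BIO,
    SKETCH_QUEUE_MODE_RQ,
    SKETCH_QUEUE_MODE_MQ,
    SKETCH_QUEUE_MODE_INVALID
};

/*
 * 'arg' is the <mode> string, or NULL if queue_mode was not given.
 * 'use_blk_mq' models the dm_mod.use_blk_mq setting.
 */
enum sketch_queue_mode sketch_pick_queue_mode(const char *arg, bool use_blk_mq)
{
    if (arg == NULL)
        return use_blk_mq ? SKETCH_QUEUE_MODE_MQ : SKETCH_QUEUE_MODE_RQ;
    if (strcmp(arg, "bio") == 0)
        return SKETCH_QUEUE_MODE_BIO;
    if (strcmp(arg, "rq") == 0)
        return SKETCH_QUEUE_MODE_RQ;
    if (strcmp(arg, "mq") == 0)
        return SKETCH_QUEUE_MODE_MQ;
    return SKETCH_QUEUE_MODE_INVALID;
}
```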
This new queue_mode feature introduces the ability for each multipath
device to have its own queue_mode (whereas before this feature all
multipath devices effectively had to have the same queue_mode).
This commit also goes a long way to eliminate the awkward (ab)use of
DM_TYPE_*, the associated filter_md_type() and other relatively fragile
and difficult to maintain code.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The DM_TARGET_WILDCARD feature indicates that the "error" target may
replace any target; even immutable targets. This feature will be useful
to preserve the ability to replace the "multipath" target even once it
is formally converted over to having the DM_TARGET_IMMUTABLE feature.
Also, implicit in the DM_TARGET_WILDCARD feature flag being set is that
.map, .map_rq, .clone_and_map_rq and .release_clone_rq are all defined
in the target_type.
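A simplified sketch of the replacement rule, assuming hypothetical
flag values and a hypothetical helper name. It deliberately ignores
other cases DM core may allow (such as reloading the same target
type) and only shows that a wildcard target may replace anything,
including an immutable target.

```c
#include <stdbool.h>

/* Hypothetical flag values; not the kernel's definitions. */
#define SKETCH_TARGET_IMMUTABLE 0x1u
#define SKETCH_TARGET_WILDCARD  0x2u

/* May a target with 'new_features' replace one with 'old_features'? */
bool sketch_may_replace(unsigned old_features, unsigned new_features)
{
    /* A wildcard target (e.g. "error") may replace any target. */
    if (new_features & SKETCH_TARGET_WILDCARD)
        return true;
    /* Otherwise an immutable target may not be replaced (simplified). */
    return !(old_features & SKETCH_TARGET_IMMUTABLE);
}
```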
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This moves the call to blkdev_ioctl and the argument checking to DM core
code, and only leaves a callout to find the block device to operate on
in the targets. This simplifies the code and allows us to pass through
ioctl-like commands using other methods in the next patch.
Also split out a helper around calling the prepare_ioctl method that
will be reused for persistent reservation handling.
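The resulting split of responsibilities can be sketched like this.
All types and functions here are hypothetical stand-ins, and
sketch_bdev_ioctl() merely imitates issuing a command to a block
device: the target only locates the device, while the core helper
does the checking and issues the call.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a block device and a dm target. */
struct sketch_bdev { int id; };

struct sketch_tgt {
    /* Callout that only finds the block device to operate on. */
    int (*prepare_ioctl)(struct sketch_tgt *ti, struct sketch_bdev **bdev);
    struct sketch_bdev *underlying;
};

int sketch_linear_prepare_ioctl(struct sketch_tgt *ti, struct sketch_bdev **bdev)
{
    *bdev = ti->underlying;
    return 0;
}

/* Stand-in for the real ioctl call; returns a value derived from its inputs. */
int sketch_bdev_ioctl(struct sketch_bdev *bdev, unsigned cmd)
{
    return bdev->id + (int)cmd;
}

/* Core helper: check, ask the target for the device, then issue the ioctl. */
int sketch_dm_ioctl(struct sketch_tgt *ti, unsigned cmd)
{
    struct sketch_bdev *bdev = NULL;
    int r;

    if (!ti->prepare_ioctl)
        return -1;              /* target does not support ioctls */
    r = ti->prepare_ioctl(ti, &bdev);
    if (r < 0)
        return r;
    return sketch_bdev_ioctl(bdev, cmd);
}
```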
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>