Commit Graph

140 Commits

Author SHA1 Message Date
Eric Biggers
a680071d40 ANDROID: dm: add support for passing through inline crypto support
Update the device-mapper core to support exposing the inline crypto
support of the underlying device(s) through the device-mapper device.

This works by creating a "passthrough keyslot manager" for the dm
device, which declares support for the set of (crypto_mode,
data_unit_size) combos which all the underlying devices support.  When a
supported combo is used, the bio cloning code handles cloning the crypto
context to the bios for all the underlying devices.  When an unsupported
combo is used, the blk-crypto fallback is used as usual.

Crypto support on each underlying device is ignored unless the
corresponding dm target opts into exposing it.  This is needed because
for inline crypto to semantically operate on the original bio, the data
must not be transformed by the dm target.  Thus, targets like dm-linear
can expose crypto support of the underlying device, but targets like
dm-crypt can't.  (dm-crypt could use inline crypto itself, though.)

When a key is evicted from the dm device, it is evicted from all
underlying devices.

Bug: 137270441
Bug: 147814592
Change-Id: If28b574f2e28268db5eb9f325d4cf8f96cb63e3f
Signed-off-by: Eric Biggers <ebiggers@google.com>
(cherry picked from commit 44e1174c18)
2020-05-30 02:04:30 +08:00
Ivaylo Georgiev
0e5107f00a Merge android-4.19-q.73 (8feec99) into msm-4.19
* refs/heads/tmp-8feec99:
  Documentation: devicetree: Remove Exynos bindings from kernel
  Documentation: devicetree: Remove Armadeus bindings from kernel
  Revert "dt-bindings: mmc: Add supports-cqe property"
  Revert "dt-bindings: mmc: Add disable-cqe-dcmd property."
  Linux 4.19.73
  vhost: make sure log_num < in_num
  powerpc/tm: Fix restoring FP/VMX facility incorrectly on interrupts
  powerpc/tm: Remove msr_tm_active()
  PCI: Reset both NVIDIA GPU and HDA in ThinkPad P50 workaround
  ext4: unsigned int compared against zero
  ext4: fix block validity checks for journal inodes using indirect blocks
  ext4: don't perform block validity checks on the journal inode
  drm/atomic_helper: Allow DPMS On<->Off changes for unregistered connectors
  virtio/s390: fix race on airq_areas[]
  drm/i915: Make sure cdclk is high enough for DP audio on VLV/CHV
  bcache: fix race in btree_flush_write()
  bcache: add comments for mutex_lock(&b->write_lock)
  bcache: only clear BTREE_NODE_dirty bit when it is set
  NFSv4: Fix delegation state recovery
  iio: adc: gyroadc: fix uninitialized return code
  mm/migrate.c: initialize pud_entry in migrate_vma()
  i2c: at91: fix clk_offset for sama5d2
  i2c: at91: disable TXRDY interrupt after sending data
  gpio: don't WARN() on NULL descs if gpiolib is disabled
  iommu/iova: Remove stale cached32_node
  powerpc/mm: Limit rma_size to 1TB when running without HV mode
  ALSA: hda - Fix intermittent CORB/RIRB stall on Intel chips
  drm/panel: Add support for Armadeus ST0700 Adapt
  dm thin metadata: check if in fail_io mode when setting needs_check
  pstore: Fix double-free in pstore_mkfile() failure path
  resource: fix locking in find_next_iomem_res()
  resource: Fix find_next_iomem_res() iteration issue
  resource: Include resource end in walk_*() interfaces
  btrfs: correctly validate compression type
  RDMA/srp: Accept again source addresses that do not have a port number
  RDMA/srp: Document srp_parse_in() arguments
  ARM: dts: gemini: Set DIR-685 SPI CS as active low
  KVM: PPC: Book3S HV: Fix CR0 setting in TM emulation
  KVM: PPC: Use ccr field in pt_regs struct embedded in vcpu struct
  KVM: VMX: check CPUID before allowing read/write of IA32_XSS
  KVM: VMX: Fix handling of #MC that occurs during VM-Entry
  KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with bad value
  KVM: x86: optimize check for valid PAT value
  ceph: use ceph_evict_inode to cleanup inode's resource
  ALSA: hda - Don't resume forcibly i915 HDMI/DP codec
  cifs: Properly handle auto disabling of serverino option
  scsi: zfcp: fix request object use-after-free in send path causing wrong traces
  staging: wilc1000: fix error path cleanup in wilc_wlan_initialize()
  scsi: target/iblock: Fix overrun in WRITE SAME emulation
  scsi: target/core: Use the SECTOR_SHIFT constant
  apparmor: reset pos on failure to unpack for various functions
  IB/hfi1: Avoid hardlockup with flushlist_lock
  clk: tegra210: Fix default rates for HDA clocks
  clk: tegra: Fix maximum audio sync clock for Tegra124/210
  cifs: add spinlock for the openFileList to cifsInodeInfo
  Btrfs: fix race between block group removal and block group allocation
  drm/amdgpu/{uvd,vcn}: fetch ring's read_ptr after alloc
  drm/amdgpu: fix ring test failure issue during s3 in vce 3.0 (V2)
  kvm: Check irqchip mode before assign irqfd
  drm/amdkfd: Add missing Polaris10 ID
  ARC: mm: SIGSEGV userspace trying to access kernel virtual memory
  ARC: mm: fix uninitialised signal code in do_page_fault
  signal/arc: Use force_sig_fault where appropriate
  dm crypt: move detailed message into debug level
  cifs: smbd: take an array of reqeusts when sending upper layer data
  PCI: dwc: Use devm_pci_alloc_host_bridge() to simplify code
  mmc: sdhci-pci: Add support for Intel CML
  blk-mq: free hw queue's resource in hctx's release handler
  dm mpath: fix missing call of path selector type->end_io
  PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessary
  PCI: Add macro for Switchtec quirk declarations
  dt-bindings: mmc: Add disable-cqe-dcmd property.
  dt-bindings: mmc: Add supports-cqe property
  ARM: dts: qcom: ipq4019: enlarge PCIe BAR range
  ARM: dts: qcom: ipq4019: Fix MSI IRQ type
  ARM: dts: qcom: ipq4019: fix PCI range
  ext4: protect journal inode's blocks using block_validity
  media: i2c: tda1997x: select V4L2_FWNODE
  cifs: Fix lease buffer length error
  KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels
  x86/kvm: move kvm_load/put_guest_xcr0 into atomic context
  kvm: mmu: Fix overflow on kvm mmu page limit calculation
  IB/mlx5: Reset access mask when looping inside page fault handler
  arm64: dts: stratix10: add the sysmgr-syscon property from the gmac's
  usb: typec: tcpm: Try PD-2.0 if sink does not respond to 3.0 source-caps
  drm/i915: Sanity check mmap length against object size
  drm/i915: Handle vm_mmap error during I915_GEM_MMAP ioctl with WC set
  CIFS: Fix leaking locked VFS cache pages in writeback retry
  CIFS: Fix error paths in writeback code
  drm: add __user attribute to ptr_to_compat()
  PCI: qcom: Don't deassert reset GPIO during probe
  PCI: qcom: Fix error handling in runtime PM support
  btrfs: init csum_list before possible free
  btrfs: scrub: fix circular locking dependency warning
  btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex
  btrfs: scrub: pass fs_info to scrub_setup_ctx
  mmc: renesas_sdhi: Fix card initialization failure in high speed mode
  powerpc/kvm: Save and restore host AMR/IAMR/UAMOR
  spi: spi-gpio: fix SPI_CS_HIGH capability
  x86/kvmclock: set offset for kvm unstable clock
  iwlwifi: add new card for 9260 series
  iwlwifi: fix devices with PCI Device ID 0x34F0 and 11ac RF modules
  drm/nouveau: Don't WARN_ON VCPI allocation failures
  mt76: fix corrupted software generated tx CCMP PN
  iio: adc: exynos-adc: Use proper number of channels for Exynos4x12
  dt-bindings: iio: adc: exynos-adc: Add S5PV210 variant
  iio: adc: exynos-adc: Add S5PV210 variant
  KVM: VMX: Compare only a single byte for VMCS' "launched" in vCPU-run
  bcache: treat stale && dirty keys as bad keys
  bcache: replace hard coded number with BUCKET_GC_GEN_MAX
  tpm: Fix some name collisions with drivers/char/tpm.h
  mfd: Kconfig: Fix I2C_DESIGNWARE_PLATFORM dependencies
  drm/i915/ilk: Fix warning when reading emon_status with no output
  drm/vblank: Allow dynamic per-crtc max_vblank_count
  crypto: ccree - add missing inline qualifier
  crypto: ccree - fix resume race condition on init
  IB/uverbs: Fix OOPs upon device disassociation
  ARC: mm: do_page_fault fixes #1: relinquish mmap_sem if signal arrives while handle_mm_fault
  ARC: show_regs: lockdep: re-enable preemption
  media: vim2m: only cancel work if it is for right context
  btrfs: Use real device structure to verify dev extent
  btrfs: volumes: Make sure no dev extent is beyond device boundary
  powerpc/pkeys: Fix handling of pkey state across fork()
  scsi: megaraid_sas: Use 63-bit DMA addressing
  scsi: megaraid_sas: Add check for reset adapter bit
  scsi: megaraid_sas: Fix combined reply queue mode detection
  btrfs: Fix error handling in btrfs_cleanup_ordered_extents
  btrfs: Remove extent_io_ops::fill_delalloc
  Btrfs: fix deadlock with memory reclaim during scrub
  Btrfs: clean up scrub is_dev_replace parameter
  KVM: PPC: Book3S HV: Fix race between kvm_unmap_hva_range and MMU mode switch
  drm/i915: Cleanup gt powerstate from gem
  drm/i915: Restore sane defaults for KMS on GEM error load
  media: vim2m: use cancel_delayed_work_sync instead of flush_schedule_work
  media: vim2m: use workqueue
  s390/zcrypt: reinit ap queue state machine during device probe
  ARM: davinci: dm644x: define gpio interrupts as separate resources
  ARM: davinci: dm355: define gpio interrupts as separate resources
  ARM: davinci: dm646x: define gpio interrupts as separate resources
  ARM: davinci: dm365: define gpio interrupts as separate resources
  ARM: davinci: da8xx: define gpio interrupts as separate resources
  drm/amd/dm: Understand why attaching path/tile properties are needed
  drm/amd/pp: Fix truncated clock value when set watermark
  powerplay: Respect units on max dcfclk watermark
  Drivers: hv: kvp: Fix the recent regression caused by incorrect clean-up
  Drivers: hv: kvp: Fix the indentation of some "break" statements
  drm/atomic_helper: Disallow new modesets on unregistered connectors
  drm/i915/gen9+: Fix initial readout for Y tiled framebuffers
  drm/i915: Rename PLANE_CTL_DECOMPRESSION_ENABLE
  drm/i915: Fix intel_dp_mst_best_encoder()
  x86/kvm/lapic: preserve gfn_to_hva_cache len on cache reinit
  KVM: hyperv: define VP assist page helpers
  KVM: x86: hyperv: keep track of mismatched VP indexes
  KVM: x86: hyperv: consistently use 'hv_vcpu' for 'struct kvm_vcpu_hv' variables
  KVM: x86: hyperv: enforce vp_index < KVM_MAX_VCPUS
  drm/amdgpu: Update gc_9_0 golden settings.
  drm/amdgpu/gfx9: Update gfx9 golden settings.
  remoteproc: qcom: q6v5-mss: add SCM probe dependency
  x86, hibernate: Fix nosave_regions setup for hibernation
  Drivers: hv: kvp: Fix two "this statement may fall through" warnings
  keys: Fix the use of the C++ keyword "private" in uapi/linux/keyctl.h
  scsi: qla2xxx: Move log messages before issuing command to firmware
  media: cec: remove cec-edid.c
  media: cec/v4l2: move V4L2 specific CEC functions to V4L2
  drm/i915: Re-apply "Perform link quality check, unconditionally during long pulse"
  kernel/module: Fix mem leak in module_add_modinfo_attrs
  modules: always page-align module section allocations
  remoteproc: qcom: q6v5: shore up resource probe handling
  clk: s2mps11: Add used attribute to s2mps11_dt_match
  nvme-fc: use separate work queue to avoid warning
  riscv: remove unused variable in ftrace
  scripts/decode_stacktrace: match basepath using shell prefix operator, not regex
  arm64: dts: rockchip: enable usb-host regulators at boot on rk3328-rock64
  media: stm32-dcmi: fix irq = 0 case
  powerpc/64: mark start_here_multiplatform as __ref
  x86/ftrace: Fix warning and considate ftrace_jmp_replace() and ftrace_call_replace()
  selftests: fib_rule_tests: use pre-defined DEV_ADDR
  timekeeping: Use proper ktime_add when adding nsecs in coarse offset
  {nl,mac}80211: fix interface combinations on crypto controlled devices
  blk-iolatency: fix STS_AGAIN handling
  Blk-iolatency: warn on negative inflight IO counter
  hv_sock: Fix hang when a connection is closed
  batman-adv: Only read OGM tvlv_len after buffer len check
  batman-adv: fix uninit-value in batadv_netlink_get_ifindex()
  powerpc/tm: Fix FP/VMX unavailable exceptions inside a transaction
  vhost/test: fix build for vhost test - again
  vhost/test: fix build for vhost test
  drm/vmwgfx: Fix double free in vmw_recv_msg()
  sched/fair: Don't assign runtime for throttled cfs_rq
  ALSA: hda/realtek - Fix the problem of two front mics on a ThinkCentre
  ALSA: hda/realtek - Enable internal speaker & headset mic of ASUS UX431FL
  ALSA: hda/realtek - Add quirk for HP Pavilion 15
  ALSA: hda/realtek - Fix overridden device-specific initialization
  ALSA: hda - Fix potential endless loop at applying quirks
  ANDROID: regression introduced override_creds=off

Change-Id: I74d7d497d9ec70b589ac400f62e266858ee7893d
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2019-09-18 00:41:18 -07:00
Yufen Yu
69409854ba dm mpath: fix missing call of path selector type->end_io
[ Upstream commit 5de719e3d01b4abe0de0d7b857148a880ff2a90b ]

After commit 396eaf21ee ("blk-mq: improve DM's blk-mq IO merging via
blk_insert_cloned_request feedback"), map_request() will requeue the tio
when issued clone request return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE.

Thus, if device driver status is error, a tio may be requeued multiple
times until the return value is not DM_MAPIO_REQUEUE.  That means
type->start_io may be called multiple times, while type->end_io is only
called when IO complete.

In fact, even without commit 396eaf21ee, setup_clone() failure can
also cause tio requeue and associated missed call to type->end_io.

The service-time path selector selects path based on in_flight_size,
which is increased by st_start_io() and decreased by st_end_io().
Missed calls to st_end_io() can lead to in_flight_size count error and
will cause the selector to make the wrong choice.  In addition,
queue-length path selector will also be affected.

To fix the problem, call type->end_io in ->release_clone_rq before tio
requeue.  map_info is passed to ->release_clone_rq() for map_request()
error path that result in requeue.

Fixes: 396eaf21ee ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Cc: stable@vger.kernl.org
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-16 08:22:12 +02:00
Ivaylo Georgiev
6f910c4e90 Merge android-4.19.31 (bb418a1) into msm-4.19
* refs/heads/tmp-bb418a1:
  Linux 4.19.31
  s390/setup: fix boot crash for machine without EDAT-1
  bcache: use (REQ_META|REQ_PRIO) to indicate bio for metadata
  KVM: nVMX: Ignore limit checks on VMX instructions using flat segments
  KVM: nVMX: Apply addr size mask to effective address for VMX instructions
  KVM: nVMX: Sign extend displacements of VMX instr's mem operands
  KVM: x86/mmu: Do not cache MMIO accesses while memslots are in flux
  KVM: x86/mmu: Detect MMIO generation wrap in any address space
  KVM: Call kvm_arch_memslots_updated() before updating memslots
  drm/amd/display: don't call dm_pp_ function from an fpu block
  drm/amd/powerplay: correct power reading on fiji
  drm/radeon/evergreen_cs: fix missing break in switch statement
  drm/fb-helper: generic: Fix drm_fbdev_client_restore()
  media: imx: csi: Stop upstream before disabling IDMA channel
  media: imx: csi: Disable CSI immediately after last EOF
  media: vimc: Add vimc-streamer for stream control
  media: uvcvideo: Avoid NULL pointer dereference at the end of streaming
  media: lgdt330x: fix lock status reporting
  media: imx: prpencvf: Stop upstream before disabling IDMA channel
  rcu: Do RCU GP kthread self-wakeup from softirq and interrupt
  tpm: Unify the send callback behaviour
  tpm/tpm_crb: Avoid unaligned reads in crb_recv()
  md: Fix failed allocation of md_register_thread
  perf intel-pt: Fix divide by zero when TSC is not available
  perf/x86/intel/uncore: Fix client IMC events return huge result
  perf intel-pt: Fix overlap calculation for padding
  perf auxtrace: Define auxtrace record alignment
  perf tools: Fix split_kallsyms_for_kcore() for trampoline symbols
  perf intel-pt: Fix CYC timestamp calculation after OVF
  x86/unwind/orc: Fix ORC unwind table alignment
  vt: perform safe console erase in the right order
  stable-kernel-rules.rst: add link to networking patch queue
  bcache: never writeback a discard operation
  PM / wakeup: Rework wakeup source timer cancellation
  svcrpc: fix UDP on servers with lots of threads
  NFSv4.1: Reinitialise sequence results before retransmitting a request
  nfsd: fix wrong check in write_v4_end_grace()
  nfsd: fix memory corruption caused by readdir
  nfsd: fix performance-limiting session calculation
  NFS: Don't recoalesce on error in nfs_pageio_complete_mirror()
  NFS: Fix an I/O request leakage in nfs_do_recoalesce
  NFS: Fix I/O request leakages
  cpcap-charger: generate events for userspace
  mfd: sm501: Fix potential NULL pointer dereference
  dm integrity: limit the rate of error messages
  dm: fix to_sector() for 32bit
  ipmi_si: fix use-after-free of resource->name
  arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
  arm64: debug: Ensure debug handlers check triggering exception level
  arm64: Fix HCR.TGE status for NMI contexts
  ARM: s3c24xx: Fix boolean expressions in osiris_dvs_notify
  powerpc/traps: Fix the message printed when stack overflows
  powerpc/traps: fix recoverability of machine check handling on book3s/32
  powerpc/hugetlb: Don't do runtime allocation of 16G pages in LPAR configuration
  powerpc/ptrace: Simplify vr_get/set() to avoid GCC warning
  powerpc: Fix 32-bit KVM-PR lockup and host crash with MacOS guest
  powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit
  powerpc/83xx: Also save/restore SPRG4-7 during suspend
  powerpc/powernv: Make opal log only readable by root
  powerpc/wii: properly disable use of BATs when requested.
  powerpc/32: Clear on-stack exception marker upon exception return
  security/selinux: fix SECURITY_LSM_NATIVE_LABELS on reused superblock
  selinux: add the missing walk_size + len check in selinux_sctp_bind_connect
  jbd2: fix compile warning when using JBUFFER_TRACE
  jbd2: clear dirty flag when revoking a buffer from an older transaction
  serial: 8250_pci: Have ACCES cards that use the four port Pericom PI7C9X7954 chip use the pci_pericom_setup()
  serial: 8250_pci: Fix number of ports for ACCES serial cards
  serial: 8250_of: assume reg-shift of 2 for mrvl,mmp-uart
  serial: uartps: Fix stuck ISR if RX disabled with non-empty FIFO
  bpf: only test gso type on gso packets
  drm/i915: Relax mmap VMA check
  can: flexcan: FLEXCAN_IFLAG_MB: add () around macro argument
  gpio: pca953x: Fix dereference of irq data in shutdown
  media: i2c: ov5640: Fix post-reset delay
  i2c: tegra: fix maximum transfer size
  parport_pc: fix find_superio io compare code, should use equal test.
  intel_th: Don't reference unassigned outputs
  device property: Fix the length used in PROPERTY_ENTRY_STRING()
  kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv
  mm/memory.c: do_fault: avoid usage of stale vm_area_struct
  mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  mm: hwpoison: fix thp split handing in soft_offline_in_use_page()
  dmaengine: usb-dmac: Make DMAC system sleep callbacks explicit
  usb: typec: tps6598x: handle block writes separately with plain-I2C adapters
  usb: chipidea: tegra: Fix missed ci_hdrc_remove_device()
  clk: ingenic: Fix doc of ingenic_cgu_div_info
  clk: ingenic: Fix round_rate misbehaving with non-integer dividers
  clk: samsung: exynos5: Fix kfree() of const memory on setting driver_override
  clk: samsung: exynos5: Fix possible NULL pointer exception on platform_device_alloc() failure
  clk: clk-twl6040: Fix imprecise external abort for pdmclk
  clk: uniphier: Fix update register for CPU-gear
  ext2: Fix underflow in ext2_max_size()
  cxl: Wrap iterations over afu slices inside 'afu_list_lock'
  IB/hfi1: Close race condition on user context disable and close
  PCI: dwc: skip MSI init if MSIs have been explicitly disabled
  PCI/DPC: Fix print AER status in DPC event handling
  PCI/ASPM: Use LTR if already enabled by platform
  ext4: fix crash during online resizing
  ext4: add mask of ext4 flags to swap
  ext4: update quota information while swapping boot loader inode
  ext4: cleanup pagecache before swap i_data
  ext4: fix check of inode in swap_inode_boot_loader
  cpufreq: pxa2xx: remove incorrect __init annotation
  cpufreq: tegra124: add missing of_node_put()
  cpufreq: kryo: Release OPP tables on module removal
  x86/kprobes: Prohibit probing on optprobe template code
  irqchip/brcmstb-l2: Use _irqsave locking variants in non-interrupt code
  irqchip/gic-v3-its: Avoid parsing _indirect_ twice for Device table
  libertas_tf: don't set URB_ZERO_PACKET on IN USB transfer
  soc: qcom: rpmh: Avoid accessing freed memory from batch API
  Btrfs: fix corruption reading shared and compressed extents after hole punching
  btrfs: ensure that a DUP or RAID1 block group has exactly two stripes
  Btrfs: setup a nofs context for memory allocation at __btrfs_set_acl
  Btrfs: setup a nofs context for memory allocation at btrfs_create_tree()
  m68k: Add -ffreestanding to CFLAGS
  ovl: Do not lose security.capability xattr over metadata file copy-up
  ovl: During copy up, first copy up data and then xattrs
  splice: don't merge into linked buffers
  fs/devpts: always delete dcache dentry-s in dput()
  scsi: qla2xxx: Fix LUN discovery if loop id is not assigned yet by firmware
  scsi: target/iscsi: Avoid iscsit_release_commands_from_conn() deadlock
  scsi: sd: Optimal I/O size should be a multiple of physical block size
  scsi: aacraid: Fix performance issue on logical drives
  scsi: virtio_scsi: don't send sc payload with tmfs
  s390/virtio: handle find on invalid queue gracefully
  s390/setup: fix early warning messages
  clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability
  clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown
  clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR
  regulator: s2mpa01: Fix step values for some LDOs
  regulator: max77620: Initialize values for DT properties
  regulator: s2mps11: Fix steps for buck7, buck8 and LDO35
  spi: pxa2xx: Setup maximum supported DMA transfer length
  spi: ti-qspi: Fix mmap read when more than one CS in use
  netfilter: ipt_CLUSTERIP: fix warning unused variable cn
  mmc:fix a bug when max_discard is 0
  mmc: sdhci-esdhc-imx: fix HS400 timing issue
  ACPI / device_sysfs: Avoid OF modalias creation for removed device
  xen: fix dom0 boot on huge systems
  tracing/perf: Use strndup_user() instead of buggy open-coded version
  tracing: Do not free iter->trace in fail path of tracing_open_pipe()
  tracing: Use strncpy instead of memcpy for string keys in hist triggers
  CIFS: Fix read after write for files with read caching
  CIFS: Do not skip SMB2 message IDs on send failures
  CIFS: Do not reset lease state to NONE on lease break
  crypto: arm64/aes-ccm - fix bugs in non-NEON fallback routine
  crypto: arm64/aes-ccm - fix logical bug in AAD MAC handling
  crypto: x86/morus - fix handling chunked inputs and MAY_SLEEP
  crypto: x86/aesni-gcm - fix crash on empty plaintext
  crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEP
  crypto: testmgr - skip crc32c context test for ahash algorithms
  crypto: skcipher - set CRYPTO_TFM_NEED_KEY if ->setkey() fails
  crypto: pcbc - remove bogus memcpy()s with src == dest
  crypto: morus - fix handling chunked inputs
  crypto: hash - set CRYPTO_TFM_NEED_KEY if ->setkey() fails
  crypto: arm64/crct10dif - revert to C code for short inputs
  crypto: arm64/aes-neonbs - fix returning final keystream block
  crypto: arm/crct10dif - revert to C code for short inputs
  crypto: aegis - fix handling chunked inputs
  crypto: aead - set CRYPTO_TFM_NEED_KEY if ->setkey() fails
  fix cgroup_do_mount() handling of failure exits
  libnvdimm: Fix altmap reservation size calculation
  libnvdimm/pmem: Honor force_raw for legacy pmem regions
  libnvdimm, pfn: Fix over-trim in trim_pfn_device()
  libnvdimm/label: Clear 'updating' flag after label-set update
  nfit/ars: Attempt short-ARS even in the no_init_ars case
  nfit/ars: Attempt a short-ARS whenever the ARS state is idle at boot
  acpi/nfit: Fix bus command validation
  nfit: acpi_nfit_ctl(): Check out_obj->type in the right place
  stm class: Prevent division by zero
  tmpfs: fix uninitialized return value in shmem_link
  selftests: fib_tests: sleep after changing carrier. again.
  net: set static variable an initial value in atl2_probe()
  bnxt_en: Wait longer for the firmware message response to complete.
  bnxt_en: Fix typo in firmware message timeout logic.
  nfp: bpf: fix ALU32 high bits clearance bug
  nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
  net: thunderx: add nicvf_send_msg_to_pf result check for set_rx_mode_task
  net: thunderx: make CFG_DONE message to run through generic send-ack sequence
  bpf, lpm: fix lookup bug in map_delete_elem
  mac80211_hwsim: propagate genlmsg_reply return code
  phonet: fix building with clang
  ARCv2: don't assume core 0x54 has dual issue
  ARCv2: support manual regfile save on interrupts
  ARC: uacces: remove lp_start, lp_end from clobber list
  ARCv2: lib: memcpy: fix doing prefetchw outside of buffer
  ixgbe: fix older devices that do not support IXGBE_MRQC_L3L4TXSWEN
  tmpfs: fix link accounting when a tmpfile is linked in
  mm: handle lru_add_drain_all for UP properly
  net: marvell: mvneta: fix DMA debug warning
  ARM: tegra: Restore DT ABI on Tegra124 Chromebooks
  arm64: Relax GIC version check during early boot
  ARM: dts: armada-xp: fix Armada XP boards NAND description
  qed: Fix iWARP syn packet mac address validation.
  qed: Fix iWARP buffer size provided for syn packet processing.
  ASoC: topology: free created components in tplg load error
  mailbox: bcm-flexrm-mailbox: Fix FlexRM ring flush timeout issue
  xfrm: Fix inbound traffic via XFRM interfaces across network namespaces
  net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe()
  qmi_wwan: apply SET_DTR quirk to Sierra WP7607
  pinctrl: meson: meson8b: fix the sdxc_a data 1..3 pins
  net: dsa: bcm_sf2: Do not assume DSA master supports WoL
  net: systemport: Fix reception of BPDUs
  scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
  keys: Fix dependency loop between construction record and auth key
  assoc_array: Fix shortcut creation
  ARM: 8835/1: dma-mapping: Clear DMA ops on teardown
  af_key: unconditionally clone on broadcast
  bpf: fix lockdep false positive in stackmap
  bpf: only adjust gso_size on bytestream protocols
  ARM: 8824/1: fix a migrating irq bug when hotplug cpu
  esp: Skip TX bytes accounting when sending from a request socket
  clk: sunxi: A31: Fix wrong AHB gate number
  kallsyms: Handle too long symbols in kallsyms.c
  clk: sunxi-ng: v3s: Fix TCON reset de-assert bit
  Input: st-keyscan - fix potential zalloc NULL dereference
  auxdisplay: ht16k33: fix potential user-after-free on module unload
  i2c: bcm2835: Clear current buffer pointers and counts after a transfer
  i2c: cadence: Fix the hold bit setting
  net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
  mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
  x86/CPU: Add Icelake model number
  net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()
  scsi: qla2xxx: Fix panic from use after free in qla2x00_async_tm_cmd
  Revert "mm: use early_pfn_to_nid in page_ext_init"
  mm/gup: fix gup_pmd_range() for dax
  NFS: Don't use page_file_mapping after removing the page
  xprtrdma: Make sure Send CQ is allocated on an existing compvec
  floppy: check_events callback should not return a negative number
  ipvs: fix dependency on nf_defrag_ipv6
  blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue
  netfilter: compat: initialize all fields in xt_init
  mac80211: Fix Tx aggregation session tear down with ITXQs
  mac80211: call drv_ibss_join() on restart
  Input: matrix_keypad - use flush_delayed_work()
  Input: ps2-gpio - flush TX work when closing port
  Input: cap11xx - switch to using set_brightness_blocking()
  ARM: OMAP2+: fix lack of timer interrupts on CPU1 after hotplug
  ASoC: samsung: Prevent clk_get_rate() calls in atomic context
  KVM: arm64: Forbid kprobing of the VHE world-switch code
  KVM: arm/arm64: vgic: Always initialize the group of private IRQs
  arm/arm64: KVM: Don't panic on failure to properly reset system registers
  arm/arm64: KVM: Allow a VCPU to fully reset itself
  KVM: arm/arm64: Reset the VCPU without preemption and vcpu state loaded
  ASoC: rsnd: fixup rsnd_ssi_master_clk_start() user count check
  ASoC: dapm: fix out-of-bounds accesses to DAPM lookup tables
  ARM: OMAP2+: Variable "reg" in function omap4_dsi_mux_pads() could be uninitialized
  ARM: dts: Configure clock parent for pwm vibra
  Input: pwm-vibra - stop regulator after disabling pwm, not before
  Input: pwm-vibra - prevent unbalanced regulator
  s390/dasd: fix using offset into zero size array error
  arm64: dts: rockchip: fix graph_port warning on rk3399 bob kevin and excavator
  KVM: arm/arm64: vgic: Make vgic_dist->lpi_list_lock a raw_spinlock
  clocksource: timer-ti-dm: Fix pwm dmtimer usage of fck reparenting
  ASoC: rt5682: Correct the setting while select ASRC clk for AD/DA filter
  gpu: ipu-v3: Fix CSI offsets for imx53
  drm/imx: imx-ldb: add missing of_node_puts
  gpu: ipu-v3: Fix i.MX51 CSI control registers offset
  drm/imx: ignore plane updates on disabled crtcs
  crypto: rockchip - update new iv to device in multiple operations
  crypto: rockchip - fix scatterlist nents error
  crypto: ahash - fix another early termination in hash walk
  crypto: cfb - remove bogus memcpy() with src == dest
  crypto: cfb - add missing 'chunksize' property
  crypto: ccree - don't copy zero size ciphertext
  crypto: ccree - unmap buffer before copying IV
  crypto: ccree - fix free of unallocated mlli buffer
  crypto: caam - fix DMA mapping of stack memory
  crypto: caam - fixed handling of sg list
  crypto: ccree - fix missing break in switch statement
  crypto: caam - fix hash context DMA unmap size
  stm class: Fix an endless loop in channel allocation
  mei: bus: move hw module get/put to probe/release
  mei: hbm: clean the feature flags on link reset
  iio: adc: exynos-adc: Fix NULL pointer exception on unbind
  ASoC: codecs: pcm186x: Fix energysense SLEEP bit
  ASoC: codecs: pcm186x: fix wrong usage of DECLARE_TLV_DB_SCALE()
  ASoC: fsl_esai: fix register setting issue in RIGHT_J mode
  9p/net: fix memory leak in p9_client_create
  9p: use inode->i_lock to protect i_size_write() under 32-bit
  media: videobuf2-v4l2: drop WARN_ON in vb2_warn_zero_bytesused()
  ANDROID: cuttlefish_defconfig: Enable CONFIG_INPUT_MOUSEDEV
  FROMLIST: psi: introduce psi monitor
  FROMLIST: refactor header includes to allow kthread.h inclusion in psi_types.h
  FROMLIST: psi: track changed states
  FROMLIST: psi: split update_stats into parts
  FROMLIST: psi: rename psi fields in preparation for psi trigger addition
  FROMLIST: psi: make psi_enable static
  FROMLIST: psi: introduce state_mask to represent stalled psi states
  ANDROID: cuttlefish_defconfig: Enable CONFIG_PSI
  UPSTREAM: kernel: cgroup: add poll file operation
  UPSTREAM: fs: kernfs: add poll file operation
  UPSTREAM: psi: avoid divide-by-zero crash inside virtual machines
  UPSTREAM: psi: clarify the Kconfig text for the default-disable option
  UPSTREAM: psi: fix aggregation idle shut-off
  UPSTREAM: psi: fix reference to kernel commandline enable
  UPSTREAM: psi: make disabling/enabling easier for vendor kernels
  UPSTREAM: kernel/sched/psi.c: simplify cgroup_move_task()
  UPSTREAM: psi: cgroup support
  UPSTREAM: psi: pressure stall information for CPU, memory, and IO
  UPSTREAM: sched: introduce this_rq_lock_irq()
  UPSTREAM: sched: sched.h: make rq locking and clock functions available in stats.h
  UPSTREAM: sched: loadavg: make calc_load_n() public
  BACKPORT: sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
  UPSTREAM: delayacct: track delays from thrashing cache pages
  UPSTREAM: mm: workingset: tell cache transitions from workingset thrashing

Conflicts:
	arch/arm/kernel/irq.c
	drivers/scsi/sd.c
	include/linux/sched.h
	init/Kconfig
	kernel/sched/Makefile
	kernel/sched/sched.h
	kernel/workqueue.c
	sound/soc/soc-dapm.c

Change-Id: Ia2dcc01c712134c57037ca6788d51172f66bcd93
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2019-04-05 01:06:22 -07:00
NeilBrown
7668d6e45f dm: fix to_sector() for 32bit
commit 0bdb50c531f7377a9da80d3ce2d61f389c84cb30 upstream.

A dm-raid array with devices larger than 4GB won't assemble on
a 32 bit host since _check_data_dev_sectors() was added in 4.16.
This is because to_sector() treats its argument as an "unsigned long"
which is 32bits (4GB) on a 32bit host.  Using "unsigned long long"
is more correct.

Kernels as early as 4.2 can have other problems due to to_sector()
being used on the size of a device.

Fixes: 0cf4503174 ("dm raid: add support for the MD RAID0 personality")
cc: stable@vger.kernel.org (v4.2+)
Reported-and-tested-by: Guillaume Perréal <gperreal@free.fr>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-23 20:10:09 +01:00
David Zeuthen
8d258cdc24 ANDROID: dm: do_mounts_dm: Update init/do_mounts_dm.c to the latest ChromiumOS version.
This is needed for AVB.

Bug: None
Test: Compiles.
Change-Id: I45b5d435652ab66ec07420ab17f2c7889f7e4d95
Signed-off-by: David Zeuthen <zeuthen@google.com>
2018-08-28 16:46:04 +05:30
Will Drewry
ee3ad542dd CHROMIUM: dm: boot time specification of dm=
This is a wrap-up of three patches pending upstream approval.
I'm bundling them because they are interdependent, and it'll be
easier to drop it on rebase later.

1. dm: allow a dm-fs-style device to be shared via dm-ioctl

Integrates feedback from Alisdair, Mike, and Kiyoshi.

Two main changes occur here:

- One function is added which allows for a programmatically created
mapped device to be inserted into the dm-ioctl hash table.  This binds
the device to a name and, optional, uuid which is needed by udev and
allows for userspace management of the mapped device.

- dm_table_complete() was extended to handle all of the final
functional changes required for the table to be operational once
called.

2. init: boot to device-mapper targets without an initr*

Add a dm= kernel parameter modeled after the md= parameter from
do_mounts_md.  It allows for device-mapper targets to be configured at
boot time for use early in the boot process (as the root device or
otherwise).  It also replaces /dev/XXX calls with major:minor opportunistically.

The format is dm="name uuid ro,table line 1,table line 2,...".  The
parser expects the comma to be safe to use as a newline substitute but,
otherwise, uses the normal separator of space.  Some attempt has been
made to make it forgiving of additional spaces (using skip_spaces()).

A mapped device created during boot will be assigned a minor of 0 and
may be access via /dev/dm-0.

An example dm-linear root with no uuid may look like:

root=/dev/dm-0  dm="lroot none ro, 0 4096 linear /dev/ubdb 0, 4096 4096 linear /dv/ubdc 0"

Once udev is started, /dev/dm-0 will become /dev/mapper/lroot.

Older upstream threads:
http://marc.info/?l=dm-devel&m=127429492521964&w=2
http://marc.info/?l=dm-devel&m=127429499422096&w=2
http://marc.info/?l=dm-devel&m=127429493922000&w=2

Latest upstream threads:
https://patchwork.kernel.org/patch/104859/
https://patchwork.kernel.org/patch/104860/
https://patchwork.kernel.org/patch/104861/

Bug: 27175947

Signed-off-by: Will Drewry <wad@chromium.org>

Review URL: http://codereview.chromium.org/2020011

Change-Id: I92bd53432a11241228d2e5ac89a3b20d19b05a31

[AmitP: Refactored the original changes based on upstream changes,
        commit e52347bd66 ("Documentation/admin-guide: split the kernel parameter list to a separate file")]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2018-08-28 16:46:04 +05:30
Dan Williams
b3a9a0c36e dax: Introduce a ->copy_to_iter dax operation
Similar to the ->copy_from_iter() operation, a platform may want to
deploy an architecture or device specific routine for handling reads
from a dax_device like /dev/pmemX. On x86 this routine will point to a
machine check safe version of copy_to_iter(). For now, add the plumbing
to device-mapper and the dax core.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-05-22 23:18:31 -07:00
Linus Torvalds
83c7c18b16 Merge tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:

 - DM core passthrough ioctl fix to retain reference to DM table, and
   that table's block devices, while issuing the ioctl to one of those
   block devices.

 - DM core passthrough ioctl fix to _not_ override the fmode_t used to
   issue the ioctl. Overriding by using the fmode_t that the block
   device was originally open with during DM table load is a liability.

 - Add DM core support for secure erase forwarding and update the DM
   linear and DM striped targets to support them.

 - A DM core 4.16 stable fix to allow abnormal IO (e.g. discard, write
   same, write zeroes) for targets that make use of the non-splitting IO
   variant (as is done for multipath or thinp when layered directly on
   NVMe).

 - Allow DM targets to return a payload in response to a DM message that
   they are sent. This is useful for DM targets that would like to
   provide statistics data in response to DM messages.

 - Update DM bufio to support non-power-of-2 block sizes. Numerous other
   related changes prepare the DM bufio code for this support.

 - Fix DM crypt to use a bounded amount of memory across the entire
   system. This is to avoid OOM that can otherwise occur in response to
   certain pathological IO workloads (e.g. discarding a large DM crypt
   device).

 - Add a 'check_at_most_once' feature to the DM verity target to allow
   verity to be used on mobile devices that have very limited resources.

 - Fix the DM integrity target to fail early if a keyed algorithm (e.g.
   HMAC) is to be used but the key isn't set.

 - Add non-power-of-2 support to the DM unstripe target.

 - Eliminate the use of a Variable Length Array in the DM stripe target.

 - Update the DM log-writes target to record metadata (REQ_META flag).

 - DM raid fixes for its nosync status and some variable range issues.

* tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits)
  dm: remove fmode_t argument from .prepare_ioctl hook
  dm: hold DM table for duration of ioctl rather than use blkdev_get
  dm raid: fix parse_raid_params() variable range issue
  dm verity: make verity_for_io_block static
  dm verity: add 'check_at_most_once' option to only validate hashes once
  dm bufio: don't embed a bio in the dm_buffer structure
  dm bufio: support non-power-of-two block sizes
  dm bufio: use slab cache for dm_buffer structure allocations
  dm bufio: reorder fields in dm_buffer structure
  dm bufio: relax alignment constraint on slab cache
  dm bufio: remove code that merges slab caches
  dm bufio: get rid of slab cache name allocations
  dm bufio: move dm-bufio.h to include/linux/
  dm bufio: delete outdated comment
  dm: add support for secure erase forwarding
  dm: backfill abnormal IO support to non-splitting IO submission
  dm raid: fix nosync status
  dm mpath: use DM_MAPIO_SUBMITTED instead of magic number 0 in process_queued_bios()
  dm stripe: get rid of a Variable Length Array (VLA)
  dm log writes: record metadata flag for better flags record
  ...
2018-04-06 11:50:19 -07:00
Mike Snitzer
5bd5e8d891 dm: remove fmode_t argument from .prepare_ioctl hook
Use the fmode_t that is passed to dm_blk_ioctl() rather than
inconsistently (varies across targets) drop it on the floor by
overriding it with the fmode_t stored in 'struct dm_dev'.

All the persistent reservation functions weren't using the fmode_t they
got back from .prepare_ioctl so remove them.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-04 12:12:39 -04:00
Denis Semakin
00716545c8 dm: add support for secure erase forwarding
Set QUEUE_FLAG_SECERASE in DM device's queue_flags if a DM table's
data devices support secure erase.

Also, add support for secure erase to both the linear and striped
targets.

Signed-off-by: Denis Semakin <d.semakin@omprussia.ru>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:21 -04:00
Mike Snitzer
1eb5fa849f dm: allow targets to return output from messages they are sent
Could be useful for a target to return stats or other information.
If a target does DMEMIT() anything to @result from its .message method
then it must return 1 to the caller.

Signed-off-By: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:10 -04:00
Bart Van Assche
233bde21aa block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h>
It happens often while I'm preparing a patch for a block driver that
I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
available for this driver? Do I have to introduce definitions of these
constants before I can use these constants? To avoid this confusion,
move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
<linux/blkdev.h> header file such that these become available for all
block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
header file conditional to avoid that including that header file after
<linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE
redefinition.

Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
not been removed from uapi header files nor from NAND drivers in
which these constants are used for another purpose than converting
block layer offsets and sizes into a number of sectors.

Cc: David S. Miller <davem@davemloft.net>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-17 14:45:23 -06:00
Mike Snitzer
ac514ffc96 dm mpath: delay the retry of a request if the target responded as busy
Add DM_ENDIO_DELAY_REQUEUE to allow request-based multipath's
multipath_end_io() to instruct dm-rq.c:dm_done() to delay a requeue.
This is beneficial to do if BLK_STS_RESOURCE is returned from the target
(because target is busy).

Relative to blk-mq: kick the hw queues via blk_mq_requeue_work(),
indirectly from dm-rq.c:__dm_mq_kick_requeue_list(), after a delay.

For old .request_fn: use blk_delay_queue().

bio-based multipath doesn't have feature parity with request-based for
retryable error requeues; that is something that'll need fixing in the
future.

Suggested-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Bart Van Assche <bart.vanassche@wdc.com>
[as interpreted from Bart's "... patch looks fine to me."]
2018-01-29 13:44:54 -05:00
Brian Norris
f6e7baadd9 dm: move dm_table_destroy() to same header as dm_table_create()
If anyone is going to use dm_table_create(), they probably should be
able to use dm_table_destroy() too. Move the dm_table_destroy()
definition outside the private header, near dm_table_create()

Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17 09:16:06 -05:00
Mike Snitzer
22c11858e8 dm: introduce DM_TYPE_NVME_BIO_BASED
If dm_table_determine_type() establishes DM_TYPE_NVME_BIO_BASED then
all devices in the DM table do not support partial completions.  Also,
the table has a single immutable target that doesn't require DM core to
split bios.

This will enable adding NVMe optimizations to bio-based DM.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-20 10:51:10 -05:00
Mike Snitzer
64f52b0e31 dm: improve performance by moving dm_io structure to per-bio-data
Eliminates need for a separate mempool to allocate 'struct dm_io'
objects from.  As such, it saves an extra mempool allocation for each
original bio that DM core is issued.

This complicates the per-bio-data accessor functions by needing to
conditonally add extra padding to get to a target's per-bio-data.  But
in the end this provides a decent performance improvement for all
bio-based DM devices.

On an NVMe-loop based testbed to a ramdisk (~3100 MB/s): bio-based
DM linear performance improved by 2% (went from 2665 to 2777 MB/s).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-16 20:43:13 -05:00
NeilBrown
f31c21e436 dm: remove unused 'num_write_bios' target interface
No DM target provides num_write_bios and none has since dm-cache's
brief use in 2013.

Having the possibility of num_write_bios > 1 complicates bio
allocation.  So remove the interface and assume there is only one bio
needed.

If a target ever needs more, it must provide a suitable bioset and
allocate itself based on its particular needs.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13 12:15:58 -05:00
Mikulas Patocka
c3ca015fab dax: remove the pmem_dax_ops->flush abstraction
Commit abebfbe2f7 ("dm: add ->flush() dax operation support") is
buggy. A DM device may be composed of multiple underlying devices and
all of them need to be flushed. That commit just routes the flush
request to the first device and ignores the other devices.

It could be fixed by adding more complex logic to the device mapper. But
there is only one implementation of the method pmem_dax_ops->flush - that
is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
don't need the pmem_dax_ops->flush abstraction at all, we can call
arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
can't ever reach anything different from arch_wb_cache_pmem().

It should be also pointed out that for some uses of persistent memory it
is needed to flush only a very small amount of data (such as 1 cacheline),
and it would be overkill if we go through that device mapper machinery for
a single flushed cache line.

Fix this by removing the pmem_dax_ops->flush abstraction and call
arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
mapper code that forwards the flushes.

Fixes: abebfbe2f7 ("dm: add ->flush() dax operation support")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-09-11 11:00:55 -04:00
Eric Biggers
5916a22b83 dm: constify argument arrays
The arrays of 'struct dm_arg' are never modified by the device-mapper
core, so constify them so that they are placed in .rodata.

(Exception: the args array in dm-raid cannot be constified because it is
allocated on the stack and modified.)

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-28 11:47:18 -04:00
Bart Van Assche
604407890e dm: fix printk() rate limiting code
Using the same rate limiting state for different kinds of messages
is wrong because this can cause a high frequency message to suppress
a report of a low frequency message. Hence use a unique rate limiting
state per message type.

Fixes: 71a16736a1 ("dm: use local printk ratelimit")
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-28 09:58:27 -04:00
Linus Torvalds
b6ffe9ba46 Merge tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
 "libnvdimm updates for the latest ACPI and UEFI specifications. This
  pull request also includes new 'struct dax_operations' enabling to
  undo the abuse of copy_user_nocache() for copy operations to pmem.

  The dax work originally missed 4.12 to address concerns raised by Al.

  Summary:

   - Introduce the _flushcache() family of memory copy helpers and use
     them for persistent memory write operations on x86. The
     _flushcache() semantic indicates that the cache is either bypassed
     for the copy operation (movnt) or any lines dirtied by the copy
     operation are written back (clwb, clflushopt, or clflush).

   - Extend dax_operations with ->copy_from_iter() and ->flush()
     operations. These operations and other infrastructure updates allow
     all persistent memory specific dax functionality to be pushed into
     libnvdimm and the pmem driver directly. It also allows dax-specific
     sysfs attributes to be linked to a host device, for example:
     /sys/block/pmem0/dax/write_cache

   - Add support for the new NVDIMM platform/firmware mechanisms
     introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
     namespace label format, extensions to the address-range-scrub
     command set, new error injection commands, and a new BTT
     (block-translation-table) layout. These updates support inter-OS
     and pre-OS compatibility.

   - Fix a longstanding memory corruption bug in nfit_test.

   - Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
     capable.

   - Miscellaneous fixes and small updates across libnvdimm and the nfit
     driver.

  Acknowledgements that came after the branch was pushed: commit
  6aa734a2f3 ("libnvdimm, region, pmem: fix 'badblocks'
  sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
  <toshi.kani@hpe.com>"

* tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
  libnvdimm, namespace: record 'lbasize' for pmem namespaces
  acpi/nfit: Issue Start ARS to retrieve existing records
  libnvdimm: New ACPI 6.2 DSM functions
  acpi, nfit: Show bus_dsm_mask in sysfs
  libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
  acpi, nfit: Enable DSM pass thru for root functions.
  libnvdimm: passthru functions clear to send
  libnvdimm, btt: convert some info messages to warn/err
  libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
  libnvdimm: fix the clear-error check in nsio_rw_bytes
  libnvdimm, btt: fix btt_rw_page not returning errors
  acpi, nfit: quiet invalid block-aperture-region warnings
  libnvdimm, btt: BTT updates for UEFI 2.7 format
  acpi, nfit: constify *_attribute_group
  libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
  libnvdimm, pmem, dax: export a cache control attribute
  dax: convert to bitmask for flags
  dax: remove default copy_from_iter fallback
  libnvdimm, nfit: enable support for volatile ranges
  libnvdimm, pmem: fix persistence warning
  ...
2017-07-07 09:44:06 -07:00
Damien Le Moal
10999307c1 dm: introduce dm_remap_zone_report()
A target driver support zoned block devices and exposing it as such may
receive REQ_OP_ZONE_REPORT request for the user to determine the mapped
device zone configuration. To process properly such request, the target
driver may need to remap the zone descriptors provided in the report
reply. The helper function dm_remap_zone_report() does this generically
using only the target start offset and length and the start offset
within the target device.

dm_remap_zone_report() will remap the start sector of all zones
reported. If the report includes sequential zones, the write pointer
position of these zones will also be remapped.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-06-19 11:03:51 -04:00
Damien Le Moal
dd88d313be dm table: add zoned block devices validation
1) Introduce DM_TARGET_ZONED_HM feature flag:

The target drivers currently available will not operate correctly if a
table target maps onto a host-managed zoned block device.

To avoid problems, introduce the new feature flag DM_TARGET_ZONED_HM to
allow a target to explicitly state that it supports host-managed zoned
block devices.  This feature is checked for all targets in a table if
any of the table's block devices are host-managed.

Note that as host-aware zoned block devices are backward compatible with
regular block devices, they can be used by any of the current target
types.  This new feature is thus restricted to host-managed zoned block
devices.

2) Check device area zone alignment:

If a target maps to a zoned block device, check that the device area is
aligned on zone boundaries to avoid problems with REQ_OP_ZONE_RESET
operations (resetting a partially mapped sequential zone would not be
possible).  This also facilitates the processing of zone report with
REQ_OP_ZONE_REPORT bios.

3) Check block devices zone model compatibility

When setting the DM device's queue limits, several possibilities exists
for zoned block devices:
1) The DM target driver may want to expose a different zone model
(e.g. host-managed device emulation or regular block device on top of
host-managed zoned block devices)
2) Expose the underlying zone model of the devices as-is

To allow both cases, the underlying block device zone model must be set
in the target limits in dm_set_device_limits() and the compatibility of
all devices checked similarly to the logical block size alignment.  For
this last check, introduce validate_hardware_zoned_model() to check that
all targets of a table have the same zone model and that the zone size
of the target devices are equal.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
[Mike Snitzer refactored Damien's original work to simplify the code]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-06-19 11:03:50 -04:00
Joe Perches
d2c3c8dcb5 dm: convert DM printk macros to pr_<level> macros
Using pr_<level> is the more common logging style.

Standardize style and use new macro DM_FMT.
Use no_printk in DMDEBUG macros when CONFIG_DM_DEBUG is not #defined.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-06-19 11:03:50 -04:00
Dan Williams
abebfbe2f7 dm: add ->flush() dax operation support
Allow device-mapper to route flush operations to the
per-target implementation. In order for the device stacking to work we
need a dax_dev and a pgoff relative to that device. This gives each
layer of the stack the information it needs to look up the operation
pointer for the next level.

This conceptually allows for an array of mixed device drivers with
varying flush implementations.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-15 14:34:59 -07:00
Dan Williams
7e026c8c0a dm: add ->copy_from_iter() dax operation support
Allow device-mapper to route copy_from_iter operations to the
per-target implementation. In order for the device stacking to work we
need a dax_dev and a pgoff relative to that device. This gives each
layer of the stack the information it needs to look up the operation
pointer for the next level.

This conceptually allows for an array of mixed device drivers with
varying copy_from_iter implementations.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-06-09 09:22:21 -07:00
Christoph Hellwig
4e4cbee93d block: switch bios to blk_status_t
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Christoph Hellwig
2a842acab1 block: introduce new block status code type
Currently we use nornal Linux errno values in the block layer, and while
we accept any error a few have overloaded magic meanings.  This patch
instead introduces a new  blk_status_t value that holds block layer specific
status codes and explicitly explains their meaning.  Helpers to convert from
and to the previous special meanings are provided for now, but I suspect
we want to get rid of them in the long run - those drivers that have a
errno input (e.g. networking) usually get errnos that don't know about
the special block layer overloads, and similarly returning them to userspace
will usually return somethings that strictly speaking isn't correct
for file system operations, but that's left as an exercise for later.

For now the set of errors is a very limited set that closely corresponds
to the previous overloaded errno values, but there is some low hanging
fruite to improve it.

blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
typechecking, so that we can easily catch places passing the wrong values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Christoph Hellwig
1be5690984 dm: change ->end_io calling convention
Turn the error paramter into a pointer so that target drivers can change
the value, and make sure only DM_ENDIO_* values are returned from the
methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Linus Torvalds
53ef7d0e20 Merge tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
 "The bulk of this has been in multiple -next releases. There were a few
  late breaking fixes and small features that got added in the last
  couple days, but the whole set has received a build success
  notification from the kbuild robot.

  Change summary:

   - Region media error reporting: A libnvdimm region device is the
     parent to one or more namespaces. To date, media errors have been
     reported via the "badblocks" attribute attached to pmem block
     devices for namespaces in "raw" or "memory" mode. Given that
     namespaces can be in "device-dax" or "btt-sector" mode this new
     interface reports media errors generically, i.e. independent of
     namespace modes or state.

     This subsequently allows userspace tooling to craft "ACPI 6.1
     Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
     requests and submit them via the ioctl path for NVDIMM root bus
     devices.

   - Introduce 'struct dax_device' and 'struct dax_operations': Prompted
     by a request from Linus and feedback from Christoph this allows for
     dax capable drivers to publish their own custom dax operations.
     This fixes the broken assumption that all dax operations are
     related to a persistent memory device, and makes it easier for
     other architectures and platforms to add customized persistent
     memory support.

   - 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
     available for storage appliance applications to manually trigger
     memory controllers to drain write-pending buffers that would
     otherwise be flushed automatically by the platform ADR
     (asynchronous-DRAM-refresh) mechanism at a power loss event.
     Support for "locked" DIMMs is included to prevent namespaces from
     surfacing when the namespace label data area is locked. Finally,
     fixes for various reported deadlocks and crashes, also tagged for
     -stable.

   - ACPI / nfit driver updates: General updates of the nfit driver to
     add DSM command overrides, ACPI 6.1 health state flags support, DSM
     payload debug available by default, and various fixes.

  Acknowledgements that came after the branch was pushed:

   - commmit 565851c972 "device-dax: fix sysfs attribute deadlock":
     Tested-by: Yi Zhang <yizhan@redhat.com>

   - commit 23f4984483 "libnvdimm: rework region badblocks clearing"
     Tested-by: Toshi Kani <toshi.kani@hpe.com>"

* tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
  libnvdimm, pfn: fix 'npfns' vs section alignment
  libnvdimm: handle locked label storage areas
  libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
  brd: fix uninitialized use of brd->dax_dev
  block, dax: use correct format string in bdev_dax_supported
  device-dax: fix sysfs attribute deadlock
  libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
  libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
  libnvdimm: rework region badblocks clearing
  acpi, nfit: kill ACPI_NFIT_DEBUG
  libnvdimm: fix clear length of nvdimm_forget_poison()
  libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
  libnvdimm, region: sysfs trigger for nvdimm_flush()
  libnvdimm: fix phys_addr for nvdimm_clear_poison
  x86, dax, pmem: remove indirection around memcpy_from_pmem()
  block: remove block_device_operations ->direct_access()
  block, dax: convert bdev_dax_supported() to dax_direct_access()
  filesystem-dax: convert to dax_direct_access()
  Revert "block: use DAX for partition table reads"
  ext2, ext4, xfs: retrieve dax_device for iomap operations
  ...
2017-05-05 18:49:20 -07:00
Christoph Hellwig
412445acb6 dm: introduce a new DM_MAPIO_KILL return value
This untangles the DM_MAPIO_* values returned from ->clone_and_map_rq
from the error codes used by the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-01 18:19:03 -04:00
Christoph Hellwig
7ed8578a96 dm rq: change ->rq_end_io calling conventions
Instead of returning either a DM_ENDIO_* constant or an error code, add
a new DM_ENDIO_DONE value that means keep errno as is.  This allows us
to easily keep the existing error code in case where we can't push back,
and it also preparares for the new block level status codes with strict
type checking.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-01 18:19:03 -04:00
Mike Snitzer
7e25a76061 Merge branch 'dm-4.12' into dm-4.12-post-merge 2017-05-01 18:18:04 -04:00
Bart Van Assche
7e0d574f26 dm: introduce enum dm_queue_mode to cleanup related code
Introduce an enumeration type for the queue mode.  This patch does
not change any functionality but makes the DM code easier to read.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27 17:08:44 -04:00
Dan Williams
817bf40265 dm: teach dm-targets to use a dax_device + dax_operations
Arrange for dm to lookup the dax services available from member devices.
Update the dax-capable targets, linear and stripe, to route dax
operations to the underlying device. Changes the target-internal
->direct_access() method to more closely align with the dax_operations
->direct_access() calling convention.

Cc: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-25 13:20:36 -07:00
Mikulas Patocka
e2460f2a4b dm: mark targets that pass integrity data
A dm-crypt on dm-integrity device incorrectly advertises an integrity
profile on the DM crypt device.  It can be seen in the files
"/sys/block/dm-*/integrity/*" that both dm-integrity and dm-crypt target
advertise the integrity profile.  That is incorrect, only the
dm-integrity target should advertise the integrity profile.

A general problem in DM is that if we have a DM device that depends on
another device with an integrity profile, the upper device will always
advertise the integrity profile, even when the target driver doesn't
support handling integrity data.

Most targets don't support integrity data, so we provide a whitelist of
targets that support it (linear, delay and striped).  The targets that
support passing integrity data to the lower device are marked with the
flag DM_TARGET_PASSES_INTEGRITY.  The DM core will now advertise
integrity data on a DM device only if all the targets support the
integrity data.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24 12:04:32 -04:00
Dan Williams
f26c5719b2 dm: add dax_device and dax_operations support
Allocate a dax_device to represent the capacity of a device-mapper
instance. Provide a ->direct_access() method via the new dax_operations
indirection that mirrors the functionality of the current direct_access
support via block_device_operations.  Once fs/dax.c has been converted
to use dax_operations the old dm_blk_direct_access() will be removed.

A new helper dm_dax_get_live_target() is introduced to separate some of
the dm-specifics from the direct_access implementation.

This enabling is only for the top-level dm representation to upper
layers. Converting target direct_access implementations is deferred to a
separate patch.

Cc: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-20 11:57:52 -07:00
Christoph Hellwig
48920ff2a5 block: remove the discard_zeroes_data flag
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig
ac62d6208a dm: support REQ_OP_WRITE_ZEROES
Copy & paste from the REQ_OP_WRITE_SAME code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Milan Broz
9b4b5a797c dm table: add flag to allow target to handle its own integrity metadata
Add DM_TARGET_INTEGRITY flag that specifies bio integrity metadata is
not inherited but implemented in the target itself.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-03-07 13:28:32 -05:00
Christoph Hellwig
eb8db831be dm: always defer request allocation to the owner of the request_queue
DM already calls blk_mq_alloc_request on the request_queue of the
underlying device if it is a blk-mq device.  But now that we allow drivers
to allocate additional data and initialize it ahead of time we need to do
the same for all drivers.   Doing so and using the new cmd_size
infrastructure in the block layer greatly simplifies the dm-rq and mpath
code, and should also make arbitrary combinations of SQ and MQ devices
with SQ or MQ device mapper tables easily possible as a further step.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27 15:08:35 -07:00
Mike Snitzer
a8ac51e4ab dm rq: add DM_MAPIO_DELAY_REQUEUE to delay requeue of blk-mq requests
Otherwise blk-mq will immediately dispatch requests that are requeued
via a BLK_MQ_RQ_QUEUE_BUSY return from blk_mq_ops .queue_rq.

Delayed requeue is implemented using blk_mq_delay_kick_requeue_list()
with a delay of 5 secs.  In the context of DM multipath (all paths down)
it doesn't make any sense to requeue more quickly.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-14 13:56:38 -04:00
Linus Torvalds
f0c98ebc57 Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:

 - Replace pcommit with ADR / directed-flushing.

   The pcommit instruction, which has not shipped on any product, is
   deprecated.  Instead, the requirement is that platforms implement
   either ADR, or provide one or more flush addresses per nvdimm.

   ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
   to the memory controller on a power-fail event.

   Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
   Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
   A flush hint is an mmio address that when written and fenced assures
   that all previous posted writes targeting a given dimm have been
   flushed to media.

 - On-demand ARS (address range scrub).

   Linux uses the results of the ACPI ARS commands to track bad blocks
   in pmem devices.  When latent errors are detected we re-scrub the
   media to refresh the bad block list, userspace can also request a
   re-scrub at any time.

 - Support for the Microsoft DSM (device specific method) command
   format.

 - Support for EDK2/OVMF virtual disk device memory ranges.

 - Various fixes and cleanups across the subsystem.

* tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
  libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
  nfit: do an ARS scrub on hitting a latent media error
  nfit: move to nfit/ sub-directory
  nfit, libnvdimm: allow an ARS scrub to be triggered on demand
  libnvdimm: register nvdimm_bus devices with an nd_bus driver
  pmem: clarify a debug print in pmem_clear_poison
  x86/insn: remove pcommit
  Revert "KVM: x86: add pcommit support"
  nfit, tools/testing/nvdimm/: unify shutdown paths
  libnvdimm: move ->module to struct nvdimm_bus_descriptor
  nfit: cleanup acpi_nfit_init calling convention
  nfit: fix _FIT evaluation memory leak + use after free
  tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
  tools/testing/nvdimm: add virtual ramdisk range
  acpi, nfit: treat virtual ramdisk SPA as pmem region
  pmem: kill __pmem address space
  pmem: kill wmb_pmem()
  libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
  fs/dax: remove wmb_pmem()
  libnvdimm, pmem: flush posted-write queues on shutdown
  ...
2016-07-28 17:38:16 -07:00
Toshi Kani
545ed20e6d dm: add infrastructure for DAX support
Change mapped device to implement direct_access function,
dm_blk_direct_access(), which calls a target direct_access function.
'struct target_type' is extended to have target direct_access interface.
This function limits direct accessible size to the dm_target's limit
with max_io_len().

Add dm_table_supports_dax() to iterate all targets and associated block
devices to check for DAX support.  To add DAX support to a DM target the
target must only implement the direct_access function.

Add a new dm type, DM_TYPE_DAX_BIO_BASED, which indicates that mapped
device supports DAX and is bio based.  This new type is used to assure
that all target devices have DAX support and remain that way after
QUEUE_FLAG_DAX is set in mapped device.

At initial table load, QUEUE_FLAG_DAX is set to mapped device when setting
DM_TYPE_DAX_BIO_BASED to the type.  Any subsequent table load to the
mapped device must have the same type, or else it fails per the check in
table_load().

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20 23:49:49 -04:00
Mike Snitzer
e83068a5fa dm mpath: add optional "queue_mode" feature
Allow a user to specify an optional feature 'queue_mode <mode>' where
<mode> may be "bio", "rq" or "mq" -- which corresponds to bio-based,
request_fn rq-based, and blk-mq rq-based respectively.

If the queue_mode feature isn't specified the default for the
"multipath" target is still "rq" but if dm_mod.use_blk_mq is set to Y
it'll default to mode "mq".

This new queue_mode feature introduces the ability for each multipath
device to have its own queue_mode (whereas before this feature all
multipath devices effectively had to have the same queue_mode).

This commit also goes a long way to eliminate the awkward (ab)use of
DM_TYPE_*, the associated filter_md_type() and other relatively fragile
and difficult to maintain code.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-06-10 15:16:02 -04:00
DingXiang
4df2bf466a dm snapshot: disallow the COW and origin devices from being identical
Otherwise loading a "snapshot" table using the same device for the
origin and COW devices, e.g.:

echo "0 20971520 snapshot 253:3 253:3 P 8" | dmsetup create snap

will trigger:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[ 1958.979934] IP: [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot]
[ 1958.989655] PGD 0
[ 1958.991903] Oops: 0000 [#1] SMP
...
[ 1959.059647] CPU: 9 PID: 3556 Comm: dmsetup Tainted: G          IO    4.5.0-rc5.snitm+ #150
...
[ 1959.083517] task: ffff8800b9660c80 ti: ffff88032a954000 task.ti: ffff88032a954000
[ 1959.091865] RIP: 0010:[<ffffffffa040efba>]  [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot]
[ 1959.104295] RSP: 0018:ffff88032a957b30  EFLAGS: 00010246
[ 1959.110219] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000001
[ 1959.118180] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff880329334a00
[ 1959.126141] RBP: ffff88032a957b50 R08: 0000000000000000 R09: 0000000000000001
[ 1959.134102] R10: 000000000000000a R11: f000000000000000 R12: ffff880330884d80
[ 1959.142061] R13: 0000000000000008 R14: ffffc90001c13088 R15: ffff880330884d80
[ 1959.150021] FS:  00007f8926ba3840(0000) GS:ffff880333440000(0000) knlGS:0000000000000000
[ 1959.159047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1959.165456] CR2: 0000000000000098 CR3: 000000032f48b000 CR4: 00000000000006e0
[ 1959.173415] Stack:
[ 1959.175656]  ffffc90001c13040 ffff880329334a00 ffff880330884ed0 ffff88032a957bdc
[ 1959.183946]  ffff88032a957bb8 ffffffffa040f225 ffff880329334a30 ffff880300000000
[ 1959.192233]  ffffffffa04133e0 ffff880329334b30 0000000830884d58 00000000569c58cf
[ 1959.200521] Call Trace:
[ 1959.203248]  [<ffffffffa040f225>] dm_exception_store_create+0x1d5/0x240 [dm_snapshot]
[ 1959.211986]  [<ffffffffa040d310>] snapshot_ctr+0x140/0x630 [dm_snapshot]
[ 1959.219469]  [<ffffffffa0005c44>] ? dm_split_args+0x64/0x150 [dm_mod]
[ 1959.226656]  [<ffffffffa0005ea7>] dm_table_add_target+0x177/0x440 [dm_mod]
[ 1959.234328]  [<ffffffffa0009203>] table_load+0x143/0x370 [dm_mod]
[ 1959.241129]  [<ffffffffa00090c0>] ? retrieve_status+0x1b0/0x1b0 [dm_mod]
[ 1959.248607]  [<ffffffffa0009e35>] ctl_ioctl+0x255/0x4d0 [dm_mod]
[ 1959.255307]  [<ffffffff813304e2>] ? memzero_explicit+0x12/0x20
[ 1959.261816]  [<ffffffffa000a0c3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[ 1959.268615]  [<ffffffff81215eb6>] do_vfs_ioctl+0xa6/0x5c0
[ 1959.274637]  [<ffffffff81120d2f>] ? __audit_syscall_entry+0xaf/0x100
[ 1959.281726]  [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
[ 1959.288814]  [<ffffffff81216449>] SyS_ioctl+0x79/0x90
[ 1959.294450]  [<ffffffff8167e4ae>] entry_SYSCALL_64_fastpath+0x12/0x71
...
[ 1959.323277] RIP  [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot]
[ 1959.333090]  RSP <ffff88032a957b30>
[ 1959.336978] CR2: 0000000000000098
[ 1959.344121] ---[ end trace b049991ccad1169e ]---

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1195899
Cc: stable@vger.kernel.org
Signed-off-by: Ding Xiang <dingxiang@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-10 17:12:09 -05:00
Mike Snitzer
30187e1d48 dm: rename target's per_bio_data_size to per_io_data_size
Request-based DM will also make use of per_bio_data_size.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-22 22:34:37 -05:00
Mike Snitzer
f083b09b78 dm: set DM_TARGET_WILDCARD feature on "error" target
The DM_TARGET_WILDCARD feature indicates that the "error" target may
replace any target; even immutable targets.  This feature will be useful
to preserve the ability to replace the "multipath" target even once it
is formally converted over to having the DM_TARGET_IMMUTABLE feature.

Also, implicit in the DM_TARGET_WILDCARD feature flag being set is that
.map, .map_rq, .clone_and_map_rq and .release_clone_rq are all defined
in the target_type.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-22 11:06:21 -05:00
Christoph Hellwig
e56f81e0b0 dm: refactor ioctl handling
This moves the call to blkdev_ioctl and the argument checking to DM core
code, and only leaves a callout to find the block device to operate on
in the targets.  This simplifies the code and allows us to pass through
ioctl-like command using other methods in the next patch.

Also split out a helper around calling the prepare_ioctl method that
will be reused for persistent reservation handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-10-31 19:05:59 -04:00