* refs/heads/tmp-10f1d14:
Linux 4.19.86
x86/resctrl: Fix rdt_find_domain() return value and checks
mmc: tmio: fix SCC error handling to avoid false positive CRC error
powerpc/time: Fix clockevent_decrementer initalisation for PR KVM
tools: PCI: Fix broken pcitest compilation
PM / devfreq: Fix static checker warning in try_then_request_governor
ACPI / LPSS: Use acpi_lpss_* instead of acpi_subsys_* functions for hibernate
tcp: start receiver buffer autotuning sooner
ARM: dts: omap5: Fix dual-role mode on Super-Speed port
mlxsw: spectrum_switchdev: Check notification relevance based on upper device
spi: rockchip: initialize dma_slave_config properly
mac80211: minstrel: fix sampling/reporting of CCK rates in HT mode
mac80211: minstrel: fix CCK rate group streams value
mac80211: minstrel: fix using short preamble CCK rates on HT clients
misc: cxl: Fix possible null pointer dereference
netfilter: nft_compat: do not dump private area
net: sched: avoid writing on noop_qdisc
selftests: forwarding: Have lldpad_app_wait_set() wait for unknown, too
hwmon: (npcm-750-pwm-fan) Change initial pwm target to 255
hwmon: (ina3221) Fix INA3221_CONFIG_MODE macros
hwmon: (pwm-fan) Silence error on probe deferral
hwmon: (nct6775) Fix names of DIMM temperature sources
hwmon: (k10temp) Support all Family 15h Model 6xh and Model 7xh processors
scsi: arcmsr: clean up clang warning on extraneous parentheses
pinctrl: gemini: Fix up TVC clock group
orangefs: rate limit the client not running info message
x86/mm: Do not warn about PCI BIOS W+X mappings
ARM: 8802/1: Call syscall_trace_exit even when system call skipped
spi: spidev: Fix OF tree warning logic
pinctrl: gemini: Mask and set properly
spi: fsl-lpspi: Prevent FIFO under/overrun by default
gpio: syscon: Fix possible NULL ptr usage
net: fix generic XDP to handle if eth header was mangled
bpf: btf: Fix a missing check bug
x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error
lightnvm: pblk: consider max hw sectors supported for max_write_pgs
lightnvm: pblk: fix error handling of pblk_lines_init()
lightnvm: do no update csecs and sos on 1.2
lightnvm: pblk: guarantee mw_cunits on read buffer
lightnvm: pblk: fix write amplificiation calculation
lightnvm: pblk: guarantee emeta on line close
lightnvm: pblk: fix incorrect min_write_pgs
lightnvm: pblk: fix rqd.error return value in pblk_blk_erase_sync
ALSA: hda/ca0132 - Fix input effect controls for desktop cards
media: venus: vdec: fix decoded data size
media: cx231xx: fix potential sign-extension overflow on large shift
GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads
media: isif: fix a NULL pointer dereference bug
printk: Give error on attempt to set log buffer length to over 2G
mfd: ti_am335x_tscadc: Keep ADC interface on if child is wakeup capable
backlight: lm3639: Unconditionally call led_classdev_unregister
proc/vmcore: Fix i386 build error of missing copy_oldmem_page_encrypted()
s390/kasan: avoid user access code instrumentation
s390/kasan: avoid instrumentation of early C code
s390/kasan: avoid vdso instrumentation
mmc: mmci: expand startbiterr to irqmask and error check
x86/intel_rdt: CBM overlap should also check for overlap with CDP peer
x86/intel_rdt: Introduce utility to obtain CDP peer
mtd: devices: m25p80: Make sure WRITE_EN is issued before each write
mtd: spi-nor: cadence-quadspi: Use proper enum for dma_[un]map_single
media: cx18: Don't check for address of video_dev
media: dw9807-vcm: Fix probe error handling
media: dw9714: Fix error handling in probe function
platform/x86: mlx-platform: Properly use mlxplat_mlxcpld_msn201x_items
bcache: recal cached_dev_sectors on detach
bcache: account size of buckets used in uuid write to ca->meta_sectors_written
reset: Fix potential use-after-free in __of_reset_control_get()
fbdev: fix broken menu dependencies
fbdev: sbuslib: integer overflow in sbusfb_ioctl_helper()
fbdev: sbuslib: use checked version of put_user()
atmel_lcdfb: support native-mode display-timings
mmc: renesas_sdhi_internal_dmac: set scatter/gather max segment size
mmc: tmio: Fix SCC error detection
mmc: renesas_sdhi_internal_dmac: Whitelist r8a774a1
x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately
xsk: proper AF_XDP socket teardown ordering
iwlwifi: mvm: don't send keys when entering D3
ACPI / SBS: Fix rare oops when removing modules
xfrm: use correct size to initialise sp->ovec
crypto: mxs-dcp - Fix AES issues
crypto: mxs-dcp - Fix SHA null hashes and output length
dmaengine: rcar-dmac: set scatter/gather max segment size
x86/olpc: Fix build error with CONFIG_MFD_CS5535=m
kexec: Allocate decrypted control pages for kdump if SME is enabled
remoteproc: qcom: q6v5: Fix a race condition on fatal crash
remoteproc: Check for NULL firmwares in sysfs interface
tc-testing: fix build of eBPF programs
net: hns3: Fix for rx vlan id handle to support Rev 0x21 hardware
soc: fsl: bman_portals: defer probe after bman's probe
Input: silead - try firmware reload after unsuccessful resume
Input: st1232 - set INPUT_PROP_DIRECT property
i2c: zx2967: use core to detect 'no zero length' quirk
i2c: tegra: use core to detect 'no zero length' quirk
i2c: qup: use core to detect 'no zero length' quirk
i2c: omap: use core to detect 'no zero length' quirk
gfs2: slow the deluge of io error messages
media: cec-gpio: select correct Signal Free Time
media: ov5640: fix framerate update
dmaengine: ioat: fix prototype of ioat_enumerate_channels
NFSv4.x: fix lock recovery during delegation recall
printk: Correct wrong casting
i2c: brcmstb: Allow enabling the driver on DSL SoCs
clk: samsung: Use clk_hw API for calling clk framework from clk notifiers
clk: samsung: exynos5420: Define CLK_SECKEY gate clock only or Exynos5420
clk: samsung: Use NOIRQ stage for Exynos5433 clocks suspend/resume
qtnfmac: drop error reports for out-of-bounds key indexes
qtnfmac: inform wireless core about supported extended capabilities
qtnfmac: pass sgi rate info flag to wireless core
qtnfmac: request userspace to do OBSS scanning if FW can not
brcmfmac: fix full timeout waiting for action frame on-channel tx
brcmfmac: reduce timeout for action frame scan
cpu/SMT: State SMT is disabled even with nosmt and without "=force"
mtd: physmap_of: Release resources on error
usb: dwc2: disable power_down on rockchip devices
USB: serial: cypress_m8: fix interrupt-out transfer length
KVM: PPC: Book3S PR: Exiting split hack mode needs to fixup both PC and LR
bnxt_en: return proper error when FW returns HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED
ALSA: hda/sigmatel - Disable automute for Elo VuPoint
media: i2c: adv748x: Support probing a single output
media: rcar-vin: fix redeclaration of symbol
media: pxa_camera: Fix check for pdev->dev.of_node
media: rc: ir-rc6-decoder: enable toggle bit for Kathrein RCU-676 remote
qed: Avoid implicit enum conversion in qed_ooo_submit_tx_buffers
ata: ep93xx: Use proper enums for directions
powerpc/64s/radix: Explicitly flush ERAT with local LPID invalidation
powerpc/time: Use clockevents_register_device(), fixing an issue with large decrementer
ASoC: qdsp6: q6asm-dai: checking NULL vs IS_ERR()
cpuidle: menu: Fix wakeup statistics updates for polling state
ACPICA: Never run _REG on system_memory and system_IO
OPP: Return error on error from dev_pm_opp_get_opp_count()
msm/gpu/a6xx: Force of_dma_configure to setup DMA for GMU
rpmsg: glink: smem: Support rx peak for size less than 4 bytes
IB/mlx4: Avoid implicit enumerated type conversion
RDMA/hns: Limit the size of extend sge of sq
RDMA/hns: Bugfix for CM test
RDMA/hns: Submit bad wr when post send wr exception
RDMA/hns: Bugfix for reserved qp number
IB/rxe: avoid srq memory leak
IB/mthca: Fix error return code in __mthca_init_one()
ixgbe: Fix crash with VFs and flow director on interface flap
i40e: Use proper enum in i40e_ndo_set_vf_link_state
ixgbe: Fix ixgbe TX hangs with XDP_TX beyond queue limit
md: allow metadata updates while suspending an array - fix
ice: Fix forward to queue group logic
clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines
clocksource/drivers/sh_cmt: Fixup for 64-bit machines
tools: PCI: Fix compilation warnings
PM / hibernate: Check the success of generating md5 digest before hibernation
mtd: rawnand: sh_flctl: Use proper enum for flctl_dma_fifo0_transfer
ARM: dts: at91: sama5d2_ptc_ek: fix bootloader env offsets
ARM: dts: at91: at91sam9x5cm: fix addressable nand flash size
ARM: dts: at91: sama5d4_xplained: fix addressable nand flash size
powerpc/xive: Move a dereference below a NULL test
powerpc/pseries: Fix how we iterate over the DTL entries
powerpc/pseries: Fix DTL buffer registration
cxgb4: Use proper enum in IEEE_FAUX_SYNC
cxgb4: Use proper enum in cxgb4_dcb_handle_fw_update
mei: samples: fix a signedness bug in amt_host_if_call()
x86/PCI: Apply VMD's AERSID fixup generically
sunrpc: Fix connect metrics
clk: keystone: Enable TISCI clocks if K3_ARCH
ext4: fix build error when DX_DEBUG is defined
ALSA: hda: Fix mismatch for register mask and value in ext controller.
dmaengine: timb_dma: Use proper enum in td_prep_slave_sg
dmaengine: ep93xx: Return proper enum in ep93xx_dma_chan_direction
printk: CON_PRINTBUFFER console registration is a bit racy
printk: Do not miss new messages when replaying the log
KVM: PPC: Inform the userspace about TCE update failures
watchdog: w83627hf_wdt: Support NCT6796D, NCT6797D, NCT6798D
watchdog: sama5d4: fix timeout-sec usage
watchdog: renesas_wdt: stop when unregistering
watchdog: core: fix null pointer dereference when releasing cdev
irqchip/irq-mvebu-icu: Fix wrong private data retrieval
nl80211: Fix a GET_KEY reply attribute
usb: dwc3: gadget: Check ENBLSLPM before sending ep command
usb: gadget: udc: fotg210-udc: Fix a sleep-in-atomic-context bug in fotg210_get_status()
selftests/tls: Fix recv(MSG_PEEK) & splice() test cases
ath9k: fix reporting calculated new FFT upper max
PM / devfreq: stopping the governor before device_unregister()
PM / devfreq: Fix handling of min/max_freq == 0
PM / devfreq: Fix devfreq_add_device() when drivers are built as modules.
ata: ahci_brcm: Allow using driver or DSL SoCs
rtlwifi: btcoex: Use proper enumerated types for Wi-Fi only interface
ath10k: fix vdev-start timeout on error
arm64/numa: Report correct memblock range for the dummy node
kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table
iommu/arm-smmu-v3: Fix unexpected CMD_SYNC timeout
iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
mt76: fix handling ps-poll frames
mt76x2: disable WLAN core before probe
mt76x2: fix tx power configuration for VHT mcs 9
IB/hfi1: Ensure ucast_dlid access doesnt exceed bounds
IB/hfi1: Error path MAD response size is incorrect
f2fs: keep lazytime on remount
ACPI / LPSS: Resume BYT/CHT I2C controllers from resume_noirq
ACPI / LPSS: Make acpi_lpss_find_device() also find PCI devices
SUNRPC: Fix priority queue fairness
tcp: up initial rmem to 128KB and SYN rwin to around 64KB
ARM: dts: sun8i: h3: bpi-m2-plus: Fix address for external RGMII Ethernet PHY
ARM: dts: sun8i: h3-h5: ir register size should be the whole memory block
f2fs: return correct errno in f2fs_gc
net: hns3: Fix loss of coal configuration while doing reset
net: hns3: Fix for netdev not up problem when setting mtu
ARM: dts: omap5: enable OTG role for DWC3 controller
ARM: dts: dra7: Enable workaround for errata i870 in PCIe host mode
net: xen-netback: fix return type of ndo_start_xmit function
net: ovs: fix return type of ndo_start_xmit function
bpf, x32: Fix bug for BPF_JMP | {BPF_JSGT, BPF_JSLE, BPF_JSLT, BPF_JSGE}
bpf, x32: Fix bug with ALU64 {LSH, RSH, ARSH} BPF_K shift by 0
bpf, x32: Fix bug with ALU64 {LSH, RSH, ARSH} BPF_X shift by 0
bpf, x32: Fix bug for BPF_ALU64 | BPF_NEG
fbdev: Ditch fb_edid_add_monspecs
arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault
mm/memory_hotplug: fix updating the node span
mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()
idr: Fix idr_get_next race with idr_remove
net: cdc_ncm: Signedness bug in cdc_ncm_set_dgram_size()
Revert "OPP: Protect dev_list with opp_table lock"
tee: optee: add missing of_node_put after of_device_is_available
i2c: mediatek: modify threshold passed to i2c_get_dma_safe_msg_buf()
spi: mediatek: use correct mata->xfer_len when in fifo transfer
Conflicts:
drivers/rpmsg/qcom_glink_smem.c
drivers/usb/dwc3/gadget.c
Change-Id: I6e0f156d860bf2afcaabcf70d653676eb7d3de4e
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
[ Upstream commit f42f7c283078ce3c1e8368b140e270755b1ae313 ]
Fix up the priority queue to not batch by owner, but by queue, so that
we allow '1 << priority' elements to be dequeued before switching to
the next priority queue.
The owner field is still used to wake up requests in round robin order
by owner to avoid single processes hogging the RPC layer by loading the
queues.
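The batching rule described above can be illustrated with a small user-space model. This is only a sketch under assumptions, not the kernel's code: the names prio_queues, pq_init() and pq_dequeue(), the choice of three priority levels, and the counters standing in for real task lists are all hypothetical. It shows a queue at priority p being allowed '1 << p' dequeues before service moves to the next queue.

```c
#include <stddef.h>

#define NR_PRIORITIES 3 /* hypothetical: 0 = low, 1 = normal, 2 = high */

struct prio_queues {
	int pending[NR_PRIORITIES]; /* number of tasks waiting per priority */
	int current;                /* priority level currently being served */
	int budget;                 /* dequeues left before switching queues */
};

static void pq_init(struct prio_queues *q)
{
	for (size_t i = 0; i < NR_PRIORITIES; i++)
		q->pending[i] = 0;
	q->current = NR_PRIORITIES - 1;
	q->budget = 1 << q->current;
}

/* Return the priority of the dequeued task, or -1 if every queue is empty. */
static int pq_dequeue(struct prio_queues *q)
{
	/* One extra pass lets us come back to the starting queue with a new budget. */
	for (int tries = 0; tries <= NR_PRIORITIES; tries++) {
		if (q->budget > 0 && q->pending[q->current] > 0) {
			q->pending[q->current]--;
			q->budget--;
			return q->current;
		}
		/* Budget spent or queue empty: move to the next lower level, wrapping. */
		q->current = (q->current + NR_PRIORITIES - 1) % NR_PRIORITIES;
		q->budget = 1 << q->current;
	}
	return -1;
}
```

In this model the batch belongs to the queue rather than to an owner, so a single busy owner cannot extend the time spent on one priority level.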
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
* refs/heads/tmp-5b2dde5:
Revert "Revert "usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup""
Revert "usb: dwc3: gadget: combine unaligned and zero flags"
Revert "usb: dwc3: gadget: track number of TRBs per request"
Revert "usb: dwc3: gadget: use num_trbs when skipping TRBs on ->dequeue()"
Revert "usb: dwc3: gadget: extract dwc3_gadget_ep_skip_trbs()"
Revert "usb: dwc3: gadget: introduce cancelled_list"
Revert "usb: dwc3: gadget: move requests to cancelled_list"
Revert "usb: dwc3: gadget: remove wait_end_transfer"
Revert "usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup"
Revert "usb: dwc3: Reset num_trbs after skipping"
Linux 4.19.57
arm64: insn: Fix ldadd instruction encoding
usb: dwc3: Reset num_trbs after skipping
tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb
RDMA: Directly cast the sockaddr union to sockaddr
futex: Update comments and docs about return values of arch futex code
bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd
arm64: futex: Avoid copying out uninitialised stack in failed cmpxchg()
bpf: udp: ipv6: Avoid running reuseport's bpf_prog from __udp6_lib_err
bpf: udp: Avoid calling reuseport's bpf_prog from udp_gro
bpf: fix unconnected udp hooks
bpf: fix nested bpf tracepoints with per-cpu data
bpf: lpm_trie: check left child of last leftmost node for NULL
bpf: simplify definition of BPF_FIB_LOOKUP related flags
tun: wake up waitqueues after IFF_UP is set
tipc: check msg->req data len in tipc_nl_compat_bearer_disable
tipc: change to use register_pernet_device
team: Always enable vlan tx offload
sctp: change to hold sk after auth shkey is created successfully
net: stmmac: set IC bit when transmitting frames with HW timestamp
net: stmmac: fixed new system time seconds value calculation
net: remove duplicate fetch in sock_getsockopt
net/packet: fix memory leak in packet_set_ring()
ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop
bonding: Always enable vlan tx offload
af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
eeprom: at24: fix unexpected timeout under high load
irqchip/mips-gic: Use the correct local interrupt map registers
SUNRPC: Clean up initialisation of the struct rpc_rqst
cpu/speculation: Warn on unsupported mitigations= parameter
NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O
KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT
x86/resctrl: Prevent possible overrun during bitmap operations
x86/microcode: Fix the microcode load on CPU hotplug for real
x86/speculation: Allow guests to use SSBD even if host does not
scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck()
dm log writes: make sure super sector log updates are written in order
mm/page_idle.c: fix oops because end_pfn is larger than max_pfn
mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
clk: socfpga: stratix10: fix divider entry for the emac clocks
fs/binfmt_flat.c: make load_flat_shared_library() work
mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask
fs/proc/array.c: allow reporting eip/esp for all coredumping threads
usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup
usb: dwc3: gadget: remove wait_end_transfer
usb: dwc3: gadget: move requests to cancelled_list
usb: dwc3: gadget: introduce cancelled_list
usb: dwc3: gadget: extract dwc3_gadget_ep_skip_trbs()
usb: dwc3: gadget: use num_trbs when skipping TRBs on ->dequeue()
usb: dwc3: gadget: track number of TRBs per request
usb: dwc3: gadget: combine unaligned and zero flags
Revert "usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup"
qmi_wwan: Fix out-of-bounds read
net/9p: include trans_common.h to fix missing prototype warning.
9p/trans_fd: put worker reqs on destroy
9p/trans_fd: abort p9_read_work if req status changed
9p: potential NULL dereference
9p: p9dirent_read: check network-provided name length
9p/rdma: remove useless check in cm_event_handler
9p: acl: fix uninitialized iattr access
9p: Rename req to rreq in trans_fd
9p/rdma: do not disconnect on down_interruptible EAGAIN
9p: Add refcount to p9_req_t
9p: rename p9_free_req() function
9p: add a per-client fcall kmem_cache
9p: embed fcall in req to round down buffer allocs
9p: Use a slab for allocating requests
9p/xen: fix check for xenbus_read error in front_probe
IB/hfi1: Close PSM sdma_progress sleep window
Revert "x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP"
arm64: Don't unconditionally add -Wno-psabi to KBUILD_CFLAGS
perf header: Fix unchecked usage of strncpy()
perf help: Remove needless use of strncpy()
perf ui helpline: Use strlcpy() as a shorter form of strncpy() + explicit set nul
ANDROID: drivers/misc: disable LTO for rodata.o
Conflicts:
arch/arm64/Makefile
Change-Id: Id088f01491976ee9860dba83d2f3bff5ab35189d
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
* refs/heads/tmp-0755dc9:
Linux 4.19.22
svcrdma: Remove max_sge check at connect time
svcrdma: Reduce max_send_sges
batman-adv: Force mac header to start of data on xmit
batman-adv: Avoid WARN on net_device without parent in netns
xfrm: refine validation of template and selector families
libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive()
Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
xfrm: Make set-mark default behavior backward compatible
SUNRPC: Always drop the XPRT_LOCK on XPRT_CLOSE_WAIT
drm/vmwgfx: Return error code from vmw_execbuf_copy_fence_user
drm/vmwgfx: Fix setting of dma masks
drm/i915: always return something on DDI clock selection
drm/amd/powerplay: Fix missing break in switch
drm/modes: Prevent division by zero htotal
mac80211: ensure that mgmt tx skbs have tailroom for encryption
mic: vop: Fix use-after-free on remove
powerpc/radix: Fix kernel crash with mremap()
firmware: arm_scmi: provide the mandatory device release callback
ARM: dts: da850: fix interrupt numbers for clocksource
ARM: tango: Improve ARCH_MULTIPLATFORM compatibility
ARM: iop32x/n2100: fix PCI IRQ mapping
MIPS: VDSO: Include $(ccflags-vdso) in o32,n32 .lds builds
mips: loongson64: remove unreachable(), fix loongson_poweroff().
MIPS: VDSO: Use same -m%-float cflag as the kernel proper
MIPS: OCTEON: don't set octeon_dma_bar_type if PCI is disabled
mips: cm: reprime error cause
tracing: uprobes: Fix typo in pr_fmt string
pinctrl: cherryview: fix Strago DMI workaround
pinctrl: sunxi: Correct number of IRQ banks on H6 main pin controller
debugfs: fix debugfs_rename parameter checking
samples: mei: use /dev/mei0 instead of /dev/mei
mei: me: add ice lake point device id.
misc: vexpress: Off by one in vexpress_syscfg_exec()
signal: Better detection of synchronous signals
signal: Always notice exiting tasks
iio: ti-ads8688: Update buffer allocation for timestamps
iio: chemical: atlas-ph-sensor: correct IIO_TEMP values to millicelsius
iio: adc: axp288: Fix TS-pin handling
tools: iio: iio_generic_buffer: make num_loops signed
libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD
mtd: rawnand: gpmi: fix MX28 bus master lockup problem
mtd: spinand: Fix the error/cleanup path in spinand_init()
mtd: spinand: Handle the case where PROGRAM LOAD does not reset the cache
mtd: Make sure mtd->erasesize is valid even if the partition is of size 0
ANDROID: cuttlefish: enable CONFIG_NET_SCH_NETEM=y
Add XFRM-I to cuttlefish defconfigs
ANDROID: Move from clang r346389b to r349610.
Change-Id: Ie249267aa9e0d4eb169adecafc0cdc59a0a2eb0f
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
* refs/heads/tmp-976f78d:
Linux 4.19.16
Btrfs: use nofs context when initializing security xattrs to avoid deadlock
Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation
Btrfs: fix access to available allocation bits when starting balance
arm64: compat: Don't pull syscall number from regs in arm_compat_syscall
KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less
sunrpc: use-after-free in svc_process_common()
mm: page_mapped: don't assume compound page is huge or THP
ext4: fix special inode number checks in __ext4_iget()
ext4: track writeback errors using the generic tracking infrastructure
ext4: use ext4_write_inode() when fsyncing w/o a journal
ext4: avoid kernel warning when writing the superblock to a dead device
ext4: fix a potential fiemap/page fault deadlock w/ inline_data
ext4: make sure enough credits are reserved for dioread_nolock writes
rbd: don't return 0 on unmap if RBD_DEV_FLAG_REMOVING is set
drm/amdgpu: Don't fail resume process if resuming atomic state fails
drm/amdgpu: Don't ignore rc from drm_dp_mst_topology_mgr_resume()
drm/i915: Unwind failure on pinning the gen7 ppgtt
drm/fb-helper: Partially bring back workaround for bugs of SDL 1.2
drm/fb_helper: Allow leaking fbdev smem_start
drm/amd/display: Fix MST dp_blank REG_WAIT timeout
PCI: dwc: Move interrupt acking into the proper callback
PCI: dwc: Take lock when ACKing an interrupt
PCI: dwc: Use interrupt masking instead of disabling
drm/amdgpu: Add new VegaM pci id
vfio/type1: Fix unmap overflow off-by-one
mtd: rawnand: qcom: fix memory corruption that causes panic
i2c: dev: prevent adapter retries and timeout being set as minus value
ACPI/IORT: Fix rc_dma_get_range()
ACPI / PMIC: xpower: Fix TS-pin current-source handling
ACPI: power: Skip duplicate power resource references in _PRx
mm, memcg: fix reclaim deadlock with writeback
mm/usercopy.c: no check page span for stack objects
slab: alien caches must not be initialized if the allocation of the alien cache failed
USB: Add USB_QUIRK_DELAY_CTRL_MSG quirk for Corsair K70 RGB
USB: storage: add quirk for SMI SM3350
USB: storage: don't insert sane sense for SPC3+ when bad sense specified
usb: cdc-acm: send ZLP for Telit 3G Intel based modems
cifs: Fix potential OOB access of lock element array
CIFS: Fix credit computation for compounded requests
CIFS: Do not hide EINTR after sending network packets
CIFS: Do not set credits to 1 if the server didn't grant anything
CIFS: Fix adjustment of credits for MTU requests
ALSA: hda/realtek - Disable headset Mic VREF for headset mode of ALC225
ALSA: hda/realtek - Add unplug function into unplug state of Headset Mode for ALC225
ALSA: hda/realtek - Support Dell headset mode for New AIO platform
x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE
cpufreq: scmi: Fix frequency invariance in slow path
staging: rtl8188eu: Fix module loading from tasklet for WEP encryption
staging: rtl8188eu: Fix module loading from tasklet for CCMP encryption
Btrfs: fix deadlock when using free space tree due to block group creation
UPSTREAM: selftests/memfd: Add tests for F_SEAL_FUTURE_WRITE seal
UPSTREAM: mm/memfd: Add an F_SEAL_FUTURE_WRITE seal to memfd
Revert "UPSTREAM: mm: Add an F_SEAL_FUTURE_WRITE seal to memfd"
Revert "UPSTREAM: mm/memfd: make F_SEAL_FUTURE_WRITE seal more robust"
ANDROID: cuttlefish: enable CONFIG_NET_CLS_BPF=y
Makefile: Fix 4.19.15 resolution
ANDROID: f2fs: Complement "android_fs" tracepoint of read path
Change-Id: I9c9c1f53796798b4ac1038dcfcf0d70624c1cfca
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
This patch is only appropriate for stable kernels v4.16 - v4.19
Since commit 9b30889c54 ("SUNRPC: Ensure we always close the socket after
a connection shuts down"), and until commit c544577daddb ("SUNRPC: Clean up
transport write space handling"), it is possible for the NFS client to spin
in the following tight loop:
269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_bind [sunrpc]
269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_connect [sunrpc]
269.964083: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=0 action=call_transmit [sunrpc]
269.964085: xprt_transmit: peer=[10.0.1.82]:2049 xid=0x761d3f77 status=-32
269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_transmit_status [sunrpc]
269.964085: rpc_task_run_action: task:43@0 flags=5a81 state=0005 status=-32 action=call_status [sunrpc]
269.964085: rpc_call_status: task:43@0 status=-32
The issue is that the path through call_transmit_status does not release
the XPRT_LOCK when the transmit result is -EPIPE, so the socket cannot be
properly shut down.
The below commit fixed things up in mainline by unconditionally calling
xprt_end_transmit() and releasing the XPRT_LOCK after every pass through
call_transmit. However, the entirety of this commit is not appropriate for
stable kernels because its original inclusion was part of a series that
modifies the sunrpc code to use a different queueing model. As a result,
there are machinations within this patch that are not needed for a stable
fix and will not make sense without a larger backport of the mainline
series.
In this patch, we take the slightly modified bit of the mainline patch
below, which is to release the XPRT_LOCK on transmission error should we
detect that the transport is waiting to close.
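The rule in the paragraph above can be modelled in a few lines. This is a sketch under assumptions, not the patch itself: struct xprt_model and transmit_status() are hypothetical names, and two booleans stand in for the XPRT_LOCK and XPRT_CLOSE_WAIT state bits. The status=-32 seen in the trace corresponds to -EPIPE on Linux.

```c
#include <errno.h>   /* EPIPE, for callers passing -EPIPE as the status */
#include <stdbool.h>

struct xprt_model {
	bool locked;     /* models XPRT_LOCK being held by the sending task */
	bool close_wait; /* models XPRT_CLOSE_WAIT being set on the transport */
};

/*
 * Called with the result of a transmit attempt. On an error, if the
 * transport is waiting to close, drop the lock so the close can proceed.
 * Returns true if the lock was released here.
 */
static bool transmit_status(struct xprt_model *x, int status)
{
	if (status < 0 && x->close_wait && x->locked) {
		x->locked = false;
		return true;
	}
	return false;
}
```

Without the release, the task keeps the lock, the close never runs, and the next transmit attempt fails in the same way, which matches the loop in the trace.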
commit c544577daddb618c7dd5fa7fb98d6a41782f020e upstream
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Mon Sep 3 23:39:27 2018 -0400
SUNRPC: Clean up transport write space handling
Treat socket write space handling in the same way we now treat transport
congestion: by denying the XPRT_LOCK until the transport signals that it
has free buffer space.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The original discussion of the problem is here:
https://lore.kernel.org/linux-nfs/20181212135157.4489-1-dwysocha@redhat.com/T/#t
This passes my usual cthon and xfstests on NFS as applied on v4.19 mainline.
Reported-by: Dave Wysochanski <dwysocha@redhat.com>
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d4b09acf924b84bae77cad090a9d108e70b43643 upstream.
If a node has NFSv4.1+ mounts inside several net namespaces,
this can lead to a use-after-free in svc_process_common():
svc_process_common()
        /* Setup reply header */
        rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE
svc_process_common() can use an incorrect rqstp->rq_xprt, because
its caller, bc_svc_process(), takes it from serv->sv_bc_xprt.
The problem is that serv is a global structure, but sv_bc_xprt
is assigned per net namespace.
According to Trond, the whole "let's set up rqstp->rq_xprt
for the back channel" is nothing but a giant hack in order
to work around the fact that svc_process_common() uses it
to find the xpt_ops, and perform a couple of (meaningless
for the back channel) tests of xpt_flags.
All we really need in svc_process_common() is to be able to run
rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()
Bruce J Fields points out that this xpo_prep_reply_hdr() call
is an awfully roundabout way just to do "svc_putnl(resv, 0);"
in the tcp case.
This patch does not initialize rqstp->rq_xprt in bc_svc_process();
it now calls svc_process_common() with rqstp->rq_xprt = NULL.
To adjust the reply header, svc_process_common() just checks
rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() in the TCP case.
To handle the rqstp->rq_xprt = NULL case in functions called from
svc_process_common(), the patch introduces a net namespace pointer,
svc_rqst->rq_bc_net, and adjusts the SVC_NET() definition.
Some other functions were also adapted to properly handle the described case.
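The namespace lookup described above can be sketched as follows. This is a user-space model under assumptions, not the kernel definition: the structs are cut down to the two fields that matter, and svc_xprt_model, svc_rqst_model and svc_net_of() are hypothetical names standing in for the real types and the SVC_NET() macro.

```c
#include <stddef.h>

struct net { int id; };

struct svc_xprt_model {
	struct net *xpt_net;            /* namespace the transport belongs to */
};

struct svc_rqst_model {
	struct svc_xprt_model *rq_xprt; /* NULL for back-channel requests */
	struct net *rq_bc_net;          /* set only for back-channel requests */
};

/* Use the transport's namespace when present, else the back-channel one. */
static struct net *svc_net_of(const struct svc_rqst_model *rqstp)
{
	return rqstp->rq_xprt ? rqstp->rq_xprt->xpt_net : rqstp->rq_bc_net;
}
```

With a fallback like this, callees no longer need a transport pointer borrowed from another namespace just to find the right struct net.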
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Cc: stable@vger.kernel.org
Fixes: 23c20ecd44 ("NFS: callback up - users counting cleanup")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
v2: added lost extern svc_tcp_prep_reply_hdr()
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull NFS client updates from Anna Schumaker:
"These patches include adding async support for the v4.2 COPY
operation. I think Bruce is planning to send the server patches for
the next release, but I figured we could get the client side out of
the way now since it's been in my tree for a while. This shouldn't
cause any problems, since the server will still respond with
synchronous copies even if the client requests async.
Features:
- Add support for asynchronous server-side COPY operations
Stable bugfixes:
- Fix an off-by-one in bl_map_stripe() (v3.17+)
- NFSv4 client live hangs after live data migration recovery (v4.9+)
- xprtrdma: Fix disconnect regression (v4.18+)
- Fix locking in pnfs_generic_recover_commit_reqs (v4.14+)
- Fix a sleep in atomic context in nfs4_callback_sequence() (v4.9+)
Other bugfixes and cleanups:
- Optimizations and fixes involving NFS v4.1 / pNFS layout handling
- Optimize lseek(fd, SEEK_CUR, 0) on directories to avoid locking
- Immediately reschedule writeback when the server replies with an
error
- Fix excessive attribute revalidation in nfs_execute_ok()
- Add error checking to nfs_idmap_prepare_message()
- Use new vm_fault_t return type
- Return a delegation when reclaiming one that the server has
recalled
- Referrals should inherit proto setting from parents
- Make rpc_auth_create_args a const
- Improvements to rpc_iostats tracking
- Fix a potential reference leak when there is an error processing a
callback
- Fix rmdir / mkdir / rename nlink accounting
- Fix updating inode change attribute
- Fix error handling in nfs4_sp4_select_mode()
- Use an appropriate work queue for direct-write completion
- Don't busy wait if NFSv4 session draining is interrupted"
* tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (54 commits)
pNFS: Remove unwanted optimisation of layoutget
pNFS/flexfiles: ff_layout_pg_init_read should exit on error
pNFS: Treat RECALLCONFLICT like DELAY...
pNFS: When updating the stateid in layoutreturn, also update the recall range
NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence()
NFSv4: Fix locking in pnfs_generic_recover_commit_reqs
NFSv4: Fix a typo in nfs4_init_channel_attrs()
NFSv4: Don't busy wait if NFSv4 session draining is interrupted
NFS recover from destination server reboot for copies
NFS add a simple sync nfs4_proc_commit after async COPY
NFS handle COPY ERR_OFFLOAD_NO_REQS
NFS send OFFLOAD_CANCEL when COPY killed
NFS export nfs4_async_handle_error
NFS handle COPY reply CB_OFFLOAD call race
NFS add support for asynchronous COPY
NFS COPY xdr handle async reply
NFS OFFLOAD_CANCEL xdr
NFS CB_OFFLOAD xdr
NFS: Use an appropriate work queue for direct-write completion
NFSv4: Fix error handling in nfs4_sp4_select_mode()
...
NFSv4.0 callback needs to know the GSS target name the client used
when it established its lease. That information is available from
the GSS context created by gssproxy. Make it available in each
svc_cred.
Note this will also give us access to the real target service
principal name (which is typically "nfs", but spec does not require
that).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I've given up on the idea of zero-copy handling of SYMLINK on the
server side. This is because the Linux VFS symlink API requires the
symlink pathname to be in a NUL-terminated kmalloc'd buffer. The
NUL-termination is going to be problematic (watching out for
landing on a page boundary and dealing with a 4096-byte pathname).
I don't believe that SYMLINK creation is on a performance path or is
requested frequently enough that it will cause noticeable CPU cache
pollution due to data copies.
There will be two places where a transport callout will be necessary
to fill in the rqstp: one will be in the svc_fill_symlink_pathname()
helper that is used by NFSv2 and NFSv3, and the other will be in
nfsd4_decode_create().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
fill_in_write_vector() is nearly the same logic as
svc_fill_write_vector(), but there are a few differences so that
the former can handle multiple WRITE payloads in a single COMPOUND.
svc_fill_write_vector() can be adjusted so that it can be used in
the NFSv4 WRITE code path too. Instead of assuming the pages are
coming from rq_args.pages, have the caller pass in the page list.
The immediate benefit is a reduction of code duplication. It also
prevents the NFSv4 WRITE decoder from passing an empty vector
element when the transport has provided the payload in the xdr_buf's
page array.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
After a live data migration event at the NFS server, the client may send
I/O requests to the wrong server, causing a live hang due to repeated
recovery events. On the wire, this will appear as an I/O request failing
with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly.
NFS4ERR_BADSESSION is returned because the session ID being used was
issued by the other server and is not valid at the old server.
The failure is caused by async worker threads having cached the transport
(xprt) in the rpc_task structure. After the migration recovery completes,
the task is redispatched and the task resends the request to the wrong
server based on the old value still present in tk_xprt.
The solution is to recompute the tk_xprt field of the rpc_task structure
so that the request goes to the correct server.
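The stale-transport problem described above can be illustrated with a
minimal user-space sketch. The struct and function names below are
simplified stand-ins invented for illustration, not the kernel's
rpc_task or rpc_clnt definitions. The sketch assumes the client holds
the current transport and that a task re-reads it before every resend.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct xprt {
	int server_id;			/* which server this transport reaches */
};

struct clnt {
	struct xprt *cl_xprt;		/* current transport, updated by recovery */
};

struct task {
	struct clnt *tk_client;
	struct xprt *tk_xprt;		/* transport cached at dispatch time */
};

/* Cache the client's current transport in the task. */
static void task_bind_xprt(struct task *t)
{
	t->tk_xprt = t->tk_client->cl_xprt;
}

/* "Send" the request; returns the server the request would reach. */
static int task_send(const struct task *t)
{
	return t->tk_xprt->server_id;
}

/* Redispatch after recovery: recompute tk_xprt before resending. */
static int task_resend_after_recovery(struct task *t)
{
	task_bind_xprt(t);
	return task_send(t);
}
```

Without the rebind in task_resend_after_recovery(), the task would keep
sending to whatever transport it cached before the migration.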
Signed-off-by: Bill Baker <bill.baker@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Helen Chao <helen.chao@oracle.com>
Fixes: fb43d17210 ("SUNRPC: Use the multipath iterator to assign a ...")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The existing rpc_print_iostats has a few shortcomings. First, the naming
is not consistent with other functions in the kernel that display stats.
Second, it is really displaying stats for an rpc_clnt structure as it
displays both xprt stats and per-op stats. Third, it does not handle
rpc_clnt clones, which is important for the one in-kernel tree caller
of this function, the NFS client's nfs_show_stats function.
Fix all of the above by renaming the rpc_print_iostats to
rpc_clnt_show_stats and looping through any rpc_clnt clones via
cl_parent.
Once this interface is fixed, this addresses a problem with NFSv4.
Before this patch, the /proc/self/mountstats always showed incorrect
counts for NFSv4 lease and session related opcodes such as SEQUENCE,
RENEW, SETCLIENTID, CREATE_SESSION, etc. These counts were always 0
even though many ops would go over the wire. The reason for this is
there are multiple rpc_clnt structures allocated for any given NFSv4
mount, and inside nfs_show_stats() we called into rpc_print_iostats()
which only handled one of them, nfs_server->client. Fix these counts
by calling sunrpc's new rpc_clnt_show_stats() function, which handles
cloned rpc_clnt structs and prints the stats together.
Note that one side-effect of the above is that multiple mounts from
the same NFS server will show identical counts in the above ops due
to the fact the one rpc_clnt (representing the NFSv4 client state)
is shared across mounts.
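The clone walk described above can be sketched roughly as follows. This
is a user-space illustration with hypothetical names (struct clnt,
cl_op_count, clnt_sum_stats), not the sunrpc implementation. It assumes
that the root of a clone chain is marked by cl_parent pointing at
itself, and it also stops on a NULL parent as a defensive measure.

```c
#include <stddef.h>

#define NUM_OPS 3			/* hypothetical number of tracked ops */

/* Hypothetical, simplified stand-in for an RPC client and its clones. */
struct clnt {
	struct clnt *cl_parent;		/* root points to itself (assumption) */
	unsigned long cl_op_count[NUM_OPS];
};

/* Sum per-op counters over a client and every ancestor it was cloned from. */
static void clnt_sum_stats(const struct clnt *clnt,
			   unsigned long total[NUM_OPS])
{
	size_t op;

	for (op = 0; op < NUM_OPS; op++)
		total[op] = 0;

	for (;;) {
		for (op = 0; op < NUM_OPS; op++)
			total[op] += clnt->cl_op_count[op];
		if (!clnt->cl_parent || clnt == clnt->cl_parent)
			break;		/* reached the root of the chain */
		clnt = clnt->cl_parent;
	}
}
```

Summing over the chain is what lets counters recorded on a parent
client show up next to the counters of the per-mount clone.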
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
This turns rpc_auth_create_args into a const as it gets passed through the
auth stack.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable fixes:
- Fix a 1-byte stack overflow in nfs_idmap_read_and_verify_message
- Fix a hang due to incorrect error returns in rpcrdma_convert_iovs()
- Revert an incorrect change to the NFSv4.1 callback channel
- Fix a bug in the NFSv4.1 sequence error handling
Features and optimisations:
- Support for piggybacking a LAYOUTGET operation to the OPEN compound
- RDMA performance enhancements to deal with transport congestion
- Add proper SPDX tags for NetApp-contributed RDMA source
- Do not request delegated file attributes (size+change) from the
server
- Optimise away a GETATTR in the lookup revalidate code when doing
NFSv4 OPEN
- Optimise away unnecessary lookups for rename targets
- Misc performance improvements when freeing NFSv4 delegations
Bugfixes and cleanups:
- Try to fail quickly if proto=rdma
- Clean up RDMA receive trace points
- Fix sillyrename to return the delegation when appropriate
- Misc attribute revalidation fixes
- Immediately clear the pNFS layout on a file when the server returns
ESTALE
- Return NFS4ERR_DELAY when delegation/layout recalls fail due to
igrab()
- Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY"
* tag 'nfs-for-4.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (80 commits)
skip LAYOUTRETURN if layout is invalid
NFSv4.1: Fix the client behaviour on NFS4ERR_SEQ_FALSE_RETRY
NFSv4: Fix a typo in nfs41_sequence_process
NFSv4: Revert commit 5f83d86cf5 ("NFSv4.x: Fix wraparound issues..")
NFSv4: Return NFS4ERR_DELAY when a layout recall fails due to igrab()
NFSv4: Return NFS4ERR_DELAY when a delegation recall fails due to igrab()
NFSv4.0: Remove transport protocol name from non-UCS client ID
NFSv4.0: Remove cl_ipaddr from non-UCS client ID
NFSv4: Fix a compiler warning when CONFIG_NFS_V4_1 is undefined
NFS: Filter cache invalidation when holding a delegation
NFS: Ignore NFS_INO_REVAL_FORCED in nfs_check_inode_attributes()
NFS: Improve caching while holding a delegation
NFS: Fix attribute revalidation
NFS: fix up nfs_setattr_update_inode
NFSv4: Ensure the inode is clean when we set a delegation
NFSv4: Ignore NFS_INO_REVAL_FORCED in nfs4_proc_access
NFSv4: Don't ask for delegated attributes when adding a hard link
NFSv4: Don't ask for delegated attributes when revalidating the inode
NFS: Pass the inode down to the getattr() callback
NFSv4: Don't request size+change attribute if they are delegated to us
...
Pull nfsd updates from Bruce Fields:
"A relatively quiet cycle for nfsd.
The largest piece is an RDMA update from Chuck Lever with new trace
points, miscellaneous cleanups, and streamlining of the send and
receive paths.
Other than that, some miscellaneous bugfixes"
* tag 'nfsd-4.18' of git://linux-nfs.org/~bfields/linux: (26 commits)
nfsd: fix error handling in nfs4_set_delegation()
nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo
Fix 16-byte memory leak in gssp_accept_sec_context_upcall
svcrdma: Fix incorrect return value/type in svc_rdma_post_recvs
svcrdma: Remove unused svc_rdma_op_ctxt
svcrdma: Persistently allocate and DMA-map Send buffers
svcrdma: Simplify svc_rdma_send()
svcrdma: Remove post_send_wr
svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt
svcrdma: Introduce svc_rdma_send_ctxt
svcrdma: Clean up Send SGE accounting
svcrdma: Refactor svc_rdma_dma_map_buf
svcrdma: Allocate recv_ctxt's on CPU handling Receives
svcrdma: Persistently allocate and DMA-map Receive buffers
svcrdma: Preserve Receive buffer until svc_rdma_sendto
svcrdma: Simplify svc_rdma_recv_ctxt_put
svcrdma: Remove sc_rq_depth
svcrdma: Introduce svc_rdma_recv_ctxt
svcrdma: Trace key RDMA API events
svcrdma: Trace key RPC/RDMA protocol events
...
While sending each RPC Reply, svc_rdma_sendto allocates and DMA-
maps a separate buffer where the RPC/RDMA transport header is
constructed. The buffer is unmapped and released in the Send
completion handler. This is significant per-RPC overhead,
especially for small RPCs.
Instead, allocate and DMA-map a buffer, and cache it in each
svc_rdma_send_ctxt. This buffer and its mapping can be re-used
for each RPC, saving the cost of memory allocation and DMA
mapping.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up: Now that the send_wr is part of the svc_rdma_send_ctxt,
svc_rdma_post_send_wr is nearly empty.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Receive buffers are always the same size, but each Send WR has a
variable number of SGEs, based on the contents of the xdr_buf being
sent.
While assembling a Send WR, keep track of the number of SGEs so that
we don't exceed the device's maximum, or walk off the end of the
Send SGE array.
For now the Send path just fails if it exceeds the maximum.
The current logic in svc_rdma_accept bases the maximum number of
Send SGEs on the largest NFS request that can be sent or received.
In the transport layer, the limit is actually based on the
capabilities of the underlying device, not on properties of the
Upper Layer Protocol.
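The SGE accounting described above might look something like the
following sketch. All names and the array size are hypothetical, and
the choice of -EIO as the failure code is an assumption. The point is
only that each added SGE is checked against both the device limit and
the size of the array before it is stored.

```c
#include <errno.h>
#include <stddef.h>

#define SEND_SGE_ARRAY_SIZE 8		/* hypothetical array capacity */

struct sge {
	void *addr;
	size_t length;
};

/* Hypothetical send context that tracks how many SGEs are in use. */
struct send_ctxt {
	unsigned int sc_sge_count;
	struct sge sc_sges[SEND_SGE_ARRAY_SIZE];
};

/*
 * Append one SGE. Fails rather than exceeding the device's maximum
 * or walking off the end of the SGE array.
 */
static int send_ctxt_add_sge(struct send_ctxt *ctxt, unsigned int dev_max_sge,
			     void *addr, size_t length)
{
	if (ctxt->sc_sge_count >= dev_max_sge ||
	    ctxt->sc_sge_count >= SEND_SGE_ARRAY_SIZE)
		return -EIO;

	ctxt->sc_sges[ctxt->sc_sge_count].addr = addr;
	ctxt->sc_sges[ctxt->sc_sge_count].length = length;
	ctxt->sc_sge_count++;
	return 0;
}
```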
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
free list. This eliminates the overhead of calling kmalloc / kfree,
both of which grab a globally shared lock that disables interrupts.
Introduce a replacement to svc_rdma_op_ctxt's that is built
especially for the svcrdma Send path.
Subsequent patches will take advantage of this new structure by
allocating real resources which are then cached in these objects.
The allocations are freed when the transport is torn down.
I've renamed the structure so that static type checking can be used
to ensure that uses of op_ctxt and send_ctxt are not confused. As an
additional clean up, structure fields are renamed to conform with
kernel coding conventions.
Additional clean ups:
- Handle svc_rdma_send_ctxt_get allocation failure at each call
site, rather than pre-allocating and hoping we guessed correctly
- All send_ctxt_put call-sites request page freeing, so remove
the @free_pages argument
- All send_ctxt_put call-sites unmap SGEs, so fold that into
svc_rdma_send_ctxt_put
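As a rough illustration of the get/put pattern described above, here is
a user-space sketch with hypothetical names. It assumes a singly linked
per-transport free list, omits all locking and DMA mapping, and uses a
plain malloc'd buffer to stand in for the cached resources. The get
path allocates on demand when the free list is empty, and the put path
resets the SGE count itself so callers need no extra arguments.

```c
#include <stdlib.h>

#define SEND_BUF_SIZE 4096		/* hypothetical buffer size */

struct send_ctxt {
	struct send_ctxt *sc_next;	/* free-list linkage */
	unsigned int sc_sge_count;
	void *sc_buf;			/* allocated once, reused per RPC */
};

struct xprt {
	struct send_ctxt *free_list;
	unsigned int ctxts_allocated;	/* kept for illustration only */
};

static struct send_ctxt *send_ctxt_alloc(struct xprt *xprt)
{
	struct send_ctxt *ctxt = calloc(1, sizeof(*ctxt));

	if (!ctxt)
		return NULL;
	ctxt->sc_buf = malloc(SEND_BUF_SIZE);
	if (!ctxt->sc_buf) {
		free(ctxt);
		return NULL;
	}
	xprt->ctxts_allocated++;
	return ctxt;
}

/* Take a ctxt from the free list, allocating one if the list is empty. */
static struct send_ctxt *send_ctxt_get(struct xprt *xprt)
{
	struct send_ctxt *ctxt = xprt->free_list;

	if (!ctxt)
		return send_ctxt_alloc(xprt);	/* caller handles NULL */
	xprt->free_list = ctxt->sc_next;
	ctxt->sc_next = NULL;
	return ctxt;
}

/* Return a ctxt; per-use state is reset here, the buffer stays cached. */
static void send_ctxt_put(struct xprt *xprt, struct send_ctxt *ctxt)
{
	ctxt->sc_sge_count = 0;
	ctxt->sc_next = xprt->free_list;
	xprt->free_list = ctxt;
}

/* Release everything when the transport is torn down. */
static void send_ctxts_destroy(struct xprt *xprt)
{
	while (xprt->free_list) {
		struct send_ctxt *ctxt = xprt->free_list;

		xprt->free_list = ctxt->sc_next;
		free(ctxt->sc_buf);
		free(ctxt);
	}
}
```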
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up: Since there's already a svc_rdma_op_ctxt being passed
around with the running count of mapped SGEs, drop unneeded
parameters to svc_rdma_post_send_wr().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up: svc_rdma_dma_map_buf does mostly the same thing as
svc_rdma_dma_map_page, so let's fold these together.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There is a significant latency penalty when processing an ingress
Receive if the Receive buffer resides in memory that is not on the
same NUMA node as the CPU handling completions for a CQ.
The system administrator and the device driver determine which CPU
handles completions. This CPU does not change during the life of the CQ.
Further, the Upper Layer does not have any visibility of which CPU it
is.
Allocating Receive buffers in the Receive completion handler
guarantees that Receive buffers are allocated on the preferred NUMA
node for that CQ.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The current Receive path uses an array of pages which are allocated
and DMA mapped when each Receive WR is posted, and then handed off
to the upper layer in rqstp::rq_arg. The page flip releases unused
pages in the rq_pages pagelist. This mechanism introduces a
significant amount of overhead.
So instead, kmalloc the Receive buffer, and leave it DMA-mapped
while the transport remains connected. This confers a number of
benefits:
* Each Receive WR requires only one receive SGE, no matter how large
the inline threshold is. This helps the server-side NFS/RDMA
transport operate on less capable RDMA devices.
* The Receive buffer is left allocated and mapped all the time. This
relieves svc_rdma_post_recv from the overhead of allocating and
DMA-mapping a fresh buffer.
* svc_rdma_wc_receive no longer has to DMA unmap the Receive buffer.
It has to DMA sync only the number of bytes that were received.
* svc_rdma_build_arg_xdr no longer has to free a page in rq_pages
for each page in the Receive buffer, making it a constant-time
function.
* The Receive buffer is now plugged directly into the rq_arg's
head[0].iov_vec, and can be larger than a page without spilling
over into rq_arg's page list. This enables simplification of
the RDMA Read path in subsequent patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently svc_rdma_recv_ctxt_put's callers have to know whether they
want to free the ctxt's pages or not. This means the human
developers have to know when and why to set that free_pages
argument.
Instead, the ctxt should carry that information with it so that
svc_rdma_recv_ctxt_put does the right thing no matter who is
calling.
We want to keep track of the number of pages in the Receive buffer
separately from the number of pages pulled over by RDMA Read. This
is so that the correct number of pages can be freed properly and
that number is well-documented.
So now, rc_hdr_count is the number of pages consumed by head[0]
(i.e., the page index where the Read chunk should start); and
rc_page_count is always the number of pages that need to be released
when the ctxt is put.
The @free_pages argument is no longer needed.
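The idea of letting the ctxt carry its own release information can be
sketched as follows. This is a user-space illustration only: the field
names mirror the text above, but the struct layout, the page size, the
helper names, and the pages_released counter are all hypothetical.

```c
#include <stdlib.h>

#define MAX_PAGES 16			/* hypothetical per-ctxt page limit */
#define PAGE_BYTES 4096			/* hypothetical page size */

struct recv_ctxt {
	unsigned int rc_hdr_count;	/* pages consumed by head[0] */
	unsigned int rc_page_count;	/* pages to release on put */
	void *rc_pages[MAX_PAGES];
};

static unsigned int pages_released;	/* for illustration only */

/* Attach one more page to the ctxt and account for it. */
static int recv_ctxt_add_page(struct recv_ctxt *ctxt)
{
	void *page;

	if (ctxt->rc_page_count >= MAX_PAGES)
		return -1;
	page = malloc(PAGE_BYTES);
	if (!page)
		return -1;
	ctxt->rc_pages[ctxt->rc_page_count++] = page;
	return 0;
}

/* Release exactly rc_page_count pages; no free_pages flag is needed. */
static void recv_ctxt_put(struct recv_ctxt *ctxt)
{
	unsigned int i;

	for (i = 0; i < ctxt->rc_page_count; i++) {
		free(ctxt->rc_pages[i]);
		pages_released++;
	}
	ctxt->rc_page_count = 0;
	ctxt->rc_hdr_count = 0;
}
```

Because the count lives in the ctxt, every caller of the put function
gets the same behavior without having to decide whether pages should be
freed.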
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up: No need to retain rq_depth in struct svcrdma_xprt, it is
used only in svc_rdma_accept().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
free list. This eliminates the overhead of calling kmalloc / kfree,
both of which grab a globally shared lock that disables interrupts.
To reduce contention further, separate the use of these objects in
the Receive and Send paths in svcrdma.
Subsequent patches will take advantage of this separation by
allocating real resources which are then cached in these objects.
The allocations are freed when the transport is torn down.
I've renamed the structure so that static type checking can be used
to ensure that uses of op_ctxt and recv_ctxt are not confused. As an
additional clean up, structure fields are renamed to conform with
kernel coding conventions.
As a final clean up, helpers related to recv_ctxt are moved closer
to the functions that use them.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This simplifies allocation of the generic RPC slot and xprtrdma
specific per-RPC resources.
It also makes xprtrdma more like the socket-based transports:
->buf_alloc and ->buf_free are now responsible only for send and
receive buffers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Refactor: xprtrdma needs to have better control over when RPCs are
awoken from the backlog queue, so replace xprt_free_slot with a
transport op callout.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
alloc_slot is a transport-specific op, but initializing an rpc_rqst
is common to all transports. In addition, the only part of
initializing an rpc_rqst that needs serialization is getting a fresh XID.
Move rpc_rqst initialization to common code in preparation for
adding a transport-specific alloc_slot to xprtrdma.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Pull NFS client updates from Anna Schumaker:
"Stable bugfixes:
- xprtrdma: Fix corner cases when handling device removal # v4.12+
- xprtrdma: Fix latency regression on NUMA NFS/RDMA clients # v4.15+
Features:
- New sunrpc tracepoint for RPC pings
- Finer grained NFSv4 attribute checking
- Don't unnecessarily return NFS v4 delegations
Other bugfixes and cleanups:
- Several other small NFSoRDMA cleanups
- Improvements to the sunrpc RTT measurements
- A few sunrpc tracepoint cleanups
- Various fixes for NFS v4 lock notifications
- Various sunrpc and NFS v4 XDR encoding cleanups
- Switch to the ida_simple API
- Fix NFSv4.1 exclusive create
- Forget acl cache after setattr operation
- Don't advance the nfs_entry readdir cookie if xdr decoding fails"
* tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (47 commits)
NFS: advance nfs_entry cookie only after decoding completes successfully
NFSv3/acl: forget acl cache after setattr
NFSv4.1: Fix exclusive create
NFSv4: Declare the size up to date after it was set.
nfs: Use ida_simple API
NFSv4: Fix the nfs_inode_set_delegation() arguments
NFSv4: Clean up CB_GETATTR encoding
NFSv4: Don't ask for attributes when ACCESS is protected by a delegation
NFSv4: Add a helper to encode/decode struct timespec
NFSv4: Clean up encode_attrs
NFSv4; Clean up XDR encoding of type bitmap4
NFSv4: Allow GFP_NOIO sleeps in decode_attr_owner/decode_attr_group
SUNRPC: Add a helper for encoding opaque data inline
SUNRPC: Add helpers for decoding opaque and string types
NFSv4: Ignore change attribute invalidations if we hold a delegation
NFS: More fine grained attribute tracking
NFS: Don't force unnecessary cache invalidation in nfs_update_inode()
NFS: Don't redirty the attribute cache in nfs_wcc_update_inode()
NFS: Don't force a revalidation of all attributes if change is missing
NFS: Convert NFS_INO_INVALID flags to unsigned long
...
If recording xprt->stat.max_slots is moved into xprt_alloc_slot,
then xprt->num_reqs is never manipulated outside
xprt->reserve_lock. There's no longer a need for xprt->num_reqs to
be atomic.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Since commit 33849792cb ("xprtrdma: Detect unreachable NFS/RDMA
servers more reliably"), the xprtrdma transport now has a ->timer
callout. But xprtrdma does not need to compute RTT data, only UDP
needs that. Move the xprt_update_rtt call into the UDP transport
implementation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
RPC-over-RDMA version 1 credit accounting relies on there being a
response message for every RPC Call. This means that RPC procedures
that have no reply will disrupt credit accounting, just in the same
way as a retransmit would (since it is sent because no reply has
arrived). Deal with the "no reply" case the same way.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Move common code in NFSD's legacy SYMLINK decoders into a helper.
The immediate benefits include:
- one fewer data copy on transports that support DDP
- consistent error checking across all versions
- reduction of code duplication
- support for both legal forms of SYMLINK requests on RDMA
transports for all versions of NFS (in particular, NFSv2, for
completeness)
In the long term, this helper is an appropriate spot to perform a
per-transport call-out to fill the pathname argument using, say,
RDMA Reads.
Filling the pathname in the proc function also means that eventually
the incoming filehandle can be interpreted so that filesystem-
specific memory can be allocated as a sink for the pathname
argument, rather than using anonymous pages.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Move common code in NFSD's legacy NFS WRITE decoders into a helper.
The immediate benefit is reduction of code duplication and some nice
micro-optimizations (see below).
In the long term, this helper can perform a per-transport call-out
to fill the rq_vec (say, using RDMA Reads).
The legacy WRITE decoders and procs are changed to work like NFSv4,
which constructs the rq_vec just before it is about to call
vfs_writev.
Why? Calling a transport call-out from the proc instead of the XDR
decoder means that the incoming FH can be resolved to a particular
filesystem and file. This would allow pages from the backing file to
be presented to the transport to be filled, rather than presenting
anonymous pages and copying or flipping them into the file's page
cache later.
I also prefer using the pages in rq_arg.pages, instead of pulling
the data pages directly out of the rqstp::rq_pages array. This is
currently the way the NFSv3 write decoder works, but the other two
do not seem to take this approach. Fixing this removes the only
reference to rq_pages found in NFSD, eliminating an NFSD assumption
about how transports use the pages in rq_pages.
Lastly, avoid setting up the first element of rq_vec as a zero-
length buffer. This happens with an RDMA transport when a normal
Read chunk is present because the data payload is in rq_arg's
page list (none of it is in the head buffer).
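The rule about not emitting a zero-length first element can be sketched
like this. It is a user-space approximation with hypothetical names
(kvec_like, fill_write_vector, PAGE_SZ), assuming the payload is split
between an optional head buffer and a list of full-size pages.

```c
#include <stddef.h>

#define PAGE_SZ 4096			/* hypothetical page size */

struct kvec_like {
	void *base;
	size_t len;
};

/*
 * Build an I/O vector covering total_len bytes of payload. The head
 * buffer is used only if it holds data, so a payload that lives
 * entirely in the page list does not produce an empty first element.
 * Returns the number of vector elements filled in.
 */
static unsigned int fill_write_vector(struct kvec_like *vec,
				      unsigned int max_vec,
				      void *head, size_t head_len,
				      void **pages, size_t total_len)
{
	unsigned int i = 0;

	if (head_len > total_len)
		head_len = total_len;
	if (head_len && i < max_vec) {
		vec[i].base = head;
		vec[i].len = head_len;
		total_len -= head_len;
		i++;
	}
	while (total_len && i < max_vec) {
		size_t chunk = total_len < PAGE_SZ ? total_len : PAGE_SZ;

		vec[i].base = *pages++;
		vec[i].len = chunk;
		total_len -= chunk;
		i++;
	}
	return i;
}
```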
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Record the time between when a rqstp is enqueued on a transport
and when it is dequeued. This includes how long the rqstp waits on
the queue and how long it takes the kernel scheduler to wake a
nfsd thread to service it.
The svc_xprt_dequeue trace point is altered to include the number
of microseconds between xprt_enqueue and xprt_dequeue.
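The measurement described above amounts to stamping the request when it
is queued and subtracting at dequeue time. A minimal sketch, assuming
the caller supplies timestamps in nanoseconds from a monotonic clock;
the struct and function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical request carrying its enqueue timestamp. */
struct rqst {
	uint64_t rq_qtime_ns;
};

/* Record when the request was placed on the transport's queue. */
static void rqst_mark_enqueued(struct rqst *rq, uint64_t now_ns)
{
	rq->rq_qtime_ns = now_ns;
}

/*
 * Microseconds between enqueue and dequeue. This covers both the time
 * spent waiting on the queue and the time taken to wake a service
 * thread. Returns 0 if the clock appears to have gone backwards.
 */
static uint64_t rqst_wakeup_latency_us(const struct rqst *rq, uint64_t now_ns)
{
	if (now_ns < rq->rq_qtime_ns)
		return 0;
	return (now_ns - rq->rq_qtime_ns) / 1000;
}
```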
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Introduce a mechanism to report the server-side execution latency of
each RPC. The goal is to enable user space to filter the trace
record for latency outliers, build histograms, etc.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
TP_printk defines a format string that is passed to user space for
converting raw trace event records to something human-readable.
My user space's printf (Oracle Linux 7), however, does not have a
%pI format specifier. The result is that what is supposed to be an
IP address in the output of "trace-cmd report" is just a string that
says the field couldn't be displayed.
To fix this, adopt the same approach as the client: maintain a
pre-formatted presentation address for occasions when %pI is not
available.
The location of the trace_svc_send trace point is adjusted so that
rqst->rq_xprt is not NULL when the trace event is recorded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up: Instead of returning a value that is used to set or clear
a bit, just make ->xpo_secure_port mangle that bit, and return void.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The target needs to return the lesser of the client's Inbound RDMA
Read Queue Depth (IRD), provided in the connection parameters, and
the local device's Outbound RDMA Read Queue Depth (ORD). The latter
limit is max_qp_init_rd_atom, not max_qp_rd_atom.
The svcrdma_ord value caps the ORD value for iWARP transports, which
do not exchange ORD/IRD values at connection time. Since no other
Linux kernel RDMA-enabled storage target sees fit to provide this
cap, I'm removing it here too.
initiator_depth is a u8, so ensure the computed ORD value does not
overflow that field.
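The ORD computation described above reduces to a minimum followed by a
clamp. A minimal sketch, with a hypothetical function name and
parameters standing in for the connection parameter and the device
attribute:

```c
#include <stdint.h>

/*
 * Return the lesser of the client's IRD and the local device's
 * max_qp_init_rd_atom, clamped so the result fits in a u8 field.
 */
static uint8_t compute_ord(unsigned int client_ird,
			   unsigned int dev_max_qp_init_rd_atom)
{
	unsigned int ord = client_ird < dev_max_qp_init_rd_atom ?
			   client_ird : dev_max_qp_init_rd_atom;

	if (ord > UINT8_MAX)
		ord = UINT8_MAX;
	return (uint8_t)ord;
}
```

For example, a client IRD of 16 against a device limit of 32 yields 16,
while two values above 255 are clamped to 255.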
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>