Commit Graph

682 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
3b301245dd Merge 4.14.256 into android-4.14-stable
Changes in 4.14.256
	xhci: Fix USB 3.1 enumeration issues by increasing roothub power-on-good delay
	binder: use euid from cred instead of using task
	binder: use cred instead of task for selinux checks
	Input: elantench - fix misreporting trackpoint coordinates
	Input: i8042 - Add quirk for Fujitsu Lifebook T725
	libata: fix read log timeout value
	ocfs2: fix data corruption on truncate
	mmc: dw_mmc: Dont wait for DRTO on Write RSP error
	parisc: Fix ptrace check on syscall return
	tpm: Check for integer overflow in tpm2_map_response_body()
	media: ite-cir: IR receiver stop working after receive overflow
	ALSA: ua101: fix division by zero at probe
	ALSA: 6fire: fix control and bulk message timeouts
	ALSA: line6: fix control and interrupt message timeouts
	ALSA: synth: missing check for possible NULL after the call to kstrdup
	ALSA: timer: Fix use-after-free problem
	ALSA: timer: Unconditionally unlink slave instances, too
	x86/irq: Ensure PI wakeup handler is unregistered before module unload
	cavium: Return negative value when pci_alloc_irq_vectors() fails
	scsi: qla2xxx: Fix unmap of already freed sgl
	cavium: Fix return values of the probe function
	sfc: Don't use netif_info before net_device setup
	hyperv/vmbus: include linux/bitops.h
	mmc: winbond: don't build on M68K
	bpf: Prevent increasing bpf_jit_limit above max
	xen/netfront: stop tx queues during live migration
	spi: spl022: fix Microwire full duplex mode
	watchdog: Fix OMAP watchdog early handling
	vmxnet3: do not stop tx queues after netif_device_detach()
	btrfs: fix lost error handling when replaying directory deletes
	hwmon: (pmbus/lm25066) Add offset coefficients
	regulator: s5m8767: do not use reset value as DVS voltage if GPIO DVS is disabled
	regulator: dt-bindings: samsung,s5m8767: correct s5m8767,pmic-buck-default-dvs-idx property
	EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell
	mwifiex: fix division by zero in fw download path
	ath6kl: fix division by zero in send path
	ath6kl: fix control-message timeout
	ath10k: fix control-message timeout
	ath10k: fix division by zero in send path
	PCI: Mark Atheros QCA6174 to avoid bus reset
	rtl8187: fix control-message timeouts
	evm: mark evm_fixmode as __ro_after_init
	wcn36xx: Fix HT40 capability for 2Ghz band
	mwifiex: Read a PCI register after writing the TX ring write pointer
	libata: fix checking of DMA state
	wcn36xx: handle connection loss indication
	RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
	signal: Remove the bogus sigkill_pending in ptrace_stop
	signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
	power: supply: max17042_battery: Prevent int underflow in set_soc_threshold
	power: supply: max17042_battery: use VFSOC for capacity when no rsns
	powerpc/85xx: Fix oops when mpc85xx_smp_guts_ids node cannot be found
	serial: core: Fix initializing and restoring termios speed
	ALSA: mixer: oss: Fix racy access to slots
	ALSA: mixer: fix deadlock in snd_mixer_oss_set_volume
	xen/balloon: add late_initcall_sync() for initial ballooning done
	PCI: aardvark: Do not clear status bits of masked interrupts
	PCI: aardvark: Do not unmask unused interrupts
	PCI: aardvark: Fix return value of MSI domain .alloc() method
	PCI: aardvark: Read all 16-bits from PCIE_MSI_PAYLOAD_REG
	quota: check block number when reading the block in quota file
	quota: correct error number in free_dqentry()
	pinctrl: core: fix possible memory leak in pinctrl_enable()
	iio: dac: ad5446: Fix ad5622_write() return value
	USB: serial: keyspan: fix memleak on probe errors
	USB: iowarrior: fix control-message timeouts
	Bluetooth: sco: Fix lock_sock() blockage by memcpy_from_msg()
	Bluetooth: fix use-after-free error in lock_sock_nested()
	platform/x86: wmi: do not fail if disabling fails
	MIPS: lantiq: dma: add small delay after reset
	MIPS: lantiq: dma: reset correct number of channel
	locking/lockdep: Avoid RCU-induced noinstr fail
	smackfs: Fix use-after-free in netlbl_catmap_walk()
	x86: Increase exception stack sizes
	mwifiex: Run SET_BSS_MODE when changing from P2P to STATION vif-type
	mwifiex: Properly initialize private structure on interface type changes
	media: mt9p031: Fix corrupted frame after restarting stream
	media: netup_unidvb: handle interrupt properly according to the firmware
	media: uvcvideo: Set capability in s_param
	media: s5p-mfc: fix possible null-pointer dereference in s5p_mfc_probe()
	media: s5p-mfc: Add checking to s5p_mfc_probe().
	media: mceusb: return without resubmitting URB in case of -EPROTO error.
	ia64: don't do IA64_CMPXCHG_DEBUG without CONFIG_PRINTK
	ACPICA: Avoid evaluating methods too early during system resume
	media: usb: dvd-usb: fix uninit-value bug in dibusb_read_eeprom_byte()
	tracefs: Have tracefs directories not set OTH permission bits by default
	ath: dfs_pattern_detector: Fix possible null-pointer dereference in channel_detector_create()
	ACPI: battery: Accept charges over the design capacity as full
	leaking_addresses: Always print a trailing newline
	memstick: r592: Fix a UAF bug when removing the driver
	lib/xz: Avoid overlapping memcpy() with invalid input with in-place decompression
	lib/xz: Validate the value before assigning it to an enum variable
	tracing/cfi: Fix cmp_entries_* functions signature mismatch
	mwl8k: Fix use-after-free in mwl8k_fw_state_machine()
	PM: hibernate: Get block device exclusively in swsusp_check()
	iwlwifi: mvm: disable RX-diversity in powersave
	smackfs: use __GFP_NOFAIL for smk_cipso_doi()
	ARM: clang: Do not rely on lr register for stacktrace
	gre/sit: Don't generate link-local addr if addr_gen_mode is IN6_ADDR_GEN_MODE_NONE
	ARM: 9136/1: ARMv7-M uses BE-8, not BE-32
	spi: bcm-qspi: Fix missing clk_disable_unprepare() on error in bcm_qspi_probe()
	parisc: fix warning in flush_tlb_all
	task_stack: Fix end_of_stack() for architectures with upwards-growing stack
	parisc/kgdb: add kgdb_roundup() to make kgdb work with idle polling
	cgroup: Make rebind_subsystems() disable v2 controllers all at once
	media: dvb-usb: fix ununit-value in az6027_rc_query
	media: mtk-vpu: Fix a resource leak in the error handling path of 'mtk_vpu_probe()'
	media: si470x: Avoid card name truncation
	media: cx23885: Fix snd_card_free call on null card pointer
	cpuidle: Fix kobject memory leaks in error paths
	ath9k: Fix potential interrupt storm on queue reset
	crypto: qat - detect PFVF collision after ACK
	crypto: qat - disregard spurious PFVF interrupts
	hwrng: mtk - Force runtime pm ops for sleep ops
	b43legacy: fix a lower bounds test
	b43: fix a lower bounds test
	memstick: avoid out-of-range warning
	memstick: jmb38x_ms: use appropriate free function in jmb38x_ms_alloc_host()
	hwmon: Fix possible memleak in __hwmon_device_register()
	ath10k: fix max antenna gain unit
	drm/msm: uninitialized variable in msm_gem_import()
	net: stream: don't purge sk_error_queue in sk_stream_kill_queues()
	mmc: mxs-mmc: disable regulator on error and in the remove function
	platform/x86: thinkpad_acpi: Fix bitwise vs. logical warning
	mwifiex: Send DELBA requests according to spec
	phy: micrel: ksz8041nl: do not use power down mode
	PM: hibernate: fix sparse warnings
	smackfs: use netlbl_cfg_cipsov4_del() for deleting cipso_v4_doi
	s390/gmap: don't unconditionally call pte_unmap_unlock() in __gmap_zap()
	irq: mips: avoid nested irq_enter()
	samples/kretprobes: Fix return value if register_kretprobe() failed
	libertas_tf: Fix possible memory leak in probe and disconnect
	libertas: Fix possible memory leak in probe and disconnect
	net: amd-xgbe: Toggle PLL settings during rate change
	net: phylink: avoid mvneta warning when setting pause parameters
	crypto: pcrypt - Delay write to padata->info
	ibmvnic: Process crqs after enabling interrupts
	RDMA/rxe: Fix wrong port_cap_flags
	ARM: s3c: irq-s3c24xx: Fix return value check for s3c24xx_init_intc()
	ARM: dts: at91: tse850: the emac<->phy interface is rmii
	scsi: dc395: Fix error case unwinding
	MIPS: loongson64: make CPU_LOONGSON64 depends on MIPS_FP_SUPPORT
	JFS: fix memleak in jfs_mount
	ALSA: hda: Reduce udelay() at SKL+ position reporting
	arm: dts: omap3-gta04a4: accelerometer irq fix
	soc/tegra: Fix an error handling path in tegra_powergate_power_up()
	memory: fsl_ifc: fix leak of irq and nand_irq in fsl_ifc_ctrl_probe
	video: fbdev: chipsfb: use memset_io() instead of memset()
	serial: 8250_dw: Drop wrong use of ACPI_PTR()
	usb: gadget: hid: fix error code in do_config()
	power: supply: rt5033_battery: Change voltage values to µV
	scsi: csiostor: Uninitialized data in csio_ln_vnp_read_cbfn()
	RDMA/mlx4: Return missed an error if device doesn't support steering
	ASoC: cs42l42: Correct some register default values
	ASoC: cs42l42: Defer probe if request_threaded_irq() returns EPROBE_DEFER
	serial: xilinx_uartps: Fix race condition causing stuck TX
	mips: cm: Convert to bitfield API to fix out-of-bounds access
	power: supply: bq27xxx: Fix kernel crash on IRQ handler register error
	apparmor: fix error check
	rpmsg: Fix rpmsg_create_ept return when RPMSG config is not defined
	pnfs/flexfiles: Fix misplaced barrier in nfs4_ff_layout_prepare_ds
	drm/plane-helper: fix uninitialized variable reference
	PCI: aardvark: Don't spam about PIO Response Status
	NFS: Fix deadlocks in nfs_scan_commit_list()
	fs: orangefs: fix error return code of orangefs_revalidate_lookup()
	mtd: spi-nor: hisi-sfc: Remove excessive clk_disable_unprepare()
	dmaengine: at_xdmac: fix AT_XDMAC_CC_PERID() macro
	auxdisplay: img-ascii-lcd: Fix lock-up when displaying empty string
	auxdisplay: ht16k33: Connect backlight to fbdev
	auxdisplay: ht16k33: Fix frame buffer device blanking
	netfilter: nfnetlink_queue: fix OOB when mac header was cleared
	dmaengine: dmaengine_desc_callback_valid(): Check for `callback_result`
	m68k: set a default value for MEMORY_RESERVE
	watchdog: f71808e_wdt: fix inaccurate report in WDIOC_GETTIMEOUT
	ar7: fix kernel builds for compiler test
	scsi: qla2xxx: Turn off target reset during issue_lip
	i2c: xlr: Fix a resource leak in the error handling path of 'xlr_i2c_probe()'
	xen-pciback: Fix return in pm_ctrl_init()
	net: davinci_emac: Fix interrupt pacing disable
	ACPI: PMIC: Fix intel_pmic_regs_handler() read accesses
	bonding: Fix a use-after-free problem when bond_sysfs_slave_add() failed
	mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration()
	llc: fix out-of-bound array index in llc_sk_dev_hash()
	nfc: pn533: Fix double free when pn533_fill_fragment_skbs() fails
	vsock: prevent unnecessary refcnt inc for nonblocking connect
	USB: chipidea: fix interrupt deadlock
	ARM: 9155/1: fix early early_iounmap()
	ARM: 9156/1: drop cc-option fallbacks for architecture selection
	powerpc/lib: Add helper to check if offset is within conditional branch range
	powerpc/bpf: Validate branch ranges
	powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
	mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks
	mm, oom: do not trigger out_of_memory from the #PF
	s390/cio: check the subchannel validity for dev_busid
	PCI: Add PCI_EXP_DEVCTL_PAYLOAD_* macros
	ext4: fix lazy initialization next schedule time computation in more granular unit
	tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT
	parisc/entry: fix trace test in syscall exit path
	PCI/MSI: Destroy sysfs before freeing entries
	arm64: zynqmp: Fix serial compatible string
	scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()
	usb: musb: tusb6010: check return value after calling platform_get_resource()
	scsi: advansys: Fix kernel pointer leak
	ARM: dts: omap: fix gpmc,mux-add-data type
	usb: host: ohci-tmio: check return value after calling platform_get_resource()
	tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc
	MIPS: sni: Fix the build
	scsi: target: Fix ordered tag handling
	scsi: target: Fix alua_tg_pt_gps_count tracking
	powerpc/5200: dts: fix memory node unit name
	ALSA: gus: fix null pointer dereference on pointer block
	powerpc/dcr: Use cmplwi instead of 3-argument cmpli
	sh: check return code of request_irq
	maple: fix wrong return value of maple_bus_init().
	sh: fix kconfig unmet dependency warning for FRAME_POINTER
	sh: define __BIG_ENDIAN for math-emu
	mips: BCM63XX: ensure that CPU_SUPPORTS_32BIT_KERNEL is set
	sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
	net: bnx2x: fix variable dereferenced before check
	iavf: Fix for the false positive ASQ/ARQ errors while issuing VF reset
	MIPS: generic/yamon-dt: fix uninitialized variable error
	mips: bcm63xx: add support for clk_get_parent()
	mips: lantiq: add support for clk_get_parent()
	platform/x86: hp_accel: Fix an error handling path in 'lis3lv02d_probe()'
	net: virtio_net_hdr_to_skb: count transport header in UFO
	i40e: Fix NULL ptr dereference on VSI filter sync
	NFC: reorganize the functions in nci_request
	NFC: reorder the logic in nfc_{un,}register_device
	perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake Server
	perf/x86/intel/uncore: Fix IIO event constraints for Skylake Server
	tun: fix bonding active backup with arp monitoring
	hexagon: export raw I/O routines for modules
	mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
	btrfs: fix memory ordering between normal and ordered work functions
	parisc/sticon: fix reverse colors
	cfg80211: call cfg80211_stop_ap when switch from P2P_GO type
	drm/udl: fix control-message timeout
	drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works on vga and dvi connectors
	perf/core: Avoid put_page() when GUP fails
	batman-adv: mcast: fix duplicate mcast packets in BLA backbone from LAN
	batman-adv: mcast: fix duplicate mcast packets from BLA backbone to mesh
	batman-adv: Consider fragmentation for needed_headroom
	batman-adv: Reserve needed_*room for fragments
	batman-adv: Don't always reallocate the fragmentation skb head
	RDMA/netlink: Add __maybe_unused to static inline in C file
	ASoC: DAPM: Cover regression by kctl change notification fix
	usb: max-3421: Use driver data instead of maintaining a list of bound devices
	soc/tegra: pmc: Fix imbalanced clock disabling in error code path
	Linux 4.14.256

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I32f0b43f5aa192eda1aa3a220a2f348ade0536d2
2021-12-16 11:36:23 +00:00
Nathan Chancellor
df0fa028d4 IB/mlx4: Avoid implicit enumerated type conversion
[ Upstream commit b56511c15713ba6c7572e77a41f7ddba9c1053ec ]

Clang warns when one enumerated type is implicitly converted to another.

drivers/infiniband/hw/mlx4/mad.c:1811:41: warning: implicit conversion
from enumeration type 'enum mlx4_ib_qp_flags' to different enumeration
type 'enum ib_qp_create_flags' [-Wenum-conversion]
                qp_init_attr.init_attr.create_flags = MLX4_IB_SRIOV_TUNNEL_QP;
                                                    ~ ^~~~~~~~~~~~~~~~~~~~~~~

drivers/infiniband/hw/mlx4/mad.c:1819:41: warning: implicit conversion
from enumeration type 'enum mlx4_ib_qp_flags' to different enumeration
type 'enum ib_qp_create_flags' [-Wenum-conversion]
                qp_init_attr.init_attr.create_flags = MLX4_IB_SRIOV_SQP;
                                                    ~ ^~~~~~~~~~~~~~~~~

The type mlx4_ib_qp_flags explicitly provides supplemental values to the
type ib_qp_create_flags. Make that clear to Clang by changing the
create_flags type to u32.

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-24 08:23:05 +01:00
Dan Carpenter
8466db9543 RDMA/uverbs: Prevent potential underflow
[ Upstream commit a9018adfde809d44e71189b984fa61cc89682b5e ]

The issue is in drivers/infiniband/core/uverbs_std_types_cq.c in the
UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE) function.  We check that:

        if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) {

But we don't check if "attr.comp_vector" is negative.  It could
potentially lead to an array underflow.  My concern would be where
cq->vector is used in the create_cq() function from the cxgb4 driver.

And really "attr.comp_vector" is appears as a u32 to user space so that's
the right type to use.

Fixes: 9ee79fce36 ("IB/core: Add completion queue (cq) object actions")
Link: https://lore.kernel.org/r/20191011133419.GA22905@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-11-12 19:18:36 +01:00
Jack Morgenstein
51b51a4c6b IB/core: Add an unbound WQ type to the new CQ API
commit f794809a7259dfaa3d47d90ef5a86007cf48b1ce upstream.

The upstream kernel commit cited below modified the workqueue in the
new CQ API to be bound to a specific CPU (instead of being unbound).
This caused ALL users of the new CQ API to use the same bound WQ.

Specifically, MAD handling was severely delayed when the CPU bound
to the WQ was busy handling (higher priority) interrupts.

This caused a delay in the MAD "heartbeat" response handling,
which resulted in ports being incorrectly classified as "down".

To fix this, add a new "unbound" WQ type to the new CQ API, so that users
have the option to choose either a bound WQ or an unbound WQ.

For MADs, choose the new "unbound" WQ.

Fixes: b7363e67b2 ("IB/device: Convert ib-comp-wq to be CPU-bound")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.m>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-10-05 12:47:32 +02:00
Jason Gunthorpe
19278c4470 IB/rxe: Revise the ib_wr_opcode enum
[ Upstream commit 9a59739bd01f77db6fbe2955a4fce165f0f43568 ]

This enum has become part of the uABI, as both RXE and the
ib_uverbs_post_send() command expect userspace to supply values from this
enum. So it should be properly placed in include/uapi/rdma.

In userspace this enum is called 'enum ibv_wr_opcode' as part of
libibverbs.h. That enum defines different values for IB_WR_LOCAL_INV,
IB_WR_SEND_WITH_INV, and IB_WR_LSO. These were introduced (incorrectly, it
turns out) into libiberbs in 2015.

The kernel has changed its mind on the numbering for several of the IB_WC
values over the years, but has remained stable on IB_WR_LOCAL_INV and
below.

Based on this we can conclude that there is no real user space user of the
values beyond IB_WR_ATOMIC_FETCH_AND_ADD, as they have never worked via
rdma-core. This is confirmed by inspection, only rxe uses the kernel enum
and implements the latter operations. rxe has clearly never worked with
these attributes from userspace. Other drivers that support these opcodes
implement the functionality without calling out to the kernel.

To make IB_WR_SEND_WITH_INV and related work for RXE in userspace we
choose to renumber the IB_WR enum in the kernel to match the uABI that
userspace has bee using since before Soft RoCE was merged. This is an
overall simpler configuration for the whole software stack, and obviously
can't break anything existing.

Reported-by: Seth Howell <seth.howell@intel.com>
Tested-by: Seth Howell <seth.howell@intel.com>
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
2019-05-16 19:42:25 +02:00
Sebastian Sanchez
964705c4a6 IB/hfi1: Optimize kthread pointer locking when queuing CQ entries
commit af8aab71370a692eaf7e7969ba5b1a455ac20113 upstream.

All threads queuing CQ entries on different CQs are unnecessarily
synchronized by a spin lock to check if the CQ kthread worker hasn't
been destroyed before queuing an CQ entry.

The lock used in 6efaf10f16 ("IB/rdmavt: Avoid queuing work into a
destroyed cq kthread worker") is a device global lock and will have
poor performance at scale as completions are entered from a large
number of CPUs.

Convert to use RCU where the read side of RCU is rvt_cq_enter() to
determine that the worker is alive prior to triggering the
completion event.
Apply write side RCU semantics in rvt_driver_cq_init() and
rvt_cq_exit().

Fixes: 6efaf10f16 ("IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker")
Cc: <stable@vger.kernel.org> # 4.14.x
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 11:24:54 +02:00
Jack Morgenstein
96fb9b8838 IB/core: Make testing MR flags for writability a static inline function
commit 08bb558ac11ab944e0539e78619d7b4c356278bd upstream.

Make the MR writability flags check, which is performed in umem.c,
a static inline function in file ib_verbs.h

This allows the function to be used by low-level infiniband drivers.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 11:24:53 +02:00
Lidong Chen
a59bd81957 IB/umem: Use the correct mm during ib_umem_release
commit 8e907ed4882714fd13cfe670681fc6cb5284c780 upstream.

User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads.

If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has
exited, get_pid_task will return NULL and ib_umem_release will not
decrease mm->pinned_vm.

Instead of using threads to locate the mm, use the overall tgid from the
ib_ucontext struct instead. This matches the behavior of ODP and
disassociate in handling the mm of the process that called ibv_reg_mr.

Cc: <stable@vger.kernel.org>
Fixes: 87773dd56d ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:51:49 +02:00
Don Hiatt
fd370b8e65 IB/core: Map iWarp AH type to undefined in rdma_ah_find_type
[ Upstream commit 87daac68f77a3e21a1113f816e6a7be0b38bdde8 ]

iWarp devices do not support the creation of address handles
so return AH_ATTR_TYPE_UNDEFINED for all iWarp devices.

While we are here reduce the size of port_num to u8 and add
a comment.

Fixes: 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Reported-by: Parav Pandit <parav@mellanox.com>
CC: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-26 11:02:15 +02:00
Parav Pandit
b7b27e19e3 RDMA/core: Clarify rdma_ah_find_type
[ Upstream commit a6532e7139660c103dda181aa5b2c734aa26ed6c ]

iWARP does not use rdma_ah_attr_type, and for this reason we do not have a
RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type should not even be called on iwarp
ports and for clarity it shouldn't have a special test for iWarp.

This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB
when wrongly called on an iWarp port.

Fixes: 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-26 11:02:04 +02:00
Roland Dreier
b0d95e686f RDMA/ucma: Introduce safer rdma_addr_size() variants
commit 84652aefb347297aa08e91e283adf7b18f77c2d5 upstream.

There are several places in the ucma ABI where userspace can pass in a
sockaddr but set the address family to AF_IB.  When that happens,
rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6,
and the ucma kernel code might end up copying past the end of a buffer
not sized for a struct sockaddr_ib.

Fix this by introducing new variants

    int rdma_addr_size_in6(struct sockaddr_in6 *addr);
    int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr);

that are type-safe for the types used in the ucma ABI and return 0 if the
size computed is bigger than the size of the type passed in.  We can use
these new variants to check what size userspace has passed in before
copying any addresses.

Reported-by: <syzbot+6800425d54ed3ed8135d@syzkaller.appspotmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08 14:26:29 +02:00
Bodong Wang
d40ad86570 IB/core: Fix ib_wc structure size to remain in 64 bytes boundary
commit cd2a6e7d384b043d5d029e39663061cebc949385 upstream.

The change of slid from u16 to u32 results in sizeof(struct ib_wc)
cross 64B boundary, which causes more cache misses. This patch
rearranges the fields and remain the size to 64B.

Pahole output before this change:

struct ib_wc {
        union {
                u64                wr_id;                /*           8 */
                struct ib_cqe *    wr_cqe;               /*           8 */
        };                                               /*     0     8 */
        enum ib_wc_status          status;               /*     8     4 */
        enum ib_wc_opcode          opcode;               /*    12     4 */
        u32                        vendor_err;           /*    16     4 */
        u32                        byte_len;             /*    20     4 */
        struct ib_qp *             qp;                   /*    24     8 */
        union {
                __be32             imm_data;             /*           4 */
                u32                invalidate_rkey;      /*           4 */
        } ex;                                            /*    32     4 */
        u32                        src_qp;               /*    36     4 */
        int                        wc_flags;             /*    40     4 */
        u16                        pkey_index;           /*    44     2 */

        /* XXX 2 bytes hole, try to pack */

        u32                        slid;                 /*    48     4 */
        u8                         sl;                   /*    52     1 */
        u8                         dlid_path_bits;       /*    53     1 */
        u8                         port_num;             /*    54     1 */
        u8                         smac[6];              /*    55     6 */

        /* XXX 1 byte hole, try to pack */

        u16                        vlan_id;              /*    62     2 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u8                         network_hdr_type;     /*    64     1 */

        /* size: 72, cachelines: 2, members: 17 */
        /* sum members: 62, holes: 2, sum holes: 3 */
        /* padding: 7 */
        /* last cacheline: 8 bytes */
};

Pahole output after this change:

struct ib_wc {
        union {
                u64                wr_id;                /*           8 */
                struct ib_cqe *    wr_cqe;               /*           8 */
        };                                               /*     0     8 */
        enum ib_wc_status          status;               /*     8     4 */
        enum ib_wc_opcode          opcode;               /*    12     4 */
        u32                        vendor_err;           /*    16     4 */
        u32                        byte_len;             /*    20     4 */
        struct ib_qp *             qp;                   /*    24     8 */
        union {
                __be32             imm_data;             /*           4 */
                u32                invalidate_rkey;      /*           4 */
        } ex;                                            /*    32     4 */
        u32                        src_qp;               /*    36     4 */
        u32                        slid;                 /*    40     4 */
        int                        wc_flags;             /*    44     4 */
        u16                        pkey_index;           /*    48     2 */
        u8                         sl;                   /*    50     1 */
        u8                         dlid_path_bits;       /*    51     1 */
        u8                         port_num;             /*    52     1 */
        u8                         smac[6];              /*    53     6 */

        /* XXX 1 byte hole, try to pack */

        u16                        vlan_id;              /*    60     2 */
        u8                         network_hdr_type;     /*    62     1 */

        /* size: 64, cachelines: 1, members: 17 */
        /* sum members: 62, holes: 1, sum holes: 1 */
        /* padding: 1 */
};

Fixes: 7db20ecd1d ("IB/core: Change wc.slid from 16 to 32 bits")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-22 15:42:14 +01:00
Bart Van Assche
9fc290e529 IB/core: Fix endianness annotation in rdma_is_multicast_addr()
[ Upstream commit 1c3aea2bc8f0b2e5b57375ead40457ff75a3a2ec ]

Since ipv4_addr is a big endian 32-bit number, annotate it as such.

Fixes: commit be1d325a33 ("IB/core: Set RoCEv2 MGID according to spec")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:10:36 +01:00
Parav Pandit
b57259ca05 IB/core: Fix calculation of maximum RoCE MTU
[ Upstream commit 99260132fde7bddc6e0132ce53da94d1c9ccabcb ]

The original code only took into consideration the largest header
possible after the IB_BTH_BYTES.  This was incorrect, as the largest
possible header size is the largest possible combination of headers we
might run into.  The new code accounts for all possible headers in the
largest possible combination and subtracts that from the MTU to make
sure that all packets will fit on the wire.

Link: https://www.spinics.net/lists/linux-rdma/msg54558.html
Fixes: 3c86aa70bf ("RDMA/cm: Add RDMA CM support for IBoE devices")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reported-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:10:35 +01:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Parav Pandit
edd3155114 IB: Correct MR length field to be 64-bit
The ib_mr->length represents the length of the MR in bytes as per
the IBTA spec 1.3 section 11.2.10.3 (REGISTER PHYSICAL MEMORY REGION).

Currently ib_mr->length field is defined as only 32-bits field.
This might result into truncation and failed WRs of consumers who
registers more than 4GB bytes memory regions and whose WRs accessing
such MRs.

This patch makes the length 64-bit to avoid such truncation.

Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Fixes: 4c67e2bfc8 ("IB/core: Introduce new fast registration API")
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25 11:47:23 -04:00
Leon Romanovsky
78b1beb099 IB/core: Fix typo in the name of the tag-matching cap struct
The tag matching functionality is implemented by mlx5 driver
by extending XRQ, however this internal kernel information was
exposed to user space applications with *xrq* name instead of *tm*.

This patch renames *xrq* to *tm* to handle that.

Fixes: 8d50505ada ("IB/uverbs: Expose XRQ capabilities")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25 11:47:23 -04:00
Linus Torvalds
ad9a19d003 Merge tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "More RDMA work and some op-structure constification from Chuck Lever,
  and a small cleanup to our xdr encoding"

* tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linux:
  svcrdma: Estimate Send Queue depth properly
  rdma core: Add rdma_rw_mr_payload()
  svcrdma: Limit RQ depth
  svcrdma: Populate tail iovec when receiving
  nfsd: Incoming xdr_bufs may have content in tail buffer
  svcrdma: Clean up svc_rdma_build_read_chunk()
  sunrpc: Const-ify struct sv_serv_ops
  nfsd: Const-ify NFSv4 encoding and decoding ops arrays
  sunrpc: Const-ify instances of struct svc_xprt_ops
  nfsd4: individual encoders no longer see error cases
  nfsd4: skip encoder in trivial error cases
  nfsd4: define ->op_release for compound ops
  nfsd4: opdesc will be useful outside nfs4proc.c
  nfsd4: move some nfsd4 op definitions to xdr4.h
2017-09-09 13:31:49 -07:00
Davidlohr Bueso
f808c13fd3 lib/interval_tree: fast overlap detection
Allow interval trees to quickly check for overlaps to avoid unnecesary
tree lookups in interval_tree_iter_first().

As of this patch, all interval tree flavors will require using a
'rb_root_cached' such that we can have the leftmost node easily
available.  While most users will make use of this feature, those with
special functions (in addition to the generic insert, delete, search
calls) will avoid using the cached option as they can do funky things
with insertions -- for example, vma_interval_tree_insert_after().

[jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
  Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:49 -07:00
Chuck Lever
0062818298 rdma core: Add rdma_rw_mr_payload()
The amount of payload per MR depends on device capabilities and
the memory registration mode in use. The new rdma_rw API hides both,
making it difficult for ULPs to determine how large their transport
send queues need to be.

Expose the MR payload information via a new API.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-09-05 15:15:30 -04:00
Matan Barak
5242711294 IB/core: Assign root to all drivers
In order to use the parsing tree, we need to assign the root
to all drivers. Currently, we just assign the default parsing
tree via ib_uverbs_add_one. The driver could override this by
assigning a parsing tree prior to registering the device.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:14 -04:00
Matan Barak
d70724f149 IB/core: Add legacy driver's user-data
In this phase, we don't want to change all the drivers to use
flexible driver's specific attributes. Therefore, we add two default
attributes: UHW_IN and UHW_OUT. These attributes are optional in some
methods and they encode the driver specific command data. We add
a function that extract this data and creates the legacy udata over
it.

Driver's data should start from UVERBS_UDATA_DRIVER_DATA_FLAG. This
turns on the first bit of the namespace, indicating this attribute
belongs to the driver's namespace.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:13 -04:00
Matan Barak
64b19e1323 IB/core: Export ioctl enum types to user-space
Add a new ib_user_ioctl_verbs.h which exports all required ABI
enums and structs to the user-space.
Export the default types to user-space through this file.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:12 -04:00
Matan Barak
4da70da23e IB/core: Explicitly destroy an object while keeping uobject
When some objects are destroyed, we need to extract their status at
destruction. After object's destruction, this status
(e.g. events_reported) relies in the uobject. In order to have the
latest and correct status, the underlying object should be destroyed,
but we should keep the uobject alive and read this information off the
uobject. We introduce a rdma_explicit_destroy function. This function
destroys the class type object (for example, the IDR class type which
destroys the underlying object as well) and then convert the uobject
to be of a null class type. This uobject will then be destroyed as any
other uobject once uverbs_finalize_object[s] is called.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:11 -04:00
Matan Barak
3541030650 IB/core: Add macros for declaring methods and attributes
This patch adds macros for declaring objects, methods and
attributes. These definitions are later used by downstream patches
to declare some of the default types.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:11 -04:00
Matan Barak
118620d368 IB/core: Add uverbs merge trees functionality
Different drivers support different features and even subset of the
common uverbs implementation. Currently, this is handled as bitmask
in every driver that represents which kind of methods it supports, but
doesn't go down to attributes granularity. Moreover, drivers might
want to add their specific types, methods and attributes to let
their user-space counter-parts be exposed to some more efficient
abstractions. It means that existence of different features is
validated syntactically via the parsing infrastructure rather than
using a complex in-handler logic.

In order to do that, we allow defining features and abstractions
as parsing trees. These per-feature parsing tree could be merged
to an efficient (perfect-hash based) parsing tree, which is later
used by the parsing infrastructure.

To sum it up, this makes a parse tree unique for a device and
represents only the features this particular device supports.
This is done by having a root specification tree per feature.
Before a device registers itself as an IB device, it merges
all these trees into one parsing tree. This parsing tree
is used to parse all user-space commands.

A future user-space application could read this parse tree. This
tree represents which objects, methods and attributes are
supported by this device.

This is based on the idea of
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:10 -04:00
Matan Barak
09e3ebf8c1 IB/core: Add DEVICE object and root tree structure
This adds the DEVICE object. This object supports creating the context
that all objects are created from. Moreover, it supports executing
methods which are related to the device itself, such as QUERY_DEVICE.
This is a singleton object (per file instance).

All standard objects are put in the root structure. This root will later
on be used in drivers as the source for their whole parsing tree.
Later on, when new features are added, these drivers could mix this root
with other customized objects.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:10 -04:00
Matan Barak
5009010fbf IB/core: Declare an object instead of declaring only type attributes
Switch all uverbs_type_attrs_xxxx with DECLARE_UVERBS_OBJECT
macros. This will be later used in order to embed the object
specific methods in the objects as well.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:09 -04:00
Matan Barak
fac9658cab IB/core: Add new ioctl interface
In this ioctl interface, processing the command starts from
properties of the command and fetching the appropriate user objects
before calling the handler.

Parsing and validation is done according to a specifier declared by
the driver's code. In the driver, all supported objects are declared.
These objects are separated to different object namepsaces. Dividing
objects to namespaces is done at initialization by using the higher
bits of the object ids. This initialization can mix objects declared
in different places to one parsing tree using in this ioctl interface.

For each object we list all supported methods. Similarly to objects,
methods are separated to method namespaces too. Namespacing is done
similarly to the objects case. This could be used in order to add
methods to an existing object.

Each method has a specific handler, which could be either a default
handler or a driver specific handler.
Along with the handler, a bunch of attributes are specified as well.
Similarly to objects and method, attributes are namespaced and hashed
by their ids at initialization too. All supported attributes are
subject to automatic fetching and validation. These attributes include
the command, response and the method's related objects' ids.

When these entities (objects, methods and attributes) are used, the
high bits of the entities ids are used in order to calculate the hash
bucket index. Then, these high bits are masked out in order to have a
zero based index. Since we use these high bits for both bucketing and
namespacing, we get a compact representation and O(1) array access.
This is mandatory for efficient dispatching.

Each attribute has a type (PTR_IN, PTR_OUT, IDR and FD) and a length.
Attributes could be validated through some attributes, like:
(*) Minimum size / Exact size
(*) Fops for FD
(*) Object type for IDR

If an IDR/fd attribute is specified, the kernel also states the object
type and the required access (NEW, WRITE, READ or DESTROY).
All uobject/fd management is done automatically by the infrastructure,
meaning - the infrastructure will fail concurrent commands that at
least one of them requires concurrent access (WRITE/DESTROY),
synchronize actions with device removals (dissociate context events)
and take care of reference counting (increase/decrease) for concurrent
actions invocation. The reference counts on the actual kernel objects
shall be handled by the handlers.

 objects
+--------+
|        |
|        |   methods                                                                +--------+
|        |   ns         method      method_spec                           +-----+   |len     |
+--------+  +------+[d]+-------+   +----------------+[d]+------------+    |attr1+-> |type    |
| object +> |method+-> | spec  +-> +  attr_buckets  +-> |default_chain+--> +-----+   |idr_type|
+--------+  +------+   |handler|   |                |   +------------+    |attr2|   |access  |
|        |  |      |   +-------+   +----------------+   |driver chain|    +-----+   +--------+
|        |  |      |                                    +------------+
|        |  +------+
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
+--------+

[d] = Hash ids to groups using the high order bits

The right types table is also chosen by using the high bits from
the ids. Currently we have either default or driver specific groups.

Once validation and object fetching (or creation) completed, we call
the handler:
int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
               struct uverbs_attr_bundle *ctx);

ctx bundles attributes of different namespaces. Each element there
is an array of attributes which corresponds to one namespaces of
attributes. For example, in the usually used case:

 ctx                               core
+----------------------------+     +------------+
| core:                      +---> | valid      |
+----------------------------+     | cmd_attr   |
| driver:                    |     +------------+
|----------------------------+--+  | valid      |
                                |  | cmd_attr   |
                                |  +------------+
                                |  | valid      |
                                |  | obj_attr   |
                                |  +------------+
                                |
                                |  drivers
                                |  +------------+
                                +> | valid      |
                                   | cmd_attr   |
                                   +------------+
                                   | valid      |
                                   | cmd_attr   |
                                   +------------+
                                   | valid      |
                                   | obj_attr   |
                                   +------------+

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:09 -04:00
Matan Barak
f43dbebfa3 IB/core: Add support to finalize objects in one transaction
The new ioctl based infrastructure either commits or rollbacks
all objects of the method as one transaction. In order to do
that, we introduce a notion of dealing with a collection of
objects that are related to a specific method.

This also requires adding a notion of a method and attribute.
A method contains a hash of attributes, where each bucket
contains several attributes. The attributes are hashed according
to their namespace which resides in the four upper bits of the id.

For example, an object could be a CQ, which has an action of CREATE_CQ.
This action has multiple attributes. For example, the CQ's new handle
and the comp_channel. Each layer in this hierarchy - objects, methods
and attributes is split into namespaces. The basic example for that is
one namespace representing the default entities and another one
representing the driver specific entities.

When declaring these methods and attributes, we actually declare
their specifications. When a method is executed, we actually
allocates some space to hold auxiliary information. This auxiliary
information contains meta-data about the required objects, such
as pointers to their type information, pointers to the uobjects
themselves (if exist), etc.
The specification, along with the auxiliary information we allocated
and filled is given to the finalize_objects function.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-30 10:30:38 -04:00
Matan Barak
a0aa309c39 IB/core: Add a generic way to execute an operation on a uobject
The ioctl infrastructure treats all user-objects in the same manner.
It gets objects ids from the user-space and by using the object type
and type attributes mentioned in the object specification, it executes
this required method. Passing an object id from the user-space as
an attribute is carried out in three stages. The first is carried out
before the actual handler and the last is carried out afterwards.

The different supported operations are read, write, destroy and create.
In the first stage, the former three actions just fetches the object
from the repository (by using its id) and locks it. The last action
allocates a new uobject. Afterwards, the second stage is carried out
when the handler itself carries out the required modification of the
object. The last stage is carried out after the handler finishes and
commits the result. The former two operations just unlock the object.
Destroy calls the "free object" operation, taking into account the
object's type and releases the uobject as well. Creation just adds the
new uobject to the repository, making the object visible to the
application.

In order to abstract these details from the ioctl infrastructure
layer, we add uverbs_get_uobject_from_context and
uverbs_finalize_object functions which corresponds to the first
and last stages respectively.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-30 10:30:38 -04:00
Artemy Kovalyov
9c2c849625 IB/core: Add new SRQ type IB_SRQT_TM
This patch adds new SRQ type - IB_SRQT_TM. The new SRQ type supports tag
matching and rendezvous offloads for MPI applications.

When SRQ receives a message it will search through the matching list
for the corresponding posted receive buffer. The process of searching
the matching list is called tag matching.
In case the tag matching results in a match, the received message will
be placed in the address specified by the receive buffer. In case no
match was found the message will be placed in a generic buffer until the
corresponding receive buffer will be posted. These messages are called
unexpected and their set is called an unexpected list.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29 08:30:17 -04:00
Artemy Kovalyov
1a56ff6daa IB/core: Separate CQ handle in SRQ context
Before this change CQ attached to SRQ was part of XRC specific extension.
Moving CQ handle out makes it available to other types extending SRQ
functionality.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29 08:30:16 -04:00
Artemy Kovalyov
6938fc1ee0 IB/core: Add XRQ capabilities
This patch adds following TM XRQ capabilities:

* max_rndv_hdr_size - Max size of rendezvous request message
* max_num_tags - Max number of entries in tag matching list
* max_ops - Max number of outstanding list operations
* max_sge - Max number of SGE in tag matching entry
* flags - the following flags are currently defined:
    - IB_TM_CAP_RC - Support tag matching on RC transport

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29 08:30:16 -04:00
Mike Marciniszyn
0208da90de IB/rdmavt: Handle dereg of inuse MRs properly
A destroy of an MR prior to destroying the QP can cause the following
diagnostic if the QP is referencing the MR being de-registered:

hfi1 0000:05:00.0: hfi1_0: rvt_dereg_mr timeout mr ffff8808562108
              00 pd ffff880859b20b00

The solution is to when the a non-zero refcount is encountered when
the MR is destroyed the QPs needs to be iterated looking for QPs in
the same PD as the MR.  If rvt_qp_mr_clean() detects any such QP
references the rkey/lkey, the QP needs to be put into an error state
via a call to rvt_qp_error() which will trigger the clean up of any
stuck references.

This solution is as specified in IBTA 1.3 Volume 1 11.2.10.5.

[This is reproduced with the 0.4.9 version of qperf and the rc_bw test]

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:31 -04:00
Mike Marciniszyn
4734b4f417 IB/rdmavt: Add QP iterator API for QPs
There are currently 3 spots in the qib and hfi1 driver that have
knowledge of the internal QP hash list that should only be in
scope to rdmavt QP code.

Add an iterator API for processing all QPs to hide the
nature of the RCU hashlist.

The API consists of:
- rvt_qp_iter_init()
  * For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter_next()
  * For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter()
  * For iterating all QPs

The first two are used for things like seq_file prints.

The last is for code that just needs to iterate all QPs
in the system.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28 19:12:28 -04:00
Leon Romanovsky
78b57f9529 RDMA/core: Cleanup device capability enum
Cleanup patch prior exporting the ib_device_cap_flags
to the user space. In this patch, we are aligning the
indentation, removing IB_DEVICE_INIT_TYPE and IB_DEVICE_RESERVED
fields, because it is not used in the kernel.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:27:10 -04:00
Leon Romanovsky
dcc9881e67 RDMA/(core, ulp): Convert register/unregister event handler to be void
The functions ib_register_event_handler() and
ib_unregister_event_handler() always returned success and they can't fail.

Let's convert those functions to be void, remove redundant checks and
cleanup tons of goto statements.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:27:10 -04:00
Doug Ledford
732912c738 Merge branch 'k.o/for-4.13-rc' into k.o/for-next
Pick up -rc fixes.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 15:58:26 -04:00
Noa Osherovich
498ca3c82a IB/core: Avoid accessing non-allocated memory when inferring port type
Commit 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
introduced the concept of type in ah_attr:
 * During ib_register_device, each port is checked for its type which
   is stored in ib_device's port_immutable array.
 * During uverbs' modify_qp, the type is inferred using the port number
   in ib_uverbs_qp_dest struct (address vector) by accessing the
   relevant port_immutable array and the type is passed on to
   providers.

IB spec (version 1.3) enforces a valid port value only in Reset to
Init. During Init to RTR, the address vector must be valid but port
number is not mentioned as a field in the address vector, so its
value is not validated, which leads to accesses to a non-allocated
memory when inferring the port type.

Save the real port number in ib_qp during modify to Init (when the
comp_mask indicates that the port number is valid) and use this value
to infer the port type.

Avoid copying the address vector fields if the matching bit is not set
in the attr_mask. Address vector can't be modified before the port, so
no valid flow is affected.

Fixes: 44c58487d5 ('IB/core: Define 'ib' and 'roce' rdma_ah_attr types')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 15:33:33 -04:00
Jason Gunthorpe
e3bf14bdc1 rdma: Autoload netlink client modules
If a message comes in and we do not have the client in the table, then
try to load the module supplying that client using MODULE_ALIAS to find
it.

This duplicates the scheme seen in other netlink muxes (eg nfnetlink).

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 17:04:22 -04:00
Kamenee Arumugame
ec0d8b8a63 IB/hfi1: Stricter bounds checking of MAD trap index
The macro size is valid. This change makes it less ambiguous.
Bounds check trap type for better security.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Dasaratharaman Chandramouli
51e658f5dd IB/rdmavt, hfi1, qib: Enhance rdmavt and hfi1 to use 32 bit lids
Increase lid used in hfi1 driver to 32 bits. qib continues
to use 16 bit lids.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Don Hiatt
88733e3b84 IB/hfi1: Add 16B UD support
Add 16B bypass packet support for UD traffic types.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Don Hiatt
d98bb7f7e6 IB/hfi1: Determine 9B/16B L2 header type based on Address handle
When address handle attributes are initialized, the LIDs are
transformed to be in the 32 bit LID space.
When constructing the header, hfi1 driver will look at the LID
to determine the packet header to be created.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Don Hiatt
72c07e2b67 IB/hfi1: Add support to receive 16B bypass packets
We introduce a struct hfi1_16b_header to support 16B headers.
16B bypass packets are received by the driver and processed
similar to 9B packets. Add basic support to handle 16B packets.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Don Hiatt
13c1922288 IB/rdmavt, hfi1, qib: Modify check_ah() to account for extended LIDs
rvt_check_ah() delegates lid verification to underlying
driver. Underlying driver uses different conditions to
check for dlid depending on whether the device supports
extended LIDs

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:36 -04:00
Sebastian Sanchez
16570d3da0 IB/hfi1: Remove pmtu from the QP structure
The pmtu field doens't have be stored in the QP structure
as it can easily be calculated when needed.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:36 -04:00
Hiatt, Don
62ede77799 Add OPA extended LID support
This patch series primarily increases sizes of variables that hold
lid values from 16 to 32 bits. Additionally, it adds a check in
the IB mad stack to verify a properly formatted MAD when OPA
extended LIDs are used.

Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:47:37 -04:00
Doug Ledford
d3cf4d9915 Merge branch 'misc' into k.o/for-next
Conflicts:
	drivers/infiniband/core/iwcm.c - The rdma_netlink patches in
	HEAD and the iwarp cm workqueue fix (don't use WQ_MEM_RECLAIM,
	we aren't safe for that context) touched the same code.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:10:23 -04:00