commit 77e89afc25f30abd56e76a809ee2884d7c1b63ce upstream.
Multi-MSI uses a single MSI descriptor and there is a single mask register
when the device supports per vector masking. To avoid reading back the mask
register the value is cached in the MSI descriptor and updates are done by
clearing and setting bits in the cache and writing it to the device.
But nothing protects msi_desc::masked and the mask register from being
modified concurrently on two different CPUs for two different Linux
interrupts which belong to the same multi-MSI descriptor.
Add a lock to struct device and protect any operation on the mask and the
mask register with it.
This makes the update of msi_desc::masked unconditional, but there is no
place which requires a modification of the hardware register without
updating the masked cache.
msi_mask_irq() is now an empty wrapper which will be cleaned up in follow
up changes.
The problem goes way back to the initial support of multi-MSI, but picking
the commit which introduced the mask cache is a valid cut off point
(2.6.30).
Fixes: f2440d9acb ("PCI MSI: Refactor interrupt masking code")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.726833414@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2f44e29cef006a4b0a4ecf7d4c5aac7d0fbb505c upstream.
A bug fix to the MSIx handling in vfio added references to functions
that may not be defined if MSI is disabled in the kernel, resulting in
this link error:
drivers/built-in.o: In function `vfio_msi_set_vector_signal':
:(.text+0x450808): undefined reference to `get_cached_msi_msg'
:(.text+0x45080c): undefined reference to `write_msi_msg'
As suggested by Alex Williamson, add stub implementations for
get_cached_msi_msg() and pci_write_msi_msg().
In case this bugfix gets backported, please note that the #ifdef
has changed over time, originally both functions were implemented
in drivers/pci/msi.c and controlled by CONFIG_PCI_MSI, while nowadays
get_cached_msi_msg() is part of the generic MSI support and can be
used without PCI.
Fixes: b8f02af096 ("vfio/pci: Restore MSIx message prior to enabling")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Link: http://lkml.kernel.org/r/1413190208.4202.34.camel@ul30vt.home
Link: http://lkml.kernel.org/r/20170214215343.3307861-1-arnd@arndb.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f3b0946d629c8bfbd3e5f038e30cb9c711a35f10 upstream.
Bharat Kumar Gogada reported issues with the generic MSI code, where the
end-point ended up with garbage in its MSI configuration (both for the vector
and the message).
It turns out that the two MSI paths in the kernel are doing slightly different
things:
generic MSI: disable MSI -> allocate MSI -> enable MSI -> setup EP
PCI MSI: disable MSI -> allocate MSI -> setup EP -> enable MSI
And it turns out that end-points are allowed to latch the content of the MSI
configuration registers as soon as MSIs are enabled. In Bharat's case, the
end-point ends up using whatever was there already, which is not what you
want.
In order to make things converge, we introduce a new MSI domain flag
(MSI_FLAG_ACTIVATE_EARLY) that is unconditionally set for PCI/MSI. When set,
this flag forces the programming of the end-point as soon as the MSIs are
allocated.
A consequence of this is that we have an extra activate in irq_startup, but
that should be without much consequence.
tglx:
- Several people reported a VMWare regression with PCI/MSI-X passthrough. It
turns out that the patch also cures that issue.
- We need to have a look at the MSI disable interrupt path, where we write
the msg to all zeros without disabling MSI in the PCI device. Is that
correct?
Fixes: 52f518a3a7 "x86/MSI: Use hierarchical irqdomains to manage MSI interrupts"
Reported-and-tested-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
Reported-and-tested-by: Foster Snowhill <forst@forstwoof.ru>
Reported-by: Matthias Prager <linux@matthiasprager.de>
Reported-by: Jason Taylor <jason.taylor@simplivity.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Link: http://lkml.kernel.org/r/1468426713-31431-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
So far, we've always considered that for a given PCI device, its
MSI controller was either set by the architecture-specific
pcibios hook, or simply inherited from the host bridge.
This doesn't cover things like firmware-defined topologies like
msi-map (DT) or IORT (ACPI), which can provide information about
which MSI controller to use on a per-device basis.
This patch adds the necessary hook into the MSI code to allow this
feature, and provides the msi-map functionnality as a first
implementation.
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Add pci_msi_domain_get_msi_rid() to return the MSI requester id (RID).
Initially needed by gic-v3 based systems. It will be used by follow on
patch to drivers/irqchip/irq-gic-v3-its-pci-msi.c
Initially supports mapping the RID via OF device tree. In the future,
this could be extended to use ACPI _IORT tables as well.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Add a msi_controller setup_irqs() method so MSI chip providers can
implement their own multivector MSI setup.
[bhelgaas: changelog]
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Pratyush Anand <pratyush.anand@gmail.com>
With the msi_list and the msi_domain properties now being at the
generic device level, it is starting to be relatively easy to offer
a generic way of providing non-PCI MSIs.
The two major hurdles with this idea are:
- Lack of global ID that identifies a device: this is worked around by
having a global ID allocator for each device that gets enrolled in
the platform MSI subsystem
- Lack of standard way to write the message in the generating device.
This is solved by mandating driver code to provide a write_msg
callback, so that everyone can have their own square wheel
Apart from that, the API is fairly straightforward:
- platform_msi_create_irq_domain creates an MSI domain that gets
tagged with DOMAIN_BUS_PLATFORM_MSI
- platform_msi_domain_alloc_irqs allocate MSIs for a given device,
populating the msi_list
- platform_msi_domain_free_irqs does what is written on the tin
[ tglx: Created a seperate struct platform_msi_desc and added
kerneldoc entries ]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Ma Jun <majun258@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Duc Dang <dhdang@apm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1438091186-10244-10-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
mask/unmask_msi_irq and __mask_msi/msix_irq are PCI/MSI specific
functions and should be named accordingly. This is a preparatory patch
to support MSI on non PCI devices.
Rename mask/unmask_msi_irq to pci_msi_mask/unmask_irq and document the
functions. Provide conversion helpers.
Rename __mask_msi/msix_irq to __pci_msi/msix_desc_mask so its clear
that they operated on msi_desc. Fixup the only user outside of
pci/msi.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
"msi_chip" isn't very descriptive, so rename it to "msi_controller". That
tells a little more about what it does and is already used in device tree
bindings.
No functional change.
[bhelgaas: changelog, change *only* the struct name so it's reviewable]
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Now only s390/MSI use default_msi_mask_irq() and default_msix_mask_irq(),
replace them with the common MSI mask IRQ functions __msi_mask_irq() and
__msix_mask_irq(). Remove default_msi_mask_irq() and
default_msix_mask_irq().
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: linux-s390@vger.kernel.org
The problem fixed by 0e4ccb1505 ("PCI: Add x86_msi.msi_mask_irq() and
msix_mask_irq()") has been fixed in a simpler way by a previous commit
("PCI/MSI: Add pci_msi_ignore_mask to prevent writes to MSI/MSI-X Mask
Bits").
The msi_mask_irq() and msix_mask_irq() x86_msi_ops added by 0e4ccb1505
are no longer needed, so revert the commit.
default_msi_mask_irq() and default_msix_mask_irq() were added by
0e4ccb1505 and are still used by s390, so keep them for now.
[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: xen-devel@lists.xenproject.org
MSI-X vector Mask Bits are in MSI-X Tables in PCI memory space. Xen PV
guests can't write to those tables. MSI vector Mask Bits are in PCI
configuration space. Xen PV guests can write to config space, but those
writes are ignored.
Commit 0e4ccb1505 ("PCI: Add x86_msi.msi_mask_irq() and
msix_mask_irq()") added a way to override default_mask_msi_irqs() and
default_mask_msix_irqs() so they can be no-ops in Xen guests, but this is
more complicated than necessary.
Add "pci_msi_ignore_mask" in the core PCI MSI code. If set,
default_mask_msi_irqs() and default_mask_msix_irqs() return without doing
anything. This is less flexible, but much simpler.
[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: xen-devel@lists.xenproject.org
"msi_attrib.pos" is only used for MSI (not MSI-X), and we already cache the
MSI capability offset in "dev->msi_cap".
Remove "pos" from the struct msi_attrib and use "dev->msi_cap" directly.
[bhelgaas: changelog, fix whitespace]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
After commit 1c51b50c29 ("PCI/MSI: Export MSI mode using attributes, not
kobjects"), the kobject in struct msi_desc is unused.
Remove the unused struct kobject from struct msi_desc.
[bhelgaas: changelog]
Fixes: 1c51b50c29 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
No architectures implement arch_msi_check_device() or the struct msi_chip
.check_device() method, so remove them.
Remove the "type" parameter to pci_msi_check_device() because it was only
used to call arch_msi_check_device() and is no longer needed.
[bhelgaas: changelog, split to separate patch]
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The Multiple Message Capable field in the MSI Message Control register
indicates how many vectors the device supports. This field is read-only,
so cache it in msi_desc to avoid reading it repeatedly.
Since we cache the extracted field (not the entire Message Control
register), we can use msi_mask() instead of msi_capable_mask(), which is
then unused, so remove it.
[bhelgaas: fix whitespace, changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Change x86_msi.restore_msi_irqs(struct pci_dev *dev, int irq) to
x86_msi.restore_msi_irqs(struct pci_dev *dev).
restore_msi_irqs() restores multiple MSI-X IRQs, so param 'int irq' is
unneeded. This makes code more consistent between vm and bare metal.
Dom0 MSI-X restore code can also be optimized as XEN only has a hypercall
to restore all MSI-X vectors at one time.
Tested-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Fix whitespace, capitalization, and spelling errors. No functional change.
I know "busses" is not an error, but "buses" was more common, so I used it
consistently.
Signed-off-by: Marta Rybczynska <rybczynska@gmail.com> (pci_reset_bridge_secondary_bus())
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Certain platforms do not allow writes in the MSI-X BARs to setup or tear
down vector values. To combat against the generic code trying to write to
that and either silently being ignored or crashing due to the pagetables
being marked R/O this patch introduces a platform override.
Note that we keep two separate, non-weak, functions default_mask_msi_irqs()
and default_mask_msix_irqs() for the behavior of the arch_mask_msi_irqs()
and arch_mask_msix_irqs(), as the default behavior is needed by x86 PCI
code.
For Xen, which does not allow the guest to write to MSI-X tables - as the
hypervisor is solely responsible for setting the vector values - we
implement two nops.
This fixes a Xen guest crash when passing a PCI device with MSI-X to the
guest. See the bugzilla for more details.
[bhelgaas: add bugzilla info]
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=64581
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
This commit adds a very basic registry of msi_chip structures, so that
an IRQ controller driver can register an msi_chip, and a PCIe host
controller can find it, based on a 'struct device_node'.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
The new struct msi_chip is used to associated an MSI controller with a
PCI bus. It is automatically handed down from the root to its children
during bus enumeration.
This patch provides default (weak) implementations for the architecture-
specific MSI functions (arch_setup_msi_irq(), arch_teardown_msi_irq()
and arch_msi_check_device()) which check if a PCI device's bus has an
attached MSI chip and forward the call appropriately.
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Daniel Price <daniel.price@gmail.com>
Tested-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Because of the encoding of the "Multiple Message Capable" and "Multiple
Message Enable" fields, a device can only advertise that it's capable of a
power-of-two number of vectors, and the OS can only enable a power-of-two
number.
For example, a device that's limited internally to using 18 vectors would
have to advertise that it's capable of 32. The 14 extra vectors consume
vector numbers and IRQ descriptors even though the device can't actually
use them.
This fix introduces a 'msi_desc::nvec_used' field to address this issue.
When non-zero, it is the actual number of MSIs the device will send, as
requested by the device driver. This value should be used by architectures
to set up and tear down only as many interrupt resources as the device will
actually use.
Note, although the existing 'msi_desc::multiple' field might seem
redundant, in fact it is not. The number of MSIs advertised need not be
the smallest power-of-two larger than the number of MSIs the device will
send. Thus, it is not always possible to derive the former from the
latter, so we need to keep them both to handle this case.
[bhelgaas: changelog, rename to "nvec_used"]
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
We had an inconsistent mix of using and omitting the "extern" keyword
on function declarations in header files. This removes them all.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This patch adds a per-pci-device subdirectory in sysfs called:
/sys/bus/pci/devices/<device>/msi_irqs
This sub-directory exports the set of msi vectors allocated by a given
pci device, by creating a numbered sub-directory for each vector beneath
msi_irqs. For each vector various attributes can be exported.
Currently the only attribute is called mode, which tracks the
operational mode of that vector (msi vs. msix)
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Handing down irq_desc to msi just so that msi can access
irq_desc.irq_data.msi_desc is a pretty stupid idea. The calling code
can hand down a pointer to msi_desc so msi code does not need to know
about the irq descriptor at all.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
commit 2ca1af9aa3285c6a5f103ed31ad09f7399fc65d7 "PCI: MSI: Remove
unsafe and unnecessary hardware access" changed read_msi_msg_desc() to
return the last MSI message written instead of reading it from the
device, since it may be called while the device is in a reduced
power state.
However, the pSeries platform code really does need to read messages
from the device, since they are initially written by firmware.
Therefore:
- Restore the previous behaviour of read_msi_msg_desc()
- Add new functions get_cached_msi_msg{,_desc}() which return the
last MSI message written
- Use the new functions where appropriate
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add the new API pci_enable_msi_block() to allow drivers to
request multiple MSI and reimplement pci_enable_msi in terms of
pci_enable_msi_block. Ensure that the architecture back ends don't
have to know about multiple MSI.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since most of the callers already know whether they have an MSI or
an MSI-X capability, split msi_set_mask_bits() into msi_mask_irq()
and msix_mask_irq(). The only callers which don't (mask_msi_irq()
and unmask_msi_irq()) can share code in msi_set_mask_bit(). This then
becomes the only caller of msix_flush_writes(), so we can inline it.
The flushing read can be to any address that belongs to the device,
so we can eliminate the calculation too.
We can also get rid of maskbits_mask from struct msi_desc and simply
recalculate it on the rare occasion that we need it. The single-bit
'masked' element is replaced by a copy of the 32-bit 'masked' register,
so this patch does not affect the size of msi_desc.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
MSI interrupts have a mask_pos where MSI-X have a mask_base. Use a
transparent union to get rid of some ugly casts.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
By changing from a 5-bit field to a 1-bit field, we free up some bits
that can be used by a later patch. Also rearrange the fields for better
packing on 64-bit platforms (reducing the size of msi_desc from 72 bytes
to 64 bytes).
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>