Commit Graph

670 Commits

Author SHA1 Message Date
James Smart
21d65b3516 scsi: lpfc: Rework MIB Rx Monitor debug info logic
[ Upstream commit bd269188ea94e40ab002cad7b0df8f12b8f0de54 ]

The kernel test robot reported the following sparse warning:

arch/arm64/include/asm/cmpxchg.h:88:1: sparse: sparse: cast truncates
   bits from constant value (369 becomes 69)

On arm64, atomic_xchg only works on 8-bit byte fields.  Thus, the macro
usage of LPFC_RXMONITOR_TABLE_IN_USE can be unintentionally truncated
leading to all logic involving the LPFC_RXMONITOR_TABLE_IN_USE macro to not
work properly.

Replace the Rx Table atomic_t indexing logic with a new
lpfc_rx_info_monitor structure that holds a circular ring buffer.  For
locking semantics, a spinlock_t is used.

Link: https://lore.kernel.org/r/20220819011736.14141-4-jsmart2021@gmail.com
Fixes: 17b27ac592 ("scsi: lpfc: Add rx monitoring statistics")
Cc: <stable@vger.kernel.org> # v5.15+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-10 18:15:24 +01:00
James Smart
d70705e131 scsi: lpfc: Adjust CMF total bytes and rxmonitor
[ Upstream commit a6269f837045acb02904f31f05acde847ec8f8a7 ]

Calculate any extra bytes needed to account for timer accuracy. If we are
less than LPFC_CMF_INTERVAL, then calculate the adjustment needed for total
to reflect a full LPFC_CMF_INTERVAL.

Add additional info to rxmonitor, and adjust some log formatting.

Link: https://lore.kernel.org/r/20211204002644.116455-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stable-dep-of: bd269188ea94 ("scsi: lpfc: Rework MIB Rx Monitor debug info logic")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-10 18:15:23 +01:00
James Smart
9ebc6e8ad1 scsi: lpfc: Adjust bytes received vales during cmf timer interval
[ Upstream commit d5ac69b332d8859d1f8bd5d4dee31f3267f6b0d2 ]

The newly added congestion mgmt framework is seeing unexpected congestion
FPINs and signals.  In analysis, time values given to the adapter are not
at hard time intervals. Thus the drift vs the transfer count seen is
affecting how the framework manages things.

Adjust counters to cover the drift.

Link: https://lore.kernel.org/r/20210910233159.115896-11-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stable-dep-of: bd269188ea94 ("scsi: lpfc: Rework MIB Rx Monitor debug info logic")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-10 18:15:23 +01:00
James Smart
0b0d169723 Revert "scsi: lpfc: SLI path split: Refactor lpfc_iocbq"
This reverts commit 1c5e670d6a.

LTS 5.15 pulled in several lpfc "SLI Path split" patches.  The Path
Split mods were a 14-patch set, which refactors the driver from
to split the sli-3 hw (now eol) from the sli-4 hw and use sli4
structures natively. The patches are highly inter-related.

Given only some of the patches were included, it created a situation
where FLOGI's fail, thus SLI Ports can't start communication.

Reverting this patch as its one of the partial Path Split patches
that was included.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-03 23:59:14 +09:00
James Smart
97dc9076ea Revert "scsi: lpfc: Resolve some cleanup issues following SLI path refactoring"
This reverts commit 17bf429b91.

LTS 5.15 pulled in several lpfc "SLI Path split" patches.  The Path
Split mods were a 14-patch set, which refactors the driver from
to split the sli-3 hw (now eol) from the sli-4 hw and use sli4
structures natively. The patches are highly inter-related.

Given only some of the patches were included, it created a situation
where FLOGI's fail, thus SLI Ports can't start communication.

Reverting this patch as its a fix specific to the Path Split patches,
which were partially included and now being pulled from 5.15.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-03 23:59:13 +09:00
Rafael Mendonca
9749595feb scsi: lpfc: Fix memory leak in lpfc_create_port()
[ Upstream commit dc8e483f684a24cc06e1d5fa958b54db58855093 ]

Commit 5e633302ac ("scsi: lpfc: vmid: Add support for VMID in mailbox
command") introduced allocations for the VMID resources in
lpfc_create_port() after the call to scsi_host_alloc(). Upon failure on the
VMID allocations, the new code would branch to the 'out' label, which
returns NULL without unwinding anything, thus skipping the call to
scsi_host_put().

Fix the problem by creating a separate label 'out_free_vmid' to unwind the
VMID resources and make the 'out_put_shost' label call only
scsi_host_put(), as was done before the introduction of allocations for
VMID.

Fixes: 5e633302ac ("scsi: lpfc: vmid: Add support for VMID in mailbox command")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Link: https://lore.kernel.org/r/20220916035908.712799-1-rafaelmendsr@gmail.com
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29 10:12:56 +02:00
Yang Yingliang
1dcc308898 scsi: lpfc: Add missing destroy_workqueue() in error path
commit da6d507f5ff328f346b3c50e19e19993027b8ffd upstream.

Add the missing destroy_workqueue() before return from
lpfc_sli4_driver_resource_setup() in the error path.

Link: https://lore.kernel.org/r/20220823044237.285643-1-yangyingliang@huawei.com
Fixes: 3cee98db26 ("scsi: lpfc: Fix crash on driver unload in wq free")
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-15 11:30:03 +02:00
James Smart
17bf429b91 scsi: lpfc: Resolve some cleanup issues following SLI path refactoring
commit e27f05147bff21408c1b8410ad8e90cd286e7952 upstream.

Following refactoring and consolidation in SLI processing, fix up some
minor issues related to SLI path:

 - Correct the setting of LPFC_EXCHANGE_BUSY flag in response IOCB.

 - Fix some typographical errors.

 - Fix duplicate log messages.

Link: https://lore.kernel.org/r/20220603174329.63777-4-jsmart2021@gmail.com
Fixes: 1b64aa9eae28 ("scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4")
Cc: <stable@vger.kernel.org> # v5.18
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-17 14:24:31 +02:00
James Smart
1c5e670d6a scsi: lpfc: SLI path split: Refactor lpfc_iocbq
[ Upstream commit a680a9298e7b4ff344aca3456177356b276e5038 ]

Currently, SLI3 and SLI4 data paths use the same lpfc_iocbq structure.
This is a "common" structure but many of the components refer to sli-rev
specific entities which can lead the developer astray as to what they
actually mean, should be set to, or when they should be used.

This first patch prepares the lpfc_iocbq structure so that elements common
to both SLI3 and SLI4 data paths are more appropriately named, making it
clear they apply generically.

Fieldnames based on 'iocb' (sli3) or 'wqe' (sli4) which are actually
generic to the paths are renamed to 'cmd':

 - iocb_flag is renamed to cmd_flag

 - lpfc_vmid_iocb_tag is renamed to lpfc_vmid_tag

 - fabric_iocb_cmpl is renamed to fabric_cmd_cmpl

 - wait_iocb_cmpl is renamed to wait_cmd_cmpl

 - iocb_cmpl and wqe_cmpl are combined and renamed to cmd_cmpl

 - rsvd2 member is renamed to num_bdes due to pre-existing usage

The structure name itself will retain the iocb reference as changing to a
more relevant "job" or "cmd" title induces many hundreds of line changes
for only a name change.

lpfc_post_buffer is also renamed to lpfc_sli3_post_buffer to indicate use
in the SLI3 path only.

Link: https://lore.kernel.org/r/20220225022308.16486-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 14:24:21 +02:00
James Smart
eb36ec3039 scsi: lpfc: Fix EEH support for NVMe I/O
[ Upstream commit 25ac2c970be32993f1dff607f8354f3c053d42bc ]

Injecting errors on the PCI slot while the driver is handling NVMe I/O will
cause crashes and hangs.

There are several rather difficult scenarios occurring. The main issue is
that the adapter can report a PCI error before or simultaneously to the PCI
subsystem reporting the error. Both paths have different entry points and
currently there is no interlock between them. Thus multiple teardown paths
are competing and all heck breaks loose.

Complicating things is the NVMs path. To a large degree, I/O was able to be
shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a
similar call. At best, it works on a per-controller basis, but even at the
controller level, it's a controller "reset" call. All of which means I/O is
still flowing on different CPUs with reset paths expecting hw access
(mailbox commands) to execute properly.

The following modifications are made:

 - A new flag is set in PCI error entrypoints so the driver can track being
   called by that path.

 - An interlock is added in the SLI hw error path and the PCI error path
   such that only one of the paths proceeds with the teardown logic.

 - RPI cleanup is patched such that RPIs are marked unregistered w/o mbx
   cmds in cases of hw error.

 - If entering the SLI port re-init calls, a case where SLI error teardown
   was quick and beat the PCI calls now reporting error, check whether the
   SLI port is still live on the PCI bus.

 - In the PCI reset code to bring the adapter back, recheck the IRQ
   settings. Different checks for SLI3 vs SLI4.

 - In I/O completions, that may be called as part of the cleanup or
   underway just before the hw error, check the state of the adapter.  If
   in error, shortcut handling that would expect further adapter
   completions as the hw error won't be sending them.

 - In routines waiting on I/O completions, which may have been in progress
   prior to the hw error, detect the device is being torn down and abort
   from their waits and just give up. This points to a larger issue in the
   driver on ref-counting for data structures, as it doesn't have
   ref-counting on q and port structures. We'll do this fix for now as it
   would be a major rework to be done differently.

 - Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being
   failed back due to hw error.

 - In I/O buf allocation, done at the start of new I/Os, check hw state and
   fail if hw error.

Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-17 14:24:20 +02:00
James Smart
db6da340d6 scsi: lpfc: Alter FPIN stat accounting logic
[ Upstream commit e6f51041450282a8668af3a8fc5c7744e81a447c ]

When configuring CMF management based on signals instead of FPINs, FPIN
alarm and warning statistics are not tracked.

Change the behavior so that FPIN alarms and warnings are always tracked
regardless of the configured mode.

Similar changes are made in the CMF signal stat accounting logic.  Upon
receipt of a signal, only track signaled alarms and warnings. FPIN stats
should not be incremented upon receipt of a signal.

Link: https://lore.kernel.org/r/20220506035519.50908-11-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09 10:22:36 +02:00
James Smart
271725e402 scsi: lpfc: Move cfg_log_verbose check before calling lpfc_dmp_dbg()
[ Upstream commit e294647b1aed4247fe52851f3a3b2b19ae906228 ]

In an attempt to log message 0126 with LOG_TRACE_EVENT, the following hard
lockup call trace hangs the system.

Call Trace:
 _raw_spin_lock_irqsave+0x32/0x40
 lpfc_dmp_dbg.part.32+0x28/0x220 [lpfc]
 lpfc_cmpl_els_fdisc+0x145/0x460 [lpfc]
 lpfc_sli_cancel_jobs+0x92/0xd0 [lpfc]
 lpfc_els_flush_cmd+0x43c/0x670 [lpfc]
 lpfc_els_flush_all_cmd+0x37/0x60 [lpfc]
 lpfc_sli4_async_event_proc+0x956/0x1720 [lpfc]
 lpfc_do_work+0x1485/0x1d70 [lpfc]
 kthread+0x112/0x130
 ret_from_fork+0x1f/0x40
Kernel panic - not syncing: Hard LOCKUP

The same CPU tries to claim the phba->port_list_lock twice.

Move the cfg_log_verbose checks as part of the lpfc_printf_vlog() and
lpfc_printf_log() macros before calling lpfc_dmp_dbg().  There is no need
to take the phba->port_list_lock within lpfc_dmp_dbg().

Link: https://lore.kernel.org/r/20220412222008.126521-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09 10:22:32 +02:00
James Smart
026083cb43 scsi: lpfc: Fix queue failures when recovering from PCI parity error
[ Upstream commit df0101197c4d9596682901631f3ee193ed354873 ]

When recovering from a pci-parity error the driver is failing to re-create
queues, causing recovery to fail. Looking deeper, it was found that the
interrupt vector count allocated on the recovery was fewer than the vectors
originally allocated. This disparity resulted in CPU map entries with stale
information. When the driver tries to re-create the queues, it attempts to
use the stale information which indicates an eq/interrupt vector that was
no longer created.

Fix by clearng the cpup map array before enabling and requesting the IRQs
in the lpfc_sli_reset_slot_s4 routine().

Link: https://lore.kernel.org/r/20220317032737.45308-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-20 09:34:15 +02:00
James Smart
cd4494f868 scsi: lpfc: Reduce log messages seen after firmware download
commit 5852ed2a6a39c862c8a3fdf646e1f4e01b91d710 upstream.

Messages around firmware download were incorrectly tagged as being related
to discovery trace events. Thus, firmware download status ended up dumping
the trace log as well as the firmware update message. As there were a
couple of log messages in this state, the trace log was dumped multiple
times.

Resolve this by converting from trace events to SLI events.

Link: https://lore.kernel.org/r/20220207180442.72836-1-jsmart2021@gmail.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-16 12:56:40 +01:00
James Smart
5b91e80b63 scsi: lpfc: Fix leaked lpfc_dmabuf mbox allocations with NPIV
[ Upstream commit f0d3919697492950f57a26a1093aee53880d669d ]

During rmmod testing, messages appeared indicating lpfc_mbuf_pool entries
were still busy. This situation was only seen doing rmmod after at least 1
vport (NPIV) instance was created and destroyed. The number of messages
scaled with the number of vports created.

When a vport is created, it can receive a PLOGI from another initiator
Nport.  When this happens, the driver prepares to ack the PLOGI and
prepares an RPI for registration (via mbx cmd) which includes an mbuf
allocation. During the unsolicited PLOGI processing and after the RPI
preparation, the driver recognizes it is one of the vport instances and
decides to reject the PLOGI. During the LS_RJT preparation for the PLOGI,
the mailbox struct allocated for RPI registration is freed, but the mbuf
that was also allocated is not released.

Fix by freeing the mbuf with the mailbox struct in the LS_RJT path.

As part of the code review to figure the issue out a couple of other areas
where found that also would not have released the mbuf. Those are cleaned
up as well.

Link: https://lore.kernel.org/r/20211204002644.116455-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-27 11:05:00 +01:00
James Smart
bf76f56a7f scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss
[ Upstream commit af984c87293b19dccbd0b16afc57c5c9a4a279c7 ]

A link bounce to a slow fabric may observe FDISC response delays lasting
longer than devloss tmo.  Current logic decrements the final fabric node
kref during a devloss tmo event.  This results in a NULL ptr dereference
crash if the FDISC completes for that fabric node after devloss tmo.

Fix by adding the NLP_IN_RECOV_POST_DEV_LOSS flag, which is set when
devloss tmo triggers and we've noticed that fabric node recovery has
already started or finished in between the time lpfc_dev_loss_tmo_callbk
queues lpfc_dev_loss_tmo_handler.  If fabric node recovery succeeds, then
the driver reverses the devloss tmo marked kref put with a kref get.  If
fabric node recovery fails, then the final kref put relies on the ELS
timing out or the REG_LOGIN cmpl routine.

Link: https://lore.kernel.org/r/20211020211417.88754-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-25 09:48:29 +01:00
James Smart
37e384095f scsi: lpfc: Fix compilation errors on kernels with no CONFIG_DEBUG_FS
The Kernel test robot flagged the following warning:

  ".../lpfc_init.c:7788:35: error: 'struct lpfc_sli4_hba' has no member
   named 'c_stat'"

Reviewing this issue highlighted that one of the recent patches caused the
driver to no longer compile cleanly if CONFIG_DEBUG_FS is not set.

Correct the different areas that are failing to compile.

Link: https://lore.kernel.org/r/20210908050927.37275-1-jsmart2021@gmail.com
Fixes: 02243836ad ("scsi: lpfc: Add support for the CM framework")
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Build-tested-by: Nathan Chancellor <nathan@kernel.org>
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-13 22:15:41 -04:00
James Smart
59936430e6 scsi: lpfc: Fix CPU to/from endian warnings introduced by ELS processing
The kernel test robot reported the following sparse warning:
".../lpfc_els.c:3984:25: sparse: sparse: cast from restricted __be16"

For the error being flagged, using be32_to_cpu() on a be16 data type, it
was simple enough. But a review of other elements and warnings were also
evaluated.

This patch corrected several items in the original patch:

 - Using be32_to_cpu() on a be16 data type

 - cpu_to_le32() used on a std uint32_t (CPU) data type.

   Note: This is a byte array, but stored in LE layout by hardware at
   32-bit boundaries. So it possibly needed conversion.

 - Using cpu_to_le32() on a std uint16_t and assigned to a char typeA

 - Using le32_to_cpu() on a le16 type

 - Missing cpu_to_le16() on an assignment

Link: https://lore.kernel.org/r/20210830231243.6227-1-jsmart2021@gmail.com
Fixes: 9064aeb2df ("scsi: lpfc: Add EDC ELS support")
Reported-by: kernel test robot <lkp@intel.com>
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-09-13 22:15:40 -04:00
Linus Torvalds
a9c9a6f741 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (ufs, qla2xxx,
  target, smartpqi, lpfc, mpt3sas).

  The core change causing the most churn was replacing the command
  request field request with a macro, allowing us to offset map to it
  and remove the redundant field; the same was also done for the tag
  field.

  The most impactful change is the final removal of scsi_ioctl, which
  has been deprecated for over a decade"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (293 commits)
  scsi: ufs: Fix ufshcd_request_sense_async() for Samsung KLUFG8RHDA-B2D1
  scsi: ufs: ufs-exynos: Fix static checker warning
  scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI
  scsi: lpfc: Use the proper SCSI midlayer interfaces for PI
  scsi: lpfc: Copyright updates for 14.0.0.1 patches
  scsi: lpfc: Update lpfc version to 14.0.0.1
  scsi: lpfc: Add bsg support for retrieving adapter cmf data
  scsi: lpfc: Add cmf_info sysfs entry
  scsi: lpfc: Add debugfs support for cm framework buffers
  scsi: lpfc: Add support for maintaining the cm statistics buffer
  scsi: lpfc: Add rx monitoring statistics
  scsi: lpfc: Add support for the CM framework
  scsi: lpfc: Add cmfsync WQE support
  scsi: lpfc: Add support for cm enablement buffer
  scsi: lpfc: Add cm statistics buffer support
  scsi: lpfc: Add EDC ELS support
  scsi: lpfc: Expand FPIN and RDF receive logging
  scsi: lpfc: Add MIB feature enablement support
  scsi: lpfc: Add SET_HOST_DATA mbox cmd to pass date/time info to firmware
  scsi: fc: Add EDC ELS definition
  ...
2021-09-02 15:09:46 -07:00
James Smart
74a7baa2a3 scsi: lpfc: Add cmf_info sysfs entry
Allow abbreviated cm framework status information to be obtained via sysfs.

Link: https://lore.kernel.org/r/20210816162901.121235-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24 22:56:34 -04:00
James Smart
7481811c3a scsi: lpfc: Add support for maintaining the cm statistics buffer
Add the logic to move the congestion management and event information into
the cmd statistics buffer maintained for the adapter.  The update includes
rolling up values for the last minute, hour, and day information.

Link: https://lore.kernel.org/r/20210816162901.121235-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24 22:56:34 -04:00
James Smart
17b27ac592 scsi: lpfc: Add rx monitoring statistics
The driver provides overwatch of the cm behavior by maintaining a set of rx
I/O statistics. This information is also used in later updating of the cm
statistics buffer.

Link: https://lore.kernel.org/r/20210816162901.121235-11-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24 22:56:34 -04:00
James Smart
02243836ad scsi: lpfc: Add support for the CM framework
Complete the enablement of the cm framework feature in the adapter. Perform
the following:

 - Detect the presence of the congestion management framework feature.

When the cm framework is present:

 - Issue the SET_FEATURE command to enable the feature.

 - Register the cm statistics buffer with the adapter.

 - Read the cm enablement buffer to determine the cm framework state for cm
   management.

When cm management is enabled:

 - Monitor all FPIN and congestion signalling events, incrementing
   counters.

 - Regularly sync with the adapter to communicate congestion events and to
   receive an rx request limit.

 - Monitor requests for rx data and ensure that no more than the
   adapter prescribed limit is issued on the link. If the limit is
   exceeded, SCSI and/or NVMe traffic is temporarily suspended.

 - Maintain the minute, hourly, daily statistics buffer.

 - Monitor for congestion enablement change events, causing a reread of the
   enablement buffer and acting on any change in enablement.

And:

 - Add teardown logic, including buffer deregistration, on adapter
   detachment or reset.

Link: https://lore.kernel.org/r/20210816162901.121235-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24 22:56:34 -04:00
James Smart
72df8a4528 scsi: lpfc: Add support for cm enablement buffer
As part of the cmf framework, the firmware maintains a table with
congestion related state information, specifically whether enabled and if
enabled, whether monitoring or actively managing congestion.

Add definition of the table and add support to read the table from the
adapter and determine if it is enabled. In support of this, the READ_OBJECT
mailbox command definition is added to the driver.

Link: https://lore.kernel.org/r/20210816162901.121235-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24 22:56:33 -04:00
James Smart
8c42a65c39 scsi: lpfc: Add cm statistics buffer support
The cmf framework requires the driver to maintain a cm statistics table,
accessible inband, of congestion related statistics that are reported per
minute, rolled up to per hour, and rolled up again per day. Several days
worth may be maintained.  The table is registered with the adapter when the
MIB feature is enabled.

Add definition of the table and add support to register the table with the
adapter. Includes definition and initialization of event counters that are
later added to the statistics table.

Link: https://lore.kernel.org/r/20210816162901.121235-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24 22:56:33 -04:00
James Smart
9064aeb2df scsi: lpfc: Add EDC ELS support
When congestion management is enabled, issue EDC ELS to register congestion
signaling capabilities with the fabric. The response handling will process
the fabric parameters and set the reporting parameters.

Similarly, add support for receiving an EDC request from the fabric
generating a corresponding response.

Implement handlers for congestion signals from the fabric and maintain
statistics for them.

Link: https://lore.kernel.org/r/20210816162901.121235-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24 22:56:33 -04:00
James Smart
c6a5c747a3 scsi: lpfc: Add MIB feature enablement support
MIB support is currently limited to detecting support in the adapter and
ensuring FDMI support is enabled if present.  For the new framework MIB
support also requires active enablement of support via the SET_FEATURES
command with the firmware.

Rework the MIB detection and enablement for the following:

 - Move detection away from the get_sli4_parameters routine, and into the
   hba_setup path. get_sli4_parameters is only called once at attachment
   while hba_setup is called as part of any SLI port reset path. This
   ensures detection after firmware download.

 - Update SET_FEATURES mbx command for the MIB enablement feature and add
   support for the feature.

 - Create the cmf_setup routine to encapsulate the detection of MIB support
   and perform the enablement of the MIB support feature.

Link: https://lore.kernel.org/r/20210816162901.121235-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24 22:56:33 -04:00
Ewan D. Milne
9977d880f7 scsi: lpfc: Move initialization of phba->poll_list earlier to avoid crash
The phba->poll_list is traversed in case of an error in
lpfc_sli4_hba_setup(), so it must be initialized earlier in case the error
path is taken.

[  490.030738] lpfc 0000:65:00.0: 0:1413 Failed to init iocb list.
[  490.036661] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  490.044485] PGD 0 P4D 0
[  490.047027] Oops: 0000 [#1] SMP PTI
[  490.050518] CPU: 0 PID: 7 Comm: kworker/0:1 Kdump: loaded Tainted: G          I      --------- -  - 4.18.
[  490.060511] Hardware name: Dell Inc. PowerEdge R440/0WKGTH, BIOS 1.4.8 05/22/2018
[  490.067994] Workqueue: events work_for_cpu_fn
[  490.072371] RIP: 0010:lpfc_sli4_cleanup_poll_list+0x20/0xb0 [lpfc]
[  490.078546] Code: cf e9 04 f7 fe ff 0f 1f 40 00 0f 1f 44 00 00 41 57 49 89 ff 41 56 41 55 41 54 4d 8d a79
[  490.097291] RSP: 0018:ffffbd1a463dbcc8 EFLAGS: 00010246
[  490.102518] RAX: 0000000000008200 RBX: ffff945cdb8c0000 RCX: 0000000000000000
[  490.109649] RDX: 0000000000018200 RSI: ffff9468d0e16818 RDI: 0000000000000000
[  490.116783] RBP: ffff945cdb8c1740 R08: 00000000000015c5 R09: 0000000000000042
[  490.123915] R10: 0000000000000000 R11: ffffbd1a463dbab0 R12: ffff945cdb8c25c0
[  490.131049] R13: 00000000fffffff4 R14: 0000000000001800 R15: ffff945cdb8c0000
[  490.138182] FS:  0000000000000000(0000) GS:ffff9468d0e00000(0000) knlGS:0000000000000000
[  490.146267] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  490.152013] CR2: 0000000000000000 CR3: 000000042ca10002 CR4: 00000000007706f0
[  490.159146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  490.166277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  490.173409] PKRU: 55555554
[  490.176123] Call Trace:
[  490.178598]  lpfc_sli4_queue_destroy+0x7f/0x3c0 [lpfc]
[  490.183745]  lpfc_sli4_hba_setup+0x1bc7/0x23e0 [lpfc]
[  490.188797]  ? kernfs_activate+0x63/0x80
[  490.192721]  ? kernfs_add_one+0xe7/0x130
[  490.196647]  ? __kernfs_create_file+0x80/0xb0
[  490.201020]  ? lpfc_pci_probe_one_s4.isra.48+0x46f/0x9e0 [lpfc]
[  490.206944]  lpfc_pci_probe_one_s4.isra.48+0x46f/0x9e0 [lpfc]
[  490.212697]  lpfc_pci_probe_one+0x179/0xb70 [lpfc]
[  490.217492]  local_pci_probe+0x41/0x90
[  490.221246]  work_for_cpu_fn+0x16/0x20
[  490.224994]  process_one_work+0x1a7/0x360
[  490.229009]  ? create_worker+0x1a0/0x1a0
[  490.232933]  worker_thread+0x1cf/0x390
[  490.236687]  ? create_worker+0x1a0/0x1a0
[  490.240612]  kthread+0x116/0x130
[  490.243846]  ? kthread_flush_work_fn+0x10/0x10
[  490.248293]  ret_from_fork+0x35/0x40
[  490.251869] Modules linked in: lpfc(+) xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4i
[  490.332609] CR2: 0000000000000000

Link: https://lore.kernel.org/r/20210809150947.18104-1-emilne@redhat.com
Fixes: 93a4d6f401 ("scsi: lpfc: Add registration for CPU Offline/Online events")
Cc: stable@vger.kernel.org
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-09 22:45:51 -04:00
James Smart
bfc477854a scsi: lpfc: Add 256 Gb link speed support
Update routines to support 256 Gb link speed for LPe37000/LPe38000
adapters. 256 Gb speeds can be seen on trunk links.

Link: https://lore.kernel.org/r/20210722221721.74388-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-27 00:06:41 -04:00
James Smart
f6c5e6c456 scsi: lpfc: Revise Topology and RAS support checks for new adapters
Support for Topology and RAS logging capabilities were qualified by PCIe
device ID checks necessitating additional driver changes for new device
IDs.

Reduce reliance on specific PCIe device IDs by substituting checks for SLI
family information. This automatically picks up support on the newest
hardware.

Link: https://lore.kernel.org/r/20210722221721.74388-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-27 00:06:41 -04:00
James Smart
f449a3d7a1 scsi: lpfc: Add PCI ID support for LPe37000/LPe38000 series adapters
Update supported pci_device_id table to include the values for the G7+ ASIC
Device ID utilized by LPe37xxx and LPe38xxx series of adapters.  The
default reporting string will be "LPe38000".

Link: https://lore.kernel.org/r/20210722221721.74388-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-27 00:06:40 -04:00
James Smart
137ddf0384 scsi: lpfc: Use PBDE feature enabled bit to determine PBDE support
The SLI4 interface changed the manner used to indicate PBDE support.
Rework the driver to check for PBDE support via the PBDE feature bit in
COMMON_GET_SLI4_PARAMETERS.

Link: https://lore.kernel.org/r/20210707184351.67872-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-18 22:30:36 -04:00
James Smart
a9978e3978 scsi: lpfc: Clear outstanding active mailbox during PCI function reset
Mailbox commands sent via ioctl/bsg from user applications may be
interrupted from processing by a concurrently triggered PCI function
reset. The command will not generate a completion due to the reset.  This
results in a user application hang waiting for the mailbox command to
complete.

Resolve by changing the function reset handler to detect that there was an
outstanding mailbox command and simulate a mailbox completion.  Add some
additional debug when a mailbox command times out.

Link: https://lore.kernel.org/r/20210707184351.67872-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-18 22:30:36 -04:00
James Smart
affbe24429 scsi: lpfc: Fix KASAN slab-out-of-bounds in lpfc_unreg_rpi() routine
In lpfc_offline_prep() an RPI is freed and nlp_rpi set to 0xFFFF before
calling lpfc_unreg_rpi().  Unfortunately, lpfc_unreg_rpi() uses nlp_rpi to
index the sli4_hba.rpi_ids[] array.

In lpfc_offline_prep(), unreg rpi before freeing the rpi.

Link: https://lore.kernel.org/r/20210707184351.67872-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-18 22:30:36 -04:00
James Smart
50baa1595d scsi: lpfc: Fix function description comments for vmid routines
Update comment headers for functions lpfc_vmid_cmd and lpfc_vmid_poll.

Link: https://lore.kernel.org/r/20210707184351.67872-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-18 22:30:35 -04:00
James Smart
e861308405 scsi: lpfc: Remove use of kmalloc() in trace event logging
There are instances when trace event logs are triggered from an interrupt
context. The trace event log may attempt to alloc memory causing scheduling
while atomic bug call traces.

Remove the need for the kmalloc'ed vport array when checking the
log_verbose flag, which eliminates the need for any allocation.

Link: https://lore.kernel.org/r/20210707184351.67872-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-18 22:30:34 -04:00
James Smart
ae463b6023 scsi: lpfc: Fix NVMe support reporting in log message
The NVMe support indicator in log message 6422 is displaying a field that
was initialized but never set to indicate NVMe support.  Remove obsolete
nvme_support element from the lpfc_hba structure and change log message to
display NVMe support status as reported in SLI4 Config Parameters mailbox
command.

Link: https://lore.kernel.org/r/20210707184351.67872-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-18 22:30:34 -04:00
Linus Torvalds
f5c13f1fde Merge tag 'driver-core-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core changes from Greg KH:
 "Here is the small set of driver core and debugfs updates for 5.14-rc1.

  Included in here are:

   - debugfs api cleanups (touched some drivers)

   - devres updates

   - tiny driver core updates and tweaks

  Nothing major in here at all, and all have been in linux-next for a
  while with no reported issues"

* tag 'driver-core-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (27 commits)
  docs: ABI: testing: sysfs-firmware-memmap: add some memmap types.
  devres: Enable trace events
  devres: No need to call remove_nodes() when there none present
  devres: Use list_for_each_safe_from() in remove_nodes()
  devres: Make locking straight forward in release_nodes()
  kernfs: move revalidate to be near lookup
  drivers/base: Constify static attribute_group structs
  firmware_loader: remove unneeded 'comma' macro
  devcoredump: remove contact information
  driver core: Drop helper devm_platform_ioremap_resource_wc()
  component: Rename 'dev' to 'parent'
  component: Drop 'dev' argument to component_match_realloc()
  device property: Don't check for NULL twice in the loops
  driver core: auxiliary bus: Fix typo in the docs
  drivers/base/node.c: make CACHE_ATTR define static DEVICE_ATTR_RO
  debugfs: remove return value of debugfs_create_ulong()
  debugfs: remove return value of debugfs_create_bool()
  scsi: snic: debugfs: remove local storage of debugfs files
  b43: don't save dentries for debugfs
  b43legacy: don't save dentries for debugfs
  ...
2021-07-05 13:51:41 -07:00
Gaurav Srivastava
20397179aa scsi: lpfc: vmid: Timeout implementation for VMID
Implement timeout functionality for the VMID. After the set time period of
inactivity, the VMID is deregistered from the switch.

Link: https://lore.kernel.org/r/20210608043556.274139-12-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-10 10:01:33 -04:00
Gaurav Srivastava
5e633302ac scsi: lpfc: vmid: Add support for VMID in mailbox command
Add supporting datastructures for mailbox command which helps in
determining if the firmware supports appid. Allocate resources for VMID at
initialization time and clean them up on removal.

Link: https://lore.kernel.org/r/20210608043556.274139-7-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-10 10:01:32 -04:00
James Smart
01131e7aae scsi: lpfc: Fix unreleased RPIs when NPIV ports are created
While testing NPIV and watching logins and used RPI levels, it was seen the
used RPI count was much higher than the number of remote ports discovered.

Code inspection showed that remote port removals on any NPIV instance are
releasing the RPI, but not performing an UNREG_RPI with the adapter thus
the reference counting never fully drops and the RPI is never fully
released. This was happening on NPIV nodes due to a log of fabric ELS's to
fabric addresses. This lack of UNREG_RPI was introduced by a prior node
rework patch that performed the UNREG_RPI as part of node cleanup.

To resolve the issue, do the following:

 - Restore the RPI release code, but move the location to so that it is in
   line with the new node cleanup design.

 - NPIV ports now release the RPI and drop the node when the caller sets
   the NLP_RELEASE_RPI flag.

 - Set the NLP_RELEASE_RPI flag in node cleanup which will trigger a
   release of RPI to free pool.

 - Ensure there's an UNREG_RPI at LOGO completion so that RPI release is
   completed.

 - Stop offline_prep from skipping nodes that are UNUSED. The RPI may
   not have been released.

 - Stop the default RPI handling in lpfc_cmpl_els_rsp() for SLI4.

 - Fixed up debugfs RPI displays for better debugging.

Fixes: a70e63eee1 ("scsi: lpfc: Fix NPIV Fabric Node reference counting")
Link: https://lore.kernel.org/r/20210514195559.119853-2-jsmart2021@gmail.com
Cc: <stable@vger.kernel.org> # v5.11+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-21 23:23:27 -04:00
Shawn Guo
0733d83905 firmware: replace HOTPLUG with UEVENT in FW_ACTION defines
With commit 312c004d36 ("[PATCH] driver core: replace "hotplug" by
"uevent"") already in the tree over a decade, update the name of
FW_ACTION defines to follow semantics, and reflect what the defines are
really meant for, i.e. whether or not generate user space event.

Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Link: https://lore.kernel.org/r/20210425020024.28057-1-shawn.guo@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-13 16:14:45 +02:00
James Smart
e4ec10228f scsi: lpfc: Fix bad memory access during VPD DUMP mailbox command
The dump command for reading a region passes a requested read length
specified in words (4-byte units). The response overwrites the same field
with the actual number of bytes read.

The mailbox handler for DUMP which reads VPD data (region 23) is treating
the response field as if it were still a word_cnt, thus multiplying it by 4
to set the read's "length". Given the read value was calculated based on
the size of the read buffer, the longer response length runs off the end of
the buffer.

Fix by reworking the code to use the response field as a byte count.

Link: https://lore.kernel.org/r/20210421234511.102206-1-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-26 22:58:38 -04:00
James Smart
f115612528 scsi: lpfc: Standardize discovery object logging format
Code inspection showed lpfc was using three different pointer formats when
logging discovery object pointers.

Standardize the pointer format to x%px.

Note: %px use is limited to discovery objects in order to aid core
analysis.

Link: https://lore.kernel.org/r/20210412013127.2387-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-13 01:39:14 -04:00
James Smart
b62232ba8c scsi: lpfc: Remove unsupported mbox PORT_CAPABILITIES logic
SLI-4 does not contain a PORT_CAPABILITIES mailbox command (only SLI-3
does, and SLI-3 doesn't use it), yet there are SLI-4 code paths that have
code to issue the command.  The command will always fail.

Remove the code for the mailbox command and leave only the resulting
"failure path" logic.

Link: https://lore.kernel.org/r/20210412013127.2387-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-13 01:39:14 -04:00
James Smart
304ee43238 scsi: lpfc: Fix error handling for mailboxes completed in MBX_POLL mode
In SLI-4, when performing a mailbox command with MBX_POLL, the driver uses
the BMBX register to send the command rather than the MQ. A flag is set
indicating the BMBX register is active and saves the mailbox job struct
(mboxq) in the mbox_active element of the adapter. The routine then waits
for completion or timeout. The mailbox job struct is not freed by the
routine. In cases of timeout, the adapter will be reset. The
lpfc_sli_mbox_sys_flush() routine will clean up the mbox in preparation for
the reset. It clears the BMBX active flag and marks the job structure as
MBX_NOT_FINISHED. But, it never frees the mboxq job structure. Expectation
in both normal completion and timeout cases is that the issuer of the mbx
command will free the structure.  Unfortunately, not all calling paths are
freeing the memory in cases of error.

All calling paths were looked at and updated, if missing, to free the mboxq
memory regardless of completion status.

Link: https://lore.kernel.org/r/20210412013127.2387-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-13 01:39:13 -04:00
James Smart
a789241e49 scsi: lpfc: Fix NMI crash during rmmod due to circular hbalock dependency
Remove hbalock dependency for lpfc_abts_els_sgl_list and
lpfc_abts_nvmet_ctx_list.  The lists are adaquately synchronized with the
sgl_list_lock and abts_nvmet_buf_list_lock.

Link: https://lore.kernel.org/r/20210412013127.2387-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-13 01:39:13 -04:00
James Smart
67073c69c8 scsi: lpfc: Update copyrights for 12.8.0.7 and 12.8.0.8 changes
For the files modified in 2021 via the 12.8.0.7 and 12.8.0.8 patch sets,
update the copyright for 2021.

Link: https://lore.kernel.org/r/20210301171821.3427-23-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:06 -05:00
James Smart
0b3ad32e26 scsi: lpfc: Enhancements to LOG_TRACE_EVENT for better readability
While testing recent discovery node rework, several items were seen that
could be done better with respect to the new trace event logic.

1) in the following msg:
      kernel: lpfc 0000:44:00.0: start 35 end 35 cnt 0
   If cnt is zero in the 1st message, there is no reason to display the
   1st message, which is just giving start/end positioning.

   Fix by not displaying message if cnt is 0.

2) If the driver is loaded with module log verbosity off, and later a
   single NPIV host instance verbosity is enabled via sysfs, it enables
   messages on all instances. This is due to the trace log verbosity checks
   (lpfc_dmp_dbg) looking at the phba only. It should look at the phba and
   the vport.

   Fix by enabling a check on both phba and vport.

3) in the following messages:
       2904 Firmware Dump Image Present on Adapter
       2887 Reset Needed: Attempting Port Recovery...
   These messages are not necessary for the trace event log, which is
   primarily for discovery.

   Fix by changing log level on these 2 messages to LOG_SLI.

Link: https://lore.kernel.org/r/20210104180240.46824-15-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:37 -05:00
James Smart
a22d73b655 scsi: lpfc: Implement health checking when aborting I/O
Several errors have occurred where the adapter stops or fails but does not
raise the register values for the driver to detect failure. Thus driver is
unaware of the failure. The failure typically results in I/O timeouts, the
I/O timeout handler failing (after several seconds), and the error handler
escalating recovery policy and resulting in more errors. Eventually, the
driver is in a position where things have spiraled and it can't do recovery
because other recovery ops are still outstanding and it becomes unusable.

Resolve the situation by having the I/O timeout handler (actually a els,
SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O,
perform a mailbox command and look for a response from the hardware.  If
the mailbox command fails, it will mark the adapter offline and then invoke
the adapter reset handler to clean up.

The new I/O timeout test will be limited to a test every 5s. If there are
multiple I/O timeouts concurrently, only the 1st I/O timeout will generate
the mailbox command. Further testing will only occur once a timeout occurs
after a 5s delay from the last mailbox command has expired.

Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:37 -05:00