msm-5.15

Author	SHA1	Message	Date
Justin Tee	2cf66428a2	scsi: lpfc: Fix hard lockup when reading the rx_monitor from debugfs [ Upstream commit c44e50f4a0ec00c2298f31f91bc2c3e9bbd81c7e ] During I/O and simultaneous cat of /sys/kernel/debug/lpfc/fnX/rx_monitor, a hard lockup similar to the call trace below may occur. The spin_lock_bh in lpfc_rx_monitor_report is not protecting from timer interrupts as expected, so change the strength of the spin lock to _irq. Kernel panic - not syncing: Hard LOCKUP CPU: 3 PID: 110402 Comm: cat Kdump: loaded exception RIP: native_queued_spin_lock_slowpath+91 [IRQ stack] native_queued_spin_lock_slowpath at ffffffffb814e30b _raw_spin_lock at ffffffffb89a667a lpfc_rx_monitor_record at ffffffffc0a73a36 [lpfc] lpfc_cmf_timer at ffffffffc0abbc67 [lpfc] __hrtimer_run_queues at ffffffffb8184250 hrtimer_interrupt at ffffffffb8184ab0 smp_apic_timer_interrupt at ffffffffb8a026ba apic_timer_interrupt at ffffffffb8a01c4f [End of IRQ stack] apic_timer_interrupt at ffffffffb8a01c4f lpfc_rx_monitor_report at ffffffffc0a73c80 [lpfc] lpfc_rx_monitor_read at ffffffffc0addde1 [lpfc] full_proxy_read at ffffffffb83e7fc3 vfs_read at ffffffffb833fe71 ksys_read at ffffffffb83402af do_syscall_64 at ffffffffb800430b entry_SYSCALL_64_after_hwframe at ffffffffb8a000ad Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20221017164323.14536-2-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-12-31 13:14:43 +01:00
James Smart	21d65b3516	scsi: lpfc: Rework MIB Rx Monitor debug info logic [ Upstream commit bd269188ea94e40ab002cad7b0df8f12b8f0de54 ] The kernel test robot reported the following sparse warning: arch/arm64/include/asm/cmpxchg.h:88:1: sparse: sparse: cast truncates bits from constant value (369 becomes 69) On arm64, atomic_xchg only works on 8-bit byte fields. Thus, the macro usage of LPFC_RXMONITOR_TABLE_IN_USE can be unintentionally truncated leading to all logic involving the LPFC_RXMONITOR_TABLE_IN_USE macro to not work properly. Replace the Rx Table atomic_t indexing logic with a new lpfc_rx_info_monitor structure that holds a circular ring buffer. For locking semantics, a spinlock_t is used. Link: https://lore.kernel.org/r/20220819011736.14141-4-jsmart2021@gmail.com Fixes: `17b27ac592` ("scsi: lpfc: Add rx monitoring statistics") Cc: <stable@vger.kernel.org> # v5.15+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-11-10 18:15:24 +01:00
James Smart	d70705e131	scsi: lpfc: Adjust CMF total bytes and rxmonitor [ Upstream commit a6269f837045acb02904f31f05acde847ec8f8a7 ] Calculate any extra bytes needed to account for timer accuracy. If we are less than LPFC_CMF_INTERVAL, then calculate the adjustment needed for total to reflect a full LPFC_CMF_INTERVAL. Add additional info to rxmonitor, and adjust some log formatting. Link: https://lore.kernel.org/r/20211204002644.116455-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Stable-dep-of: bd269188ea94 ("scsi: lpfc: Rework MIB Rx Monitor debug info logic") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-11-10 18:15:23 +01:00
James Smart	9ebc6e8ad1	scsi: lpfc: Adjust bytes received vales during cmf timer interval [ Upstream commit d5ac69b332d8859d1f8bd5d4dee31f3267f6b0d2 ] The newly added congestion mgmt framework is seeing unexpected congestion FPINs and signals. In analysis, time values given to the adapter are not at hard time intervals. Thus the drift vs the transfer count seen is affecting how the framework manages things. Adjust counters to cover the drift. Link: https://lore.kernel.org/r/20210910233159.115896-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Stable-dep-of: bd269188ea94 ("scsi: lpfc: Rework MIB Rx Monitor debug info logic") Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-11-10 18:15:23 +01:00
James Smart	0b0d169723	Revert "scsi: lpfc: SLI path split: Refactor lpfc_iocbq" This reverts commit `1c5e670d6a`. LTS 5.15 pulled in several lpfc "SLI Path split" patches. The Path Split mods were a 14-patch set, which refactors the driver from to split the sli-3 hw (now eol) from the sli-4 hw and use sli4 structures natively. The patches are highly inter-related. Given only some of the patches were included, it created a situation where FLOGI's fail, thus SLI Ports can't start communication. Reverting this patch as its one of the partial Path Split patches that was included. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-03 23:59:14 +09:00
James Smart	7a0fce24de	Revert "scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4" This reverts commit `c56cc7fefc`. LTS 5.15 pulled in several lpfc "SLI Path split" patches. The Path Split mods were a 14-patch set, which refactors the driver from to split the sli-3 hw (now eol) from the sli-4 hw and use sli4 structures natively. The patches are highly inter-related. Given only some of the patches were included, it created a situation where FLOGI's fail, thus SLI Ports can't start communication. Reverting this patch as its one of the partial Path Split patches that was included. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-03 23:59:14 +09:00
James Smart	7a36c9de43	Revert "scsi: lpfc: SLI path split: Refactor SCSI paths" This reverts commit `b4543dbea8`. LTS 5.15 pulled in several lpfc "SLI Path split" patches. The Path Split mods were a 14-patch set, which refactors the driver from to split the sli-3 hw (now eol) from the sli-4 hw and use sli4 structures natively. The patches are highly inter-related. Given only some of the patches were included, it created a situation where FLOGI's fail, thus SLI Ports can't start communication. Reverting this patch as its one of the partial Path Split patches that was included. NOTE: fixed a git revert error which caused a new line to be inserted: line 5755 of lpfc_scsi.c in lpfc_queuecommand + atomic_inc(&ndlp->cmd_pending); Removed the line Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-03 23:59:14 +09:00
James Smart	eb8be2dbfb	Revert "scsi: lpfc: Fix locking for lpfc_sli_iocbq_lookup()" This reverts commit `9a570069cd`. LTS 5.15 pulled in several lpfc "SLI Path split" patches. The Path Split mods were a 14-patch set, which refactors the driver from to split the sli-3 hw (now eol) from the sli-4 hw and use sli4 structures natively. The patches are highly inter-related. Given only some of the patches were included, it created a situation where FLOGI's fail, thus SLI Ports can't start communication. Reverting this patch as its a fix specific to the Path Split patches, which were partially included and now being pulled from 5.15. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-03 23:59:14 +09:00
James Smart	065bf71a8a	Revert "scsi: lpfc: Fix element offset in __lpfc_sli_release_iocbq_s4()" This reverts commit `6e99860de6`. LTS 5.15 pulled in several lpfc "SLI Path split" patches. The Path Split mods were a 14-patch set, which refactors the driver from to split the sli-3 hw (now eol) from the sli-4 hw and use sli4 structures natively. The patches are highly inter-related. Given only some of the patches were included, it created a situation where FLOGI's fail, thus SLI Ports can't start communication. Reverting this patch as its a fix specific to the Path Split patches, which were partially included and now being pulled from 5.15. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-03 23:59:14 +09:00
James Smart	97dc9076ea	Revert "scsi: lpfc: Resolve some cleanup issues following SLI path refactoring" This reverts commit `17bf429b91`. LTS 5.15 pulled in several lpfc "SLI Path split" patches. The Path Split mods were a 14-patch set, which refactors the driver from to split the sli-3 hw (now eol) from the sli-4 hw and use sli4 structures natively. The patches are highly inter-related. Given only some of the patches were included, it created a situation where FLOGI's fail, thus SLI Ports can't start communication. Reverting this patch as its a fix specific to the Path Split patches, which were partially included and now being pulled from 5.15. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-11-03 23:59:13 +09:00
Rafael Mendonca	9749595feb	scsi: lpfc: Fix memory leak in lpfc_create_port() [ Upstream commit dc8e483f684a24cc06e1d5fa958b54db58855093 ] Commit `5e633302ac` ("scsi: lpfc: vmid: Add support for VMID in mailbox command") introduced allocations for the VMID resources in lpfc_create_port() after the call to scsi_host_alloc(). Upon failure on the VMID allocations, the new code would branch to the 'out' label, which returns NULL without unwinding anything, thus skipping the call to scsi_host_put(). Fix the problem by creating a separate label 'out_free_vmid' to unwind the VMID resources and make the 'out_put_shost' label call only scsi_host_put(), as was done before the introduction of allocations for VMID. Fixes: `5e633302ac` ("scsi: lpfc: vmid: Add support for VMID in mailbox command") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Link: https://lore.kernel.org/r/20220916035908.712799-1-rafaelmendsr@gmail.com Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-10-29 10:12:56 +02:00
Hannes Reinecke	87cd4c02bd	scsi: lpfc: Return DID_TRANSPORT_DISRUPTED instead of DID_REQUEUE [ Upstream commit c0a50cd389c3ed54831e240023dd12bafa56b3a6 ] When the driver hits an internal error condition returning DID_REQUEUE the I/O will be retried on the same ITL nexus. This will inhibit multipathing, resulting in endless retries even if the error could have been resolved by using a different ITL nexus. Return DID_TRANSPORT_DISRUPTED to allow for multipath to engage and route I/O to another ITL nexus. Link: https://lore.kernel.org/r/20220824060033.138661-1-hare@suse.de Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-09-23 14:15:50 +02:00
Yang Yingliang	1dcc308898	scsi: lpfc: Add missing destroy_workqueue() in error path commit da6d507f5ff328f346b3c50e19e19993027b8ffd upstream. Add the missing destroy_workqueue() before return from lpfc_sli4_driver_resource_setup() in the error path. Link: https://lore.kernel.org/r/20220823044237.285643-1-yangyingliang@huawei.com Fixes: `3cee98db26` ("scsi: lpfc: Fix crash on driver unload in wq free") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-09-15 11:30:03 +02:00
James Smart	9c8e2e6072	scsi: lpfc: Fix possible memory leak when failing to issue CMF WQE [ Upstream commit 2f67dc7970bce3529edce93a0a14234d88b3fcd5 ] There is no corresponding free routine if lpfc_sli4_issue_wqe fails to issue the CMF WQE in lpfc_issue_cmf_sync_wqe. If ret_val is non-zero, then free the iocbq request structure. Link: https://lore.kernel.org/r/20220701211425.2708-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-25 11:40:35 +02:00
James Smart	b92506dc51	scsi: lpfc: Prevent buffer overflow crashes in debugfs with malformed user input [ Upstream commit f8191d40aa612981ce897e66cda6a88db8df17bb ] Malformed user input to debugfs results in buffer overflow crashes. Adapt input string lengths to fit within internal buffers, leaving space for NULL terminators. Link: https://lore.kernel.org/r/20220701211425.2708-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-25 11:40:34 +02:00
James Smart	17bf429b91	scsi: lpfc: Resolve some cleanup issues following SLI path refactoring commit e27f05147bff21408c1b8410ad8e90cd286e7952 upstream. Following refactoring and consolidation in SLI processing, fix up some minor issues related to SLI path: - Correct the setting of LPFC_EXCHANGE_BUSY flag in response IOCB. - Fix some typographical errors. - Fix duplicate log messages. Link: https://lore.kernel.org/r/20220603174329.63777-4-jsmart2021@gmail.com Fixes: 1b64aa9eae28 ("scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4") Cc: <stable@vger.kernel.org> # v5.18 Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:24:31 +02:00
James Smart	6e99860de6	scsi: lpfc: Fix element offset in __lpfc_sli_release_iocbq_s4() commit 84c6f99e39074d45f75986e42ca28e27c140fd0d upstream. The prior commit that moved from iocb elements to explicit wqe elements missed a name change. Correct __lpfc_sli_release_iocbq_s4() to reference wqe rather than iocb. Link: https://lore.kernel.org/r/20220506035519.50908-2-jsmart2021@gmail.com Fixes: a680a9298e7b ("scsi: lpfc: SLI path split: Refactor lpfc_iocbq") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:24:31 +02:00
James Smart	9a570069cd	scsi: lpfc: Fix locking for lpfc_sli_iocbq_lookup() commit c26bd6602e1d348bfa754dc55e5608c922dd2801 upstream. The rules changed for lpfc_sli_iocbq_lookup() vs locking. Prior, the routine properly took out the lock. In newly refactored code, the locks must be held when calling the routine. Fix lpfc_sli_process_sol_iocb() to take the locks before calling the routine. Fix lpfc_sli_handle_fast_ring_event() to not release the locks to call the routine. Link: https://lore.kernel.org/r/20220323205545.81814-3-jsmart2021@gmail.com Fixes: 1b64aa9eae28 ("scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4") Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-08-17 14:24:31 +02:00
James Smart	2b5ef6430c	scsi: lpfc: Remove extra atomic_inc on cmd_pending in queuecommand after VMID [ Upstream commit 0948a9c5386095baae4012190a6b65aba684a907 ] VMID introduced an extra increment of cmd_pending, causing double-counting of the I/O. The normal increment ios performed in lpfc_get_scsi_buf. Link: https://lore.kernel.org/r/20220701211425.2708-5-jsmart2021@gmail.com Fixes: `33c79741de` ("scsi: lpfc: vmid: Introduce VMID in I/O path") Cc: <stable@vger.kernel.org> # v5.14+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:21 +02:00
James Smart	b4543dbea8	scsi: lpfc: SLI path split: Refactor SCSI paths [ Upstream commit 3512ac0942938d6977e7999ee69765d948d2faf1 ] This patch refactors the SCSI paths to use SLI-4 as the primary interface. - Conversion away from using SLI-3 iocb structures to set/access fields in common routines. Use the new generic get/set routines that were added. This move changes code from indirect structure references to using local variables with the generic routines. - Refactor routines when setting non-generic fields, to have both SLI3 and SLI4 specific sections. This replaces the set-as-SLI3 then translate to SLI4 behavior of the past. Link: https://lore.kernel.org/r/20220225022308.16486-14-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:21 +02:00
James Smart	c56cc7fefc	scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4 [ Upstream commit 1b64aa9eae28ac598a03ed3d62a63ac5e5b295fc ] Convert the SLI4 fast and slow paths to use native SLI4 wqe constructs instead of iocb SLI3-isms. Includes the following: - Create simple get_xxx and set_xxx routines to wrapper access to common elements in both SLI3 and SLI4 commands - allowing calling routines to avoid sli-rev-specific structures to access the elements. - using the wqe in the job structure as the primary element - use defines from SLI-4, not SLI-3 - Removal of iocb to wqe conversion from fast and slow path - Add below routines to handle fast path lpfc_prep_embed_io - prepares the wqe for fast path lpfc_wqe_bpl2sgl - manages bpl to sgl conversion lpfc_sli_wqe2iocb - converts a WQE to IOCB for SLI-3 path - Add lpfc_sli3_iocb2wcqecmpl in completion path to convert an SLI-3 iocb completion to wcqe completion - Refactor some of the code that works on both revs for clarity Link: https://lore.kernel.org/r/20220225022308.16486-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:21 +02:00
James Smart	1c5e670d6a	scsi: lpfc: SLI path split: Refactor lpfc_iocbq [ Upstream commit a680a9298e7b4ff344aca3456177356b276e5038 ] Currently, SLI3 and SLI4 data paths use the same lpfc_iocbq structure. This is a "common" structure but many of the components refer to sli-rev specific entities which can lead the developer astray as to what they actually mean, should be set to, or when they should be used. This first patch prepares the lpfc_iocbq structure so that elements common to both SLI3 and SLI4 data paths are more appropriately named, making it clear they apply generically. Fieldnames based on 'iocb' (sli3) or 'wqe' (sli4) which are actually generic to the paths are renamed to 'cmd': - iocb_flag is renamed to cmd_flag - lpfc_vmid_iocb_tag is renamed to lpfc_vmid_tag - fabric_iocb_cmpl is renamed to fabric_cmd_cmpl - wait_iocb_cmpl is renamed to wait_cmd_cmpl - iocb_cmpl and wqe_cmpl are combined and renamed to cmd_cmpl - rsvd2 member is renamed to num_bdes due to pre-existing usage The structure name itself will retain the iocb reference as changing to a more relevant "job" or "cmd" title induces many hundreds of line changes for only a name change. lpfc_post_buffer is also renamed to lpfc_sli3_post_buffer to indicate use in the SLI3 path only. Link: https://lore.kernel.org/r/20220225022308.16486-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:21 +02:00
James Smart	eb36ec3039	scsi: lpfc: Fix EEH support for NVMe I/O [ Upstream commit 25ac2c970be32993f1dff607f8354f3c053d42bc ] Injecting errors on the PCI slot while the driver is handling NVMe I/O will cause crashes and hangs. There are several rather difficult scenarios occurring. The main issue is that the adapter can report a PCI error before or simultaneously to the PCI subsystem reporting the error. Both paths have different entry points and currently there is no interlock between them. Thus multiple teardown paths are competing and all heck breaks loose. Complicating things is the NVMs path. To a large degree, I/O was able to be shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a similar call. At best, it works on a per-controller basis, but even at the controller level, it's a controller "reset" call. All of which means I/O is still flowing on different CPUs with reset paths expecting hw access (mailbox commands) to execute properly. The following modifications are made: - A new flag is set in PCI error entrypoints so the driver can track being called by that path. - An interlock is added in the SLI hw error path and the PCI error path such that only one of the paths proceeds with the teardown logic. - RPI cleanup is patched such that RPIs are marked unregistered w/o mbx cmds in cases of hw error. - If entering the SLI port re-init calls, a case where SLI error teardown was quick and beat the PCI calls now reporting error, check whether the SLI port is still live on the PCI bus. - In the PCI reset code to bring the adapter back, recheck the IRQ settings. Different checks for SLI3 vs SLI4. - In I/O completions, that may be called as part of the cleanup or underway just before the hw error, check the state of the adapter. If in error, shortcut handling that would expect further adapter completions as the hw error won't be sending them. - In routines waiting on I/O completions, which may have been in progress prior to the hw error, detect the device is being torn down and abort from their waits and just give up. This points to a larger issue in the driver on ref-counting for data structures, as it doesn't have ref-counting on q and port structures. We'll do this fix for now as it would be a major rework to be done differently. - Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being failed back due to hw error. - In I/O buf allocation, done at the start of new I/Os, check hw state and fail if hw error. Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-08-17 14:24:20 +02:00
James Smart	4b5020fc23	scsi: lpfc: Allow reduced polling rate for nvme_admin_async_event cmd completion [ Upstream commit 2e7e9c0c1ec05f18d320ecc8a31eec59d2af1af9 ] NVMe Asynchronous Event Request commands have no command timeout value per specifications. Set WQE option to allow a reduced FLUSH polling rate for I/O error detection specifically for nvme_admin_async_event commands. Link: https://lore.kernel.org/r/20220603174329.63777-9-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-06-22 14:21:57 +02:00
James Smart	6782a2ccd5	scsi: lpfc: Fix port stuck in bypassed state after LIP in PT2PT topology [ Upstream commit 336d63615466b4c06b9401c987813fd19bdde39b ] After issuing a LIP, a specific target vendor does not ACC the FLOGI that lpfc sends. However, it does send its own FLOGI that lpfc ACCs. The target then establishes the port IDs by sending a PLOGI. lpfc PLOGI_ACCs and starts the RPI registration for DID 0x000001. The target then sends a LOGO to the fabric DID. lpfc is currently treating the LOGO from the fabric DID as a link down and cleans up all the ndlps. The ndlp for DID 0x000001 is put back into NPR and discovery stops, leaving the port in stuck in bypassed mode. Change lpfc behavior such that if a LOGO is received for the fabric DID in PT2PT topology skip the lpfc_linkdown_port() routine and just move the fabric DID back to NPR. Link: https://lore.kernel.org/r/20220603174329.63777-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-06-22 14:21:57 +02:00
James Smart	5e83869e29	scsi: lpfc: Resolve NULL ptr dereference after an ELS LOGO is aborted [ Upstream commit b1b3440f437b75fb2a9b0cfe58df461e40eca474 ] A use-after-free crash can occur after an ELS LOGO is aborted. Specifically, a nodelist structure is freed and then ndlp->vport->cfg_log_verbose is dereferenced in lpfc_nlp_get() when the discovery state machine is mistakenly called a second time with NLP_EVT_DEVICE_RM argument. Rework lpfc_cmpl_els_logo() to prevent the duplicate calls to release a nodelist structure. Link: https://lore.kernel.org/r/20220603174329.63777-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-06-22 14:21:57 +02:00
James Smart	db6da340d6	scsi: lpfc: Alter FPIN stat accounting logic [ Upstream commit e6f51041450282a8668af3a8fc5c7744e81a447c ] When configuring CMF management based on signals instead of FPINs, FPIN alarm and warning statistics are not tracked. Change the behavior so that FPIN alarms and warnings are always tracked regardless of the configured mode. Similar changes are made in the CMF signal stat accounting logic. Upon receipt of a signal, only track signaled alarms and warnings. FPIN stats should not be incremented upon receipt of a signal. Link: https://lore.kernel.org/r/20220506035519.50908-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-06-09 10:22:36 +02:00
James Smart	40cf4ea4d2	scsi: lpfc: Fix resource leak in lpfc_sli4_send_seq_to_ulp() [ Upstream commit 646db1a560f44236b7278b822ca99a1d3b6ea72c ] If no handler is found in lpfc_complete_unsol_iocb() to match the rctl of a received frame, the frame is dropped and resources are leaked. Fix by returning resources when discarding an unhandled frame type. Update lpfc_fc_frame_check() handling of NOP basic link service. Link: https://lore.kernel.org/r/20220426181419.9154-1-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-06-09 10:22:34 +02:00
James Smart	ae373d66c4	scsi: lpfc: Fix call trace observed during I/O with CMF enabled [ Upstream commit d6d45f67a11136cb88a70a29ab22ea6db8ae6bd5 ] The following was seen with CMF enabled: BUG: using smp_processor_id() in preemptible code: systemd-udevd/31711 kernel: caller is lpfc_update_cmf_cmd+0x214/0x420 [lpfc] kernel: CPU: 12 PID: 31711 Comm: systemd-udevd kernel: Call Trace: kernel: <TASK> kernel: dump_stack_lvl+0x44/0x57 kernel: check_preemption_disabled+0xbf/0xe0 kernel: lpfc_update_cmf_cmd+0x214/0x420 [lpfc] kernel: lpfc_nvme_fcp_io_submit+0x23b4/0x4df0 [lpfc] this_cpu_ptr() calls smp_processor_id() in a preemptible context. Fix by using per_cpu_ptr() with raw_smp_processor_id() instead. Link: https://lore.kernel.org/r/20220412222008.126521-16-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-06-09 10:22:32 +02:00
James Smart	7625e81de2	scsi: lpfc: Fix SCSI I/O completion and abort handler deadlock [ Upstream commit 03cbbd7c2f5ee288f648f4aeedc765a181188553 ] During stress I/O tests with 500+ vports, hard LOCKUP call traces are observed. CPU A: native_queued_spin_lock_slowpath+0x192 _raw_spin_lock_irqsave+0x32 lpfc_handle_fcp_err+0x4c6 lpfc_fcp_io_cmd_wqe_cmpl+0x964 lpfc_sli4_fp_handle_cqe+0x266 __lpfc_sli4_process_cq+0x105 __lpfc_sli4_hba_process_cq+0x3c lpfc_cq_poll_hdler+0x16 irq_poll_softirq+0x76 __softirqentry_text_start+0xe4 irq_exit+0xf7 do_IRQ+0x7f CPU B: native_queued_spin_lock_slowpath+0x5b _raw_spin_lock+0x1c lpfc_abort_handler+0x13e scmd_eh_abort_handler+0x85 process_one_work+0x1a7 worker_thread+0x30 kthread+0x112 ret_from_fork+0x1f Diagram of lockup: CPUA CPUB ---- ---- lpfc_cmd->buf_lock phba->hbalock lpfc_cmd->buf_lock phba->hbalock Fix by reordering the taking of the lpfc_cmd->buf_lock and phba->hbalock in lpfc_abort_handler routine so that it tries to take the lpfc_cmd->buf_lock first before phba->hbalock. Link: https://lore.kernel.org/r/20220412222008.126521-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-06-09 10:22:32 +02:00
James Smart	271725e402	scsi: lpfc: Move cfg_log_verbose check before calling lpfc_dmp_dbg() [ Upstream commit e294647b1aed4247fe52851f3a3b2b19ae906228 ] In an attempt to log message 0126 with LOG_TRACE_EVENT, the following hard lockup call trace hangs the system. Call Trace: _raw_spin_lock_irqsave+0x32/0x40 lpfc_dmp_dbg.part.32+0x28/0x220 [lpfc] lpfc_cmpl_els_fdisc+0x145/0x460 [lpfc] lpfc_sli_cancel_jobs+0x92/0xd0 [lpfc] lpfc_els_flush_cmd+0x43c/0x670 [lpfc] lpfc_els_flush_all_cmd+0x37/0x60 [lpfc] lpfc_sli4_async_event_proc+0x956/0x1720 [lpfc] lpfc_do_work+0x1485/0x1d70 [lpfc] kthread+0x112/0x130 ret_from_fork+0x1f/0x40 Kernel panic - not syncing: Hard LOCKUP The same CPU tries to claim the phba->port_list_lock twice. Move the cfg_log_verbose checks as part of the lpfc_printf_vlog() and lpfc_printf_log() macros before calling lpfc_dmp_dbg(). There is no need to take the phba->port_list_lock within lpfc_dmp_dbg(). Link: https://lore.kernel.org/r/20220412222008.126521-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-06-09 10:22:32 +02:00
James Smart	026083cb43	scsi: lpfc: Fix queue failures when recovering from PCI parity error [ Upstream commit df0101197c4d9596682901631f3ee193ed354873 ] When recovering from a pci-parity error the driver is failing to re-create queues, causing recovery to fail. Looking deeper, it was found that the interrupt vector count allocated on the recovery was fewer than the vectors originally allocated. This disparity resulted in CPU map entries with stale information. When the driver tries to re-create the queues, it attempts to use the stale information which indicates an eq/interrupt vector that was no longer created. Fix by clearng the cpup map array before enabling and requesting the IRQs in the lpfc_sli_reset_slot_s4 routine(). Link: https://lore.kernel.org/r/20220317032737.45308-4-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-04-20 09:34:15 +02:00
James Smart	e6da726eb6	scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loop commit 7f4c5a26f735dea4bbc0eb8eb9da99cda95a8563 upstream. When connected point to point, the driver does not know the FC4's supported by the other end. In Fabrics, it can query the nameserver. Thus the driver must send PRLIs for the FC4s it supports and enable support based on the acc(ept) or rej(ect) of the respective FC4 PRLI. Currently the driver supports SCSI and NVMe PRLIs. Unfortunately, although the behavior is per standard, many devices have come to expect only SCSI PRLIs. In this particular example, the NVMe PRLI is properly RJT'd but the target decided that it must LOGO after seeing the unexpected NVMe PRLI. The LOGO causes the sequence to restart and login is now in an infinite failure loop. Fix the problem by having the driver, on a pt2pt link, remember NVMe PRLI accept or reject status across logout as long as the link stays "up". When retrying login, if the prior NVMe PRLI was rejected, it will not be sent on the next login. Link: https://lore.kernel.org/r/20220212163120.15385-1-jsmart2021@gmail.com Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-23 12:03:20 +01:00
James Smart	06bd0f157e	scsi: lpfc: Fix mailbox command failure during driver initialization commit efe1dc571a5b808baa26682eef16561be2e356fd upstream. Contention for the mailbox interface may occur during driver initialization (immediately after a function reset), between mailbox commands initiated via ioctl (bsg) and those driver requested by the driver. After setting SLI_ACTIVE flag for a port, there is a window in which the driver will allow an ioctl to be initiated while the adapter is initializing and issuing mailbox commands via polling. The polling logic then gets confused. Correct by having thread setting SLI_ACTIVE spot an active mailbox command and allow it complete before proceeding. Link: https://lore.kernel.org/r/20210921143008.64212-1-jsmart2021@gmail.com Co-developed-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-23 12:03:02 +01:00
James Smart	cd4494f868	scsi: lpfc: Reduce log messages seen after firmware download commit 5852ed2a6a39c862c8a3fdf646e1f4e01b91d710 upstream. Messages around firmware download were incorrectly tagged as being related to discovery trace events. Thus, firmware download status ended up dumping the trace log as well as the firmware update message. As there were a couple of log messages in this state, the trace log was dumped multiple times. Resolve this by converting from trace events to SLI events. Link: https://lore.kernel.org/r/20220207180442.72836-1-jsmart2021@gmail.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-16 12:56:40 +01:00
James Smart	6737f9a95a	scsi: lpfc: Remove NVMe support if kernel has NVME_FC disabled commit c80b27cfd93ba9f5161383f798414609e84729f3 upstream. The driver is initiating NVMe PRLIs to determine device NVMe support. This should not be occurring if CONFIG_NVME_FC support is disabled. Correct this by changing the default value for FC4 support. Currently it defaults to FCP and NVMe. With change, when NVME_FC support is not enabled in the kernel, the default value is just FCP. Link: https://lore.kernel.org/r/20220207180516.73052-1-jsmart2021@gmail.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-02-16 12:56:40 +01:00
James Smart	99d41076c2	scsi: lpfc: Fix lpfc_force_rscn ndlp kref imbalance commit 7576d48c64f36f6fea9df2882f710a474fa35f40 upstream. Issuing lpfc_force_rscn twice results in an ndlp kref use-after-free call trace. A prior patch reworked the get/put handling by ensuring nlp_get was done before WQE submission and a put was done in the completion path. Unfortunately, the issue_els_rscn path had a piece of legacy code that did a nlp_put, causing an imbalance on the ref counts. Fixed by removing the unnecessary legacy code snippet. Link: https://lore.kernel.org/r/20211204002644.116455-4-jsmart2021@gmail.com Fixes: `4430f7fd09` ("scsi: lpfc: Rework locations of ndlp reference taking") Cc: <stable@vger.kernel.org> # v5.11+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-27 11:05:12 +01:00
James Smart	b693e737e0	scsi: lpfc: Trigger SLI4 firmware dump before doing driver cleanup [ Upstream commit 7dd2e2a923173d637c272e483966be8e96a72b64 ] Extraneous teardown routines are present in the firmware dump path causing altered states in firmware captures. When a firmware dump is requested via sysfs, trigger the dump immediately without tearing down structures and changing adapter state. The driver shall rely on pre-existing firmware error state clean up handlers to restore the adapter. Link: https://lore.kernel.org/r/20211204002644.116455-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:05:00 +01:00
James Smart	5b91e80b63	scsi: lpfc: Fix leaked lpfc_dmabuf mbox allocations with NPIV [ Upstream commit f0d3919697492950f57a26a1093aee53880d669d ] During rmmod testing, messages appeared indicating lpfc_mbuf_pool entries were still busy. This situation was only seen doing rmmod after at least 1 vport (NPIV) instance was created and destroyed. The number of messages scaled with the number of vports created. When a vport is created, it can receive a PLOGI from another initiator Nport. When this happens, the driver prepares to ack the PLOGI and prepares an RPI for registration (via mbx cmd) which includes an mbuf allocation. During the unsolicited PLOGI processing and after the RPI preparation, the driver recognizes it is one of the vport instances and decides to reject the PLOGI. During the LS_RJT preparation for the PLOGI, the mailbox struct allocated for RPI registration is freed, but the mbuf that was also allocated is not released. Fix by freeing the mbuf with the mailbox struct in the LS_RJT path. As part of the code review to figure the issue out a couple of other areas where found that also would not have released the mbuf. Those are cleaned up as well. Link: https://lore.kernel.org/r/20211204002644.116455-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:05:00 +01:00
Dan Carpenter	a68d72d899	scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write() [ Upstream commit 9020be114a47bf7ff33e179b3bb0016b91a098e6 ] The "mybuf" string comes from the user, so we need to ensure that it is NUL terminated. Link: https://lore.kernel.org/r/20211214070527.GA27934@kili Fixes: `bd2cdd5e40` ("scsi: lpfc: NVME Initiator: Add debugfs support") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-05 12:42:34 +01:00
James Smart	a8392866c5	scsi: lpfc: Fix non-recovery of remote ports following an unsolicited LOGO commit 0956ba63bd94355bf38cd40f7eb9104577739ab8 upstream. A commit introduced formal regstration of all Fabric nodes to the SCSI transport as well as REG/UNREG RPI mailbox requests. The commit introduced the NLP_RELEASE_RPI flag for rports set in the lpfc_cmpl_els_logo_acc() routine to help clean up the RPIs. This new code caused the driver to release the RPI value used for the remote port and marked the RPI invalid. When the driver later attempted to re-login, it would use the invalid RPI and the adapter rejected the PLOGI request. As no login occurred, the devloss timer on the rport expired and connectivity was lost. This patch corrects the code by removing the snippet that requests the rpi to be unregistered. This change only occurs on a node that is already marked to be rediscovered. This puts the code back to its original behavior, preserving the already-assigned rpi value (registered or not) which can be used on the re-login attempts. Link: https://lore.kernel.org/r/20211123165646.62740-1-jsmart2021@gmail.com Fixes: `fe83e3b9b4` ("scsi: lpfc: Fix node handling for Fabric Controller and Domain Controller") Cc: <stable@vger.kernel.org> # v5.14+ Co-developed-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-12-08 09:04:42 +01:00
James Smart	bf76f56a7f	scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss [ Upstream commit af984c87293b19dccbd0b16afc57c5c9a4a279c7 ] A link bounce to a slow fabric may observe FDISC response delays lasting longer than devloss tmo. Current logic decrements the final fabric node kref during a devloss tmo event. This results in a NULL ptr dereference crash if the FDISC completes for that fabric node after devloss tmo. Fix by adding the NLP_IN_RECOV_POST_DEV_LOSS flag, which is set when devloss tmo triggers and we've noticed that fabric node recovery has already started or finished in between the time lpfc_dev_loss_tmo_callbk queues lpfc_dev_loss_tmo_handler. If fabric node recovery succeeds, then the driver reverses the devloss tmo marked kref put with a kref get. If fabric node recovery fails, then the final kref put relies on the ELS timing out or the REG_LOGIN cmpl routine. Link: https://lore.kernel.org/r/20211020211417.88754-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-25 09:48:29 +01:00
James Smart	28de48a7ce	scsi: lpfc: Fix link down processing to address NULL pointer dereference [ Upstream commit 1854f53ccd88ad4e7568ddfafafffe71f1ceb0a6 ] If an FC link down transition while PLOGIs are outstanding to fabric well known addresses, outstanding ABTS requests may result in a NULL pointer dereference. Driver unload requests may hang with repeated "2878" log messages. The Link down processing results in ABTS requests for outstanding ELS requests. The Abort WQEs are sent for the ELSs before the driver had set the link state to down. Thus the driver is sending the Abort with the expectation that an ABTS will be sent on the wire. The Abort request is stalled waiting for the link to come up. In some conditions the driver may auto-complete the ELSs thus if the link does come up, the Abort completions may reference an invalid structure. Fix by ensuring that Abort set the flag to avoid link traffic if issued due to conditions where the link failed. Link: https://lore.kernel.org/r/20211020211417.88754-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-25 09:48:29 +01:00
James Smart	dbebf865b3	scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine [ Upstream commit 79b20beccea3a3938a8500acef4e6b9d7c66142f ] An error is detected with the following report when unloading the driver: "KASAN: use-after-free in lpfc_unreg_rpi+0x1b1b" The NLP_REG_LOGIN_SEND nlp_flag is set in lpfc_reg_fab_ctrl_node(), but the flag is not cleared upon completion of the login. This allows a second call to lpfc_unreg_rpi() to proceed with nlp_rpi set to LPFC_RPI_ALLOW_ERROR. This results in a use after free access when used as an rpi_ids array index. Fix by clearing the NLP_REG_LOGIN_SEND nlp_flag in lpfc_mbx_cmpl_fc_reg_login(). Link: https://lore.kernel.org/r/20211020211417.88754-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-25 09:48:29 +01:00
James Smart	814d3610c4	scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq() [ Upstream commit 99154581b05c8fb22607afb7c3d66c1bace6aa5d ] When parsing the txq list in lpfc_drain_txq(), the driver attempts to pass the requests to the adapter. If such an attempt fails, a local "fail_msg" string is set and a log message output. The job is then added to a completions list for cancellation. Processing of any further jobs from the txq list continues, but since "fail_msg" remains set, jobs are added to the completions list regardless of whether a wqe was passed to the adapter. If successfully added to txcmplq, jobs are added to both lists resulting in list corruption. Fix by clearing the fail_msg string after adding a job to the completions list. This stops the subsequent jobs from being added to the completions list unless they had an appropriate failure. Link: https://lore.kernel.org/r/20210910233159.115896-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-25 09:48:24 +01:00
James Smart	be9866f92e	scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_reset [ Upstream commit d305c253af693e69a36cedec880aca6d0c6d789d ] A prior patch introduced HBA_NEEDS_CFG_PORT flag logic, but in lpfc_sli_brdrestart_s3() code path, right after HBA_NEEDS_CFG_PORT is set, the phba->hba_flag is cleared in lpfc_sli_brdreset(). Fix by calling lpfc_sli_chipset_init() to wait for successful restart of the HBA in lpfc_host_reset_handler() after lpfc_sli_brdrestart(). lpfc_sli_chipset_init() sets the HBA_NEEDS_CFG_PORT flag so that the lpfc_sli_hba_setup() routine from lpfc_online() will execute lpfc_sli_config_port() initialization step when the brdrestart is successful. Link: https://lore.kernel.org/r/20211020211417.88754-3-jsmart2021@gmail.com Fixes: `d2f2547efd` ("scsi: lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:16:54 +01:00
James Smart	ca4e260667	scsi: lpfc: Fix NVMe I/O failover to non-optimized path [ Upstream commit b507357f79171fb4fb4e732ca43a1f30bc5aab1d ] Currently, we hold off unregistering with NVMe transport layer until GID_FT or ADISC completes upon receipt of RSCN. In the ADISC discovery routine, for nodes not found in the GID_FT response, the nodes are unregistered from the SCSI transport but not UNREG_RPI'd. Meaning outstanding WQEs continue to be outstanding and were not failed back to the OS. If an NVMe device, this mean there wasn't initial termination of the I/Os so they could be issued on a different NVMe path. Fix by unregistering the RPI so that I/O is cancelled. Link: https://lore.kernel.org/r/20210910233159.115896-8-jsmart2021@gmail.com Fixes: `0614568361` ("scsi: lpfc: Delay unregistering from transport until GIDFT or ADISC completes") Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:16:46 +01:00
James Smart	0bb97f2486	scsi: lpfc: Fix FCP I/O flush functionality for TMF routines commit cd8a36a90babf958082b87bc6b4df5dd70901eba upstream. A prior patch inadvertently caused lpfc_sli_sum_iocb() to exclude counting of outstanding aborted I/Os and ABORT IOCBs. Thus, lpfc_reset_flush_io_context() called from any TMF routine does not properly wait to flush all outstanding FCP IOCBs leading to a block layer crash on an invalid scsi_cmnd->request pointer. kernel BUG at ../block/blk-core.c:1489! RIP: 0010:blk_requeue_request+0xaf/0xc0 ... Call Trace: <IRQ> __scsi_queue_insert+0x90/0xe0 [scsi_mod] blk_done_softirq+0x7e/0x90 __do_softirq+0xd2/0x280 irq_exit+0xd5/0xe0 do_IRQ+0x4c/0xd0 common_interrupt+0x87/0x87 </IRQ> Fix by separating out the LPFC_IO_FCP, LPFC_IO_ON_TXCMPLQ, LPFC_DRIVER_ABORTED, and CMD_ABORT_XRI_CN \|\| CMD_CLOSE_XRI_CN checks into a new lpfc_sli_validate_fcp_iocb_for_abort() routine when determining to build an ABORT iocb. Restore lpfc_reset_flush_io_context() functionality by including counting of outstanding aborted IOCBs and ABORT IOCBs in lpfc_sli_sum_iocb(). Link: https://lore.kernel.org/r/20210910233159.115896-9-jsmart2021@gmail.com Fixes: `e136471135` ("scsi: lpfc: Fix illegal memory access on Abort IOCBs") Cc: <stable@vger.kernel.org> # v5.12+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:15:51 +01:00
James Smart	bea230d0b2	scsi: lpfc: Don't release final kref on Fport node while ABTS outstanding commit 982fc3965d1350d3332e04046b0e101006184ba9 upstream. In a rarely executed path, FLOGI failure, there is a refcounting error. If FLOGI completed with an error, typically a timeout, the initial completion handler would remove the job reference. However, the job completion isn't the actual end of the job/exchange as the timeout usually initiates an ABTS, and upon that ABTS completion, a final completion is sent. The driver removes the reference again in the final completion. Thus the imbalance. In the buggy cases, if there was a link bounce while the delayed response is outstanding, the fport node may be referenced again but there was no additional reference as it is already present. The delayed completion then occurs and removes the last reference freeing the node and causing issues in the link up processed that is using the node. Fix this scenario by removing the snippet that removed the reference in the initial FLOGI completion. The bad snippet was poorly trying to identify the FLOGI as OK to do so by realizing the node was not registered with either SCSI or NVMe transport. Link: https://lore.kernel.org/r/20210910233159.115896-3-jsmart2021@gmail.com Fixes: `618e2ee146` ("scsi: lpfc: Fix FLOGI failure due to accessing a freed node") Cc: <stable@vger.kernel.org> # v5.13+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:15:51 +01:00
James Smart	69a3a7bc72	scsi: lpfc: Fix memory overwrite during FC-GS I/O abort handling When an FC-GS I/O is aborted by lpfc, the driver requires a node pointer for a dereference operation. In the abort I/O routine, the driver miscasts a context pointer to the wrong data type and overwrites a single byte outside of the allocated space. This miscast is done in the abort I/O function handler because the handler works on both FC-GS and FC-LS commands. However, the code neglected to get the correct job location for the node. Fix this by acquiring the necessary node pointer from the correct job structure depending on the I/O type. Link: https://lore.kernel.org/r/20211004231210.35524-1-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-10-04 23:37:08 -04:00

1 2 3 4 5 ...

2068 Commits