commit 4bd1d80efb5af640f99157f39b50fb11326ce641 upstream.
Current implementation of update_mmu_cache function performs local TLB
flush. It does not take into account ASID information. Besides, it does
not take into account other harts currently running the same mm context
or possible migration of the running context to other harts. Meanwhile
TLB flush is not performed for every context switch if ASID support
is enabled.
Patch [1] proposed to add ASID support to update_mmu_cache to avoid
flushing local TLB entirely. This patch takes into account other
harts currently running the same mm context as well as possible
migration of this context to other harts.
For this purpose the approach from flush_icache_mm is reused. Remote
harts currently running the same mm context are informed via SBI calls
that they need to flush their local TLBs. All the other harts are marked
as needing a deferred TLB flush when this mm context runs on them.
[1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Fixes: 65d4b9c530 ("RISC-V: Implement ASID allocator")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@gmail.com/#t
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d8bf77a1dc3079692f54be3087a5fd16d90027b0 ]
With the PG_arch_1 we keep track if the page's data cache is clean,
architecture rely on this property to treat new pages as dirty with
respect to the data cache and perform the flushing before mapping the pages
into userspace.
This patch adds a new architecture hook, arch_clear_hugepage_flags,so that
architectures which rely on the page flags being in a particular state for
fresh allocations can adjust the flags accordingly when a page is freed
into the pool.
Fixes: 9e953cda5c ("riscv: Introduce huge page support for 32/64bit kernel")
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Link: https://lore.kernel.org/r/20221024094725.3054311-3-tongtiangen@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7e1864332fbc1b993659eab7974da9fe8bf8c128 ]
Currently, when detecting vmap stack overflow, riscv firstly switches
to the so called shadow stack, then use this shadow stack to call the
get_overflow_stack() to get the overflow stack. However, there's
a race here if two or more harts use the same shadow stack at the same
time.
To solve this race, we introduce spin_shadow_stack atomic var, which
will be swap between its own address and 0 in atomic way, when the
var is set, it means the shadow_stack is being used; when the var
is cleared, it means the shadow_stack isn't being used.
Fixes: 31da94c25a ("riscv: add VMAP_STACK overflow detection")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Suggested-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20221030124517.2370-1-jszhang@kernel.org
[Palmer: Add AQ to the swap, and also some comments.]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 3f105a742725a1b78766a55169f1d827732e62b8 ]
The EFI page table is initially created as a copy of the kernel page table.
With VMAP_STACK enabled, kernel stacks are allocated in the vmalloc area:
if the stack is allocated in a new PGD (one that was not present at the
moment of the efi page table creation or not synced in a previous vmalloc
fault), the kernel will take a trap when switching to the efi page table
when the vmalloc kernel stack is accessed, resulting in a kernel panic.
Fix that by updating the efi kernel mappings before switching to the efi
page table.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Fixes: b91540d52a ("RISC-V: Add EFI runtime services")
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20221121133303.1782246-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9cc205e3c17d5716da7ebb7fa0c985555e95d009 upstream.
Fix port I/O string accessors such as `insb', `outsb', etc. which use
the physical PCI port I/O address rather than the corresponding memory
mapping to get at the requested location, which in turn breaks at least
accesses made by our parport driver to a PCIe parallel port such as:
PCI parallel port detected: 1415:c118, I/O at 0x1000(0x1008), IRQ 20
parport0: PC-style at 0x1000 (0x1008), irq 20, using FIFO [PCSPP,TRISTATE,COMPAT,EPP,ECP]
causing a memory access fault:
Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000001008
Oops [#1]
Modules linked in:
CPU: 1 PID: 350 Comm: cat Not tainted 6.0.0-rc2-00283-g10d4879f9ef0-dirty #23
Hardware name: SiFive HiFive Unmatched A00 (DT)
epc : parport_pc_fifo_write_block_pio+0x266/0x416
ra : parport_pc_fifo_write_block_pio+0xb4/0x416
epc : ffffffff80542c3e ra : ffffffff80542a8c sp : ffffffd88899fc60
gp : ffffffff80fa2700 tp : ffffffd882b1e900 t0 : ffffffd883d0b000
t1 : ffffffffff000002 t2 : 4646393043330a38 s0 : ffffffd88899fcf0
s1 : 0000000000001000 a0 : 0000000000000010 a1 : 0000000000000000
a2 : ffffffd883d0a010 a3 : 0000000000000023 a4 : 00000000ffff8fbb
a5 : ffffffd883d0a001 a6 : 0000000100000000 a7 : ffffffc800000000
s2 : ffffffffff000002 s3 : ffffffff80d28880 s4 : ffffffff80fa1f50
s5 : 0000000000001008 s6 : 0000000000000008 s7 : ffffffd883d0a000
s8 : 0004000000000000 s9 : ffffffff80dc1d80 s10: ffffffd8807e4000
s11: 0000000000000000 t3 : 00000000000000ff t4 : 393044410a303930
t5 : 0000000000001000 t6 : 0000000000040000
status: 0000000200000120 badaddr: 0000000000001008 cause: 000000000000000f
[<ffffffff80543212>] parport_pc_compat_write_block_pio+0xfe/0x200
[<ffffffff8053bbc0>] parport_write+0x46/0xf8
[<ffffffff8050530e>] lp_write+0x158/0x2d2
[<ffffffff80185716>] vfs_write+0x8e/0x2c2
[<ffffffff80185a74>] ksys_write+0x52/0xc2
[<ffffffff80185af2>] sys_write+0xe/0x16
[<ffffffff80003770>] ret_from_syscall+0x0/0x2
---[ end trace 0000000000000000 ]---
For simplicity address the problem by adding PCI_IOBASE to the physical
address requested in the respective wrapper macros only, observing that
the raw accessors such as `__insb', `__outsb', etc. are not supposed to
be used other than by said macros. Remove the cast to `long' that is no
longer needed on `addr' now that it is used as an offset from PCI_IOBASE
and add parentheses around `addr' needed for predictable evaluation in
macro expansion. No need to make said adjustments in separate changes
given that current code is gravely broken and does not ever work.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: fab957c11e ("RISC-V: Atomic and Locking Code")
Cc: stable@vger.kernel.org # v4.15+
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2209220223080.29493@angie.orcam.me.uk
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61114e734ccb804bc12561ab4020745e02c468c2 upstream.
After commit 49b290e430d3 ("riscv: prevent compressed instructions in
alternatives"), builds with LLVM's integrated assembler fail:
In file included from arch/riscv/mm/init.c:10:
In file included from ./include/linux/mm.h:29:
In file included from ./include/linux/pgtable.h:6:
In file included from ./arch/riscv/include/asm/pgtable.h:108:
./arch/riscv/include/asm/tlbflush.h:23:2: error: expected assembly-time absolute expression
ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory"));
^
./arch/riscv/include/asm/errata_list.h:33:5: note: expanded from macro 'ALT_FLUSH_TLB_PAGE'
asm(ALTERNATIVE("sfence.vma %0", "sfence.vma", SIFIVE_VENDOR_ID, \
^
./arch/riscv/include/asm/alternative-macros.h:187:2: note: expanded from macro 'ALTERNATIVE'
_ALTERNATIVE_CFG(old_content, new_content, vendor_id, errata_id, CONFIG_k)
^
./arch/riscv/include/asm/alternative-macros.h:113:2: note: expanded from macro '_ALTERNATIVE_CFG'
__ALTERNATIVE_CFG(old_c, new_c, vendor_id, errata_id, IS_ENABLED(CONFIG_k))
^
./arch/riscv/include/asm/alternative-macros.h:110:2: note: expanded from macro '__ALTERNATIVE_CFG'
ALT_NEW_CONTENT(vendor_id, errata_id, enable, new_c)
^
./arch/riscv/include/asm/alternative-macros.h:99:3: note: expanded from macro 'ALT_NEW_CONTENT'
".org . - (889b - 888b) + (887b - 886b)\n" \
^
<inline asm>:26:6: note: instantiated into assembly here
.org . - (889b - 888b) + (887b - 886b)
^
This error happens because LLVM's integrated assembler has a one-pass
design, which means it cannot figure out the instruction lengths when
the .org directive is outside of the subsection that contains the
instructions, which was changed by the .option directives added by the
above change.
Move the .org directives before the .previous directive so that these
directives are always within the same subsection, which resolves the
failures and does not introduce any new issues with GNU as. This was
done for arm64 in commit 966a0acce2 ("arm64/alternatives: move length
validation inside the subsection") and commit 22315a2296 ("arm64:
alternatives: Move length validation in alternative_{insn, endif}").
While there is no error from the assembly versions of the macro, they
appear to have the same problem so just make the same change there as
well so that there are no problems in the future.
Link: https://github.com/ClangBuiltLinux/linux/issues/1640
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20220516214520.3252074-1-nathan@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6d01238623faa9425f820353d2066baf6c9dc872 upstream.
In the event that random_get_entropy() can't access a cycle counter or
similar, falling back to returning 0 is really not the best we can do.
Instead, at least calling random_get_entropy_fallback() would be
preferable, because that always needs to return _something_, even
falling back to jiffies eventually. It's not as though
random_get_entropy_fallback() is super high precision or guaranteed to
be entropic, but basically anything that's not zero all the time is
better than returning zero all the time.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 869c70609248102f3a2e95a39b6233ff6ea2c932 upstream.
Use what is currently the SMP=y version of riscv_cpuid_to_hartid_mask()
for both SMP=y and SMP=n to fix a build failure with KVM=m and SMP=n due
to boot_cpu_hartid not being exported. This also fixes a second bug
where the SMP=n version assumes the sole CPU in the system is in the
incoming mask, which may not hold true in kvm_riscv_vcpu_sbi_ecall() if
the KVM guest VM has multiple vCPUs (on a SMP=n system).
Fixes: 1ef46c231d ("RISC-V: Implement new SBI v0.2 extensions")
Reported-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4bc5e64e6cf37007e436970024e5998ee0935651 upstream.
Commit 8633ef82f1 ("drivers/firmware: consolidate EFI framebuffer setup
for all arches") made the Generic System Framebuffers (sysfb) driver able
to be built on non-x86 architectures.
But it left the efifb_setup_from_dmi() function prototype declaration in
the architecture specific headers. This could lead to the following
compiler warning as reported by the kernel test robot:
drivers/firmware/efi/sysfb_efi.c:70:6: warning: no previous prototype for function 'efifb_setup_from_dmi' [-Wmissing-prototypes]
void efifb_setup_from_dmi(struct screen_info *si, const char *opt)
^
drivers/firmware/efi/sysfb_efi.c:70:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void efifb_setup_from_dmi(struct screen_info *si, const char *opt)
Fixes: 8633ef82f1 ("drivers/firmware: consolidate EFI framebuffer setup for all arches")
Reported-by: kernel test robot <lkp@intel.com>
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20211126001333.555514-1-javierm@redhat.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nathan reported that because KASAN_SHADOW_OFFSET was not defined in
Kconfig, it prevents asan-stack from getting disabled with clang even
when CONFIG_KASAN_STACK is disabled: fix this by defining the
corresponding config.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: 8ad8b72721 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
As commit 601255ae3c ("arm64: vdso: move data page before code pages"), the
same issue exists on riscv, testcase is shown below, make sure that vdso.so is
bigger than page size,
struct timespec tp;
clock_gettime(5, &tp);
printf("tv_sec: %ld, tv_nsec: %ld\n", tp.tv_sec, tp.tv_nsec);
without this patch, test result : tv_sec: 0, tv_nsec: 0
with this patch, test result : tv_sec: 1629271537, tv_nsec: 748000000
Move the vdso data page in front of the VDSO area to fix the issue.
Fixes: ad5d1122b8 ("riscv: use vDSO common flow to reduce the latency of the time-related functions")
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The asm/vdso.h will be included in vdso.lds.S in the next patch, the
following cleanup is needed to avoid syntax error:
1.the declaration of sys_riscv_flush_icache() is moved into asm/syscall.h.
2.the definition of struct vdso_data is moved into kernel/vdso.c.
2.the definition of VDSO_SYMBOL is placed under "#ifndef __ASSEMBLY__".
Also remove the redundant linux/types.h include.
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
This enlarges the bits availiable for stack randomisation on RV64 from
the default of 8MiB to 1GiB, to match arm64 and x86.
Also, update the documentation to reflect our support for stack
randomisation.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
[Palmer: commit text]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Pull RISC-V updates from Palmer Dabbelt:
- support PC-relative instructions (auipc and branches) in kprobes
- support for forced IRQ threading
- support for the hlt/nohlt kernel command line options, via the
generic idle loop
- show the edge/level triggered behavior of interrupts
in /proc/interrupts
- a handful of cleanups to our address mapping mechanisms
- support for allocating gigantic hugepages via CMA
- support for the undefined behavior sanitizer (UBSAN)
- a handful of cleanups to the VDSO that allow the kernel to build with
LLD.
- support for hugepage migration
* tag 'riscv-for-linus-5.15-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits)
riscv: add support for hugepage migration
RISC-V: Fix VDSO build for !MMU
riscv: use strscpy to replace strlcpy
riscv: explicitly use symbol offsets for VDSO
riscv: Enable Undefined Behavior Sanitizer UBSAN
riscv: Keep the riscv Kconfig selects sorted
riscv: Support allocating gigantic hugepages using CMA
riscv: fix the global name pfn_base confliction error
riscv: Move early fdt mapping creation in its own function
riscv: Simplify BUILTIN_DTB device tree mapping handling
riscv: Use __maybe_unused instead of #ifdefs around variable declarations
riscv: Get rid of map_size parameter to create_kernel_page_table
riscv: Introduce va_kernel_pa_offset for 32-bit kernel
riscv: Optimize kernel virtual address conversion macro
dt-bindings: riscv: add starfive jh7100 bindings
riscv: Enable GENERIC_IRQ_SHOW_LEVEL
riscv: Enable idle generic idle loop
riscv: Allow forced irq threading
riscv: Implement thread_struct whitelist for hardened usercopy
riscv: kprobes: implement the branch instructions
...
Pull drm updates from Dave Airlie:
"Highlights:
- i915 has seen a lot of refactoring and uAPI cleanups due to a
change in the upstream direction going forward
This has all been audited with known userspace, but there may be
some pitfalls that were missed.
- i915 now uses common TTM to enable discrete memory on DG1/2 GPUs
- i915 enables Jasper and Elkhart Lake by default and has preliminary
XeHP/DG2 support
- amdgpu adds support for Cyan Skillfish
- lots of implicit fencing rules documented and fixed up in drivers
- msm now uses the core scheduler
- the irq midlayer has been removed for non-legacy drivers
- the sysfb code now works on more than x86.
Otherwise the usual smattering of stuff everywhere, panels, bridges,
refactorings.
Detailed summary:
core:
- extract i915 eDP backlight into core
- DP aux bus support
- drm_device.irq_enabled removed
- port drivers to native irq interfaces
- export gem shadow plane handling for vgem
- print proper driver name in framebuffer registration
- driver fixes for implicit fencing rules
- ARM fixed rate compression modifier added
- updated fb damage handling
- rmfb ioctl logging/docs
- drop drm_gem_object_put_locked
- define DRM_FORMAT_MAX_PLANES
- add gem fb vmap/vunmap helpers
- add lockdep_assert(once) helpers
- mark drm irq midlayer as legacy
- use offset adjusted bo mapping conversion
vgaarb:
- cleanups
fbdev:
- extend efifb handling to all arches
- div by 0 fixes for multiple drivers
udmabuf:
- add hugepage mapping support
dma-buf:
- non-dynamic exporter fixups
- document implicit fencing rules
amdgpu:
- Initial Cyan Skillfish support
- switch virtual DCE over to vkms based atomic
- VCN/JPEG power down fixes
- NAVI PCIE link handling fixes
- AMD HDMI freesync fixes
- Yellow Carp + Beige Goby fixes
- Clockgating/S0ix/SMU/EEPROM fixes
- embed hw fence in job
- rework dma-resv handling
- ensure eviction to system ram
amdkfd:
- uapi: SVM address range query added
- sysfs leak fix
- GPUVM TLB optimizations
- vmfault/migration counters
i915:
- Enable JSL and EHL by default
- preliminary XeHP/DG2 support
- remove all CNL support (never shipped)
- move to TTM for discrete memory support
- allow mixed object mmap handling
- GEM uAPI spring cleaning
- add I915_MMAP_OBJECT_FIXED
- reinstate ADL-P mmap ioctls
- drop a bunch of unused by userspace features
- disable and remove GPU relocations
- revert some i915 misfeatures
- major refactoring of GuC for Gen11+
- execbuffer object locking separate step
- reject caching/set-domain on discrete
- Enable pipe DMC loading on XE-LPD and ADL-P
- add PSF GV point support
- Refactor and fix DDI buffer translations
- Clean up FBC CFB allocation code
- Finish INTEL_GEN() and friends macro conversions
nouveau:
- add eDP backlight support
- implicit fence fix
msm:
- a680/7c3 support
- drm/scheduler conversion
panfrost:
- rework GPU reset
virtio:
- fix fencing for planes
ast:
- add detect support
bochs:
- move to tiny GPU driver
vc4:
- use hotplug irqs
- HDMI codec support
vmwgfx:
- use internal vmware device headers
ingenic:
- demidlayering irq
rcar-du:
- shutdown fixes
- convert to bridge connector helpers
zynqmp-dsub:
- misc fixes
mgag200:
- convert PLL handling to atomic
mediatek:
- MT8133 AAL support
- gem mmap object support
- MT8167 support
etnaviv:
- NXP Layerscape LS1028A SoC support
- GEM mmap cleanups
tegra:
- new user API
exynos:
- missing unlock fix
- build warning fix
- use refcount_t"
* tag 'drm-next-2021-08-31-1' of git://anongit.freedesktop.org/drm/drm: (1318 commits)
drm/amd/display: Move AllowDRAMSelfRefreshOrDRAMClockChangeInVblank to bounding box
drm/amd/display: Remove duplicate dml init
drm/amd/display: Update bounding box states (v2)
drm/amd/display: Update number of DCN3 clock states
drm/amdgpu: disable GFX CGCG in aldebaran
drm/amdgpu: Clear RAS interrupt status on aldebaran
drm/amdgpu: Add support for RAS XGMI err query
drm/amdkfd: Account for SH/SE count when setting up cu masks.
drm/amdgpu: rename amdgpu_bo_get_preferred_pin_domain
drm/amdgpu: drop redundant cancel_delayed_work_sync call
drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend
drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend
drm/amdkfd: map SVM range with correct access permission
drm/amdkfd: check access permisson to restore retry fault
drm/amdgpu: Update RAS XGMI Error Query
drm/amdgpu: Add driver infrastructure for MCA RAS
drm/amd/display: Add Logging for HDMI color depth information
drm/amd/amdgpu: consolidate PSP TA init shared buf functions
drm/amd/amdgpu: add name field back to ras_common_if
drm/amdgpu: Fix build with missing pm_suspend_target_state module export
...
We don't have a VDSO for the !MMU configurations, so don't try to build
one.
Fixes: fde9c59aeb ("riscv: explicitly use symbol offsets for VDSO")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The current implementation of the `__rt_sigaction` reference computed an
absolute offset relative to the mapped base of the VDSO. While this can
be handled in the medlow model, the medany model cannot handle this as
it is meant to be position independent. The current implementation
relied on the BFD linker relaxing the PC-relative relocation into an
absolute relocation as it was a near-zero address allowing it to be
referenced relative to `zero`.
We now extract the offsets and create a generated header allowing the
build with LLVM and lld to succeed as we no longer depend on the linker
rewriting address references near zero. This change was largely
modelled after the ARM64 target which does something similar.
Signed-off-by: Saleem Abdulrasool <abdulras@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
RISCV uses a global variable pfn_base for page/pfn translation. But this
is a common name and will be used elsewhere. In those cases, the
page-pfn macros which refer to this name will be referred to the
local/input variable instead. (such as in vfio_pin_pages_remote). This
make everything wrong.
This patch changes the name from pfn_base to riscv_pfn_base to fix
this problem.
Signed-off-by: Kenneth Lee <liguozhu@hisilicon.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
va_kernel_pa_offset was only used for 64-bit as the kernel mapping lies
in the linear mapping for 32-bit kernel and then only the offset between
the PAGE_OFFSET and the kernel load address is needed.
But this distinction complexifies the code with #ifdefs and especially
with a separate definition of the address conversions macros.
Simplify the code by defining this variable for both 32-bit and 64-bit.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The current test in kernel_mapping_va_to_pa only applies when
CONFIG_XIP_KERNEL is set, so use IS_ENABLED to optimize this macro at
compile-time in standard kernels that do not require this test.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Tested-by: Emil Renner Berthing <kernel@esmil.dk>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-By: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The usage of CONFIG_PHYS_RAM_BASE for all kernel types was a mistake:
this value is implementation-specific and this breaks the genericity of
the RISC-V kernel.
Fix this by introducing a new variable phys_ram_base that holds this
value at runtime and use it in the kernel physical address conversion
macro. Since this value is used only for XIP kernels, evaluate it only if
CONFIG_XIP_KERNEL is set which in addition optimizes this macro for
standard kernels at compile-time.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Tested-by: Emil Renner Berthing <kernel@esmil.dk>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: 44c9225729 ("RISC-V: enable XIP")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
This whitelists the FPU register state portion of the thread_struct for
copying to userspace, instead of the default entire struct.
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
drm-misc-next for v5.15-rc1:
UAPI Changes:
- Remove sysfs stats for dma-buf attachments, as it causes a performance regression.
Previous merge is not in a rc kernel yet, so no userspace regression possible.
Cross-subsystem Changes:
- Sanitize user input in kyro's viewport ioctl.
- Use refcount_t in fb_info->count
- Assorted fixes to dma-buf.
- Extend x86 efifb handling to all archs.
- Fix neofb divide by 0.
- Document corpro,gm7123 bridge dt bindings.
Core Changes:
- Slightly rework drm master handling.
- Cleanup vgaarb handling.
- Assorted fixes.
Driver Changes:
- Add support for ws2401 panel.
- Assorted fixes to stm, ast, bochs.
- Demidlayer ingenic irq.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2d0d2fe8-01fc-e216-c3fd-38db9e69944e@linux.intel.com
The register_gop_device() function registers an "efi-framebuffer" platform
device to match against the efifb driver, to have an early framebuffer for
EFI platforms.
But there is already support to do exactly the same by the Generic System
Framebuffers (sysfb) driver. This used to be only for X86 but it has been
moved to drivers/firmware and could be reused by other architectures.
Also, besides supporting registering an "efi-framebuffer", this driver can
register a "simple-framebuffer" allowing to use the siple{fb,drm} drivers
on non-X86 EFI platforms. For example, on aarch64 these drivers can only
be used with DT and doesn't have code to register a "simple-frambuffer"
platform device when booting with EFI.
For these reasons, let's remove the register_gop_device() duplicated code
and instead move the platform specific logic that's there to sysfb driver.
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210625131359.1804394-1-javierm@redhat.com
Pull RISC-V updates from Palmer Dabbelt:
"We have a handful of new features for 5.14:
- Support for transparent huge pages.
- Support for generic PCI resources mapping.
- Support for the mem= kernel parameter.
- Support for KFENCE.
- A handful of fixes to avoid W+X mappings in the kernel.
- Support for VMAP_STACK based overflow detection.
- An optimized copy_{to,from}_user"
* tag 'riscv-for-linus-5.14-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (37 commits)
riscv: xip: Fix duplicate included asm/pgtable.h
riscv: Fix PTDUMP output now BPF region moved back to module region
riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall
riscv: add VMAP_STACK overflow detection
riscv: ptrace: add argn syntax
riscv: mm: fix build errors caused by mk_pmd()
riscv: Introduce structure that group all variables regarding kernel mapping
riscv: Map the kernel with correct permissions the first time
riscv: Introduce set_kernel_memory helper
riscv: Enable KFENCE for riscv64
RISC-V: Use asm-generic for {in,out}{bwlq}
riscv: add ASID-based tlbflushing methods
riscv: pass the mm_struct to __sbi_tlb_flush_range
riscv: Add mem kernel parameter support
riscv: Simplify xip and !xip kernel address conversion macros
riscv: Remove CONFIG_PHYS_RAM_BASE_FIXED
riscv: Only initialize swiotlb when necessary
riscv: fix typo in init.c
riscv: Cleanup unused functions
riscv: mm: Use better bitmap_zalloc()
...
This enables ftrace kprobe events to access kernel function
arguments via $argN syntax.
Signed-off-by: Jeff Xie <huan.xie@suse.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
With "riscv: mm: add THP support on 64-bit", mk_pmd() function
introduce build errors,
1.build with CONFIG_ARCH_RV32I=y:
arch/riscv/include/asm/pgtable.h: In function 'mk_pmd':
arch/riscv/include/asm/pgtable.h:513:9: error: implicit declaration of function 'pfn_pmd';
did you mean 'pfn_pgd'? [-Werror=implicit-function-declaration]
2.build with CONFIG_SPARSEMEM=y && CONFIG_SPARSEMEM_VMEMMAP=n
arch/riscv/include/asm/pgtable.h: In function 'mk_pmd':
include/asm-generic/memory_model.h:64:14: error: implicit declaration of function 'page_to_section';
did you mean 'present_section'? [-Werror=implicit-function-declaration]
Move the definition of mk_pmd to pgtable-64.h to fix the first error.
Use macro definition instead of inline function for mk_pmd
to fix the second problem. It is similar to the mk_pte macro.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
We have a lot of variables that are used to hold kernel mapping addresses,
offsets between physical and virtual mappings and some others used for XIP
kernels: they are all defined at different places in mm/init.c, so group
them into a single structure with, for some of them, more explicit and concise
names.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Merge more updates from Andrew Morton:
"190 patches.
Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
signals, exec, kcov, selftests, compress/decompress, and ipc"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
ipc/util.c: use binary search for max_idx
ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
ipc: use kmalloc for msg_queue and shmid_kernel
ipc sem: use kvmalloc for sem_undo allocation
lib/decompressors: remove set but not used variabled 'level'
selftests/vm/pkeys: exercise x86 XSAVE init state
selftests/vm/pkeys: refill shadow register after implicit kernel write
selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
kcov: add __no_sanitize_coverage to fix noinstr for all architectures
exec: remove checks in __register_bimfmt()
x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
hfsplus: report create_date to kstat.btime
hfsplus: remove unnecessary oom message
nilfs2: remove redundant continue statement in a while-loop
kprobes: remove duplicated strong free_insn_page in x86 and s390
init: print out unknown kernel parameters
checkpatch: do not complain about positive return values starting with EPOLL
checkpatch: improve the indented label test
checkpatch: scripts/spdxcheck.py now requires python3
...
This contains both the short-term fix for the W+X boot mappings and the
larger cleanup.
* riscv-wx-mappings:
riscv: Map the kernel with correct permissions the first time
riscv: Introduce set_kernel_memory helper
riscv: Simplify xip and !xip kernel address conversion macros
riscv: Remove CONFIG_PHYS_RAM_BASE_FIXED
riscv: mm: Fix W+X mappings at boot
For 64-bit kernels, we map all the kernel with write and execute
permissions and afterwards remove writability from text and executability
from data.
For 32-bit kernels, the kernel mapping resides in the linear mapping, so we
map all the linear mapping as writable and executable and afterwards we
remove those properties for unused memory and kernel mapping as
described above.
Change this behavior to directly map the kernel with correct permissions
and avoid going through the whole mapping to fix the permissions.
At the same time, this fixes an issue introduced by commit 2bfc6cd81b
("riscv: Move kernel mapping outside of linear mapping") as reported
here https://github.com/starfive-tech/linux/issues/17.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
This helper should be used for setting permissions to the kernel
mapping as it takes pointers as arguments and then avoids explicit cast
to unsigned long needed for the set_memory_* API.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Add architecture specific implementation details for KFENCE and enable
KFENCE for the riscv64 architecture. In particular, this implements the
required interface in <asm/kfence.h>.
KFENCE requires that attributes for pages from its memory pool can
individually be set. Therefore, force the kfence pool to be mapped at
page granularity.
Testing this patch using the testcases in kfence_test.c and all passed.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Implement optimized version of the tlb flushing routines for systems
using ASIDs. These are behind the use_asid_allocator static branch to
not affect existing systems not using ASIDs.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
[hch: rebased on top of previous cleanups, use the same algorithm as
the non-ASID based code for local vs global flushes, keep functions
as local as possible]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Pull locking updates from Ingo Molnar:
- Core locking & atomics:
- Convert all architectures to ARCH_ATOMIC: move every architecture
to ARCH_ATOMIC, then get rid of ARCH_ATOMIC and all the
transitory facilities and #ifdefs.
Much reduction in complexity from that series:
63 files changed, 756 insertions(+), 4094 deletions(-)
- Self-test enhancements
- Futexes:
- Add the new FUTEX_LOCK_PI2 ABI, which is a variant that doesn't
set FLAGS_CLOCKRT (.e. uses CLOCK_MONOTONIC).
[ The temptation to repurpose FUTEX_LOCK_PI's implicit setting of
FLAGS_CLOCKRT & invert the flag's meaning to avoid having to
introduce a new variant was resisted successfully. ]
- Enhance futex self-tests
- Lockdep:
- Fix dependency path printouts
- Optimize trace saving
- Broaden & fix wait-context checks
- Misc cleanups and fixes.
* tag 'locking-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
locking/lockdep: Correct the description error for check_redundant()
futex: Provide FUTEX_LOCK_PI2 to support clock selection
futex: Prepare futex_lock_pi() for runtime clock selection
lockdep/selftest: Remove wait-type RCU_CALLBACK tests
lockdep/selftests: Fix selftests vs PROVE_RAW_LOCK_NESTING
lockdep: Fix wait-type for empty stack
locking/selftests: Add a selftest for check_irq_usage()
lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()
locking/lockdep: Remove the unnecessary trace saving
locking/lockdep: Fix the dep path printing for backwards BFS
selftests: futex: Add futex compare requeue test
selftests: futex: Add futex wait test
seqlock: Remove trailing semicolon in macros
locking/lockdep: Reduce LOCKDEP dependency list
locking/lockdep,doc: Improve readability of the block matrix
locking/atomics: atomic-instrumented: simplify ifdeffery
locking/atomic: delete !ARCH_ATOMIC remnants
locking/atomic: xtensa: move to ARCH_ATOMIC
locking/atomic: sparc: move to ARCH_ATOMIC
locking/atomic: sh: move to ARCH_ATOMIC
...
Andreas reported commit fc8504765e ("riscv: bpf: Avoid breaking W^X")
breaks booting with one kind of defconfig, I reproduced a kernel panic
with the defconfig:
[ 0.138553] Unable to handle kernel paging request at virtual address ffffffff81201220
[ 0.139159] Oops [#1]
[ 0.139303] Modules linked in:
[ 0.139601] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #1
[ 0.139934] Hardware name: riscv-virtio,qemu (DT)
[ 0.140193] epc : __memset+0xc4/0xfc
[ 0.140416] ra : skb_flow_dissector_init+0x1e/0x82
[ 0.140609] epc : ffffffff8029806c ra : ffffffff8033be78 sp : ffffffe001647da0
[ 0.140878] gp : ffffffff81134b08 tp : ffffffe001654380 t0 : ffffffff81201158
[ 0.141156] t1 : 0000000000000002 t2 : 0000000000000154 s0 : ffffffe001647dd0
[ 0.141424] s1 : ffffffff80a43250 a0 : ffffffff81201220 a1 : 0000000000000000
[ 0.141654] a2 : 000000000000003c a3 : ffffffff81201258 a4 : 0000000000000064
[ 0.141893] a5 : ffffffff8029806c a6 : 0000000000000040 a7 : ffffffffffffffff
[ 0.142126] s2 : ffffffff81201220 s3 : 0000000000000009 s4 : ffffffff81135088
[ 0.142353] s5 : ffffffff81135038 s6 : ffffffff8080ce80 s7 : ffffffff80800438
[ 0.142584] s8 : ffffffff80bc6578 s9 : 0000000000000008 s10: ffffffff806000ac
[ 0.142810] s11: 0000000000000000 t3 : fffffffffffffffc t4 : 0000000000000000
[ 0.143042] t5 : 0000000000000155 t6 : 00000000000003ff
[ 0.143220] status: 0000000000000120 badaddr: ffffffff81201220 cause: 000000000000000f
[ 0.143560] [<ffffffff8029806c>] __memset+0xc4/0xfc
[ 0.143859] [<ffffffff8061e984>] init_default_flow_dissectors+0x22/0x60
[ 0.144092] [<ffffffff800010fc>] do_one_initcall+0x3e/0x168
[ 0.144278] [<ffffffff80600df0>] kernel_init_freeable+0x1c8/0x224
[ 0.144479] [<ffffffff804868a8>] kernel_init+0x12/0x110
[ 0.144658] [<ffffffff800022de>] ret_from_exception+0x0/0xc
[ 0.145124] ---[ end trace f1e9643daa46d591 ]---
After some investigation, I think I found the root cause: commit
2bfc6cd81b ("move kernel mapping outside of linear mapping") moves
BPF JIT region after the kernel:
| #define BPF_JIT_REGION_START PFN_ALIGN((unsigned long)&_end)
The &_end is unlikely aligned with PMD size, so the front bpf jit
region sits with part of kernel .data section in one PMD size mapping.
But kernel is mapped in PMD SIZE, when bpf_jit_binary_lock_ro() is
called to make the first bpf jit prog ROX, we will make part of kernel
.data section RO too, so when we write to, for example memset the
.data section, MMU will trigger a store page fault.
To fix the issue, we need to ensure the BPF JIT region is PMD size
aligned. This patch acchieve this goal by restoring the BPF JIT region
to original position, I.E the 128MB before kernel .text section. The
modification to kasan_init.c is inspired by Alexandre.
Fixes: fc8504765e ("riscv: bpf: Avoid breaking W^X")
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>