108 Commits

Author SHA1 Message Date
Srinivasarao Pathipati
16e939c6e4 Merge android11-5.4.161+ (b9d179c) into msm-5.4
* refs/heads/tmp-b9d179c:
  UPSTREAM: driver core: Fix possible memory leak in device_link_add()
  UPSTREAM: blk-mq: fix kernel panic during iterating over flush request
  UPSTREAM: net: xfrm: fix memory leak in xfrm_user_rcv_msg
  UPSTREAM: binder: fix the missing BR_FROZEN_REPLY in binder_return_strings
  ANDROID: incremental-fs: fix mount_fs issue
  UPSTREAM: vfs: fs_context: fix up param length parsing in legacy_parse_param
  ANDROID: GKI: disable CONFIG_FORTIFY_SOURCE
  Linux 5.4.161
  erofs: fix unsafe pagevec reuse of hooked pclusters
  erofs: remove the occupied parameter from z_erofs_pagevec_enqueue()
  PCI: Add MSI masking quirk for Nvidia ION AHCI
  PCI/MSI: Deal with devices lying about their MSI mask capability
  PCI/MSI: Destroy sysfs before freeing entries
  parisc/entry: fix trace test in syscall exit path
  fortify: Explicitly disable Clang support
  scsi: ufs: Fix tm request when non-fatal error happens
  ext4: fix lazy initialization next schedule time computation in more granular unit
  MIPS: Fix assembly error from MIPSr2 code used within MIPS_ISA_ARCH_LEVEL
  scsi: ufs: Fix interrupt error message for shared interrupts
  soc/tegra: pmc: Fix imbalanced clock disabling in error code path
  Revert "net: sched: update default qdisc visibility after Tx queue cnt changes"
  Revert "serial: core: Fix initializing and restoring termios speed"
  Linux 5.4.160
  selftests/bpf: Fix also no-alu32 strobemeta selftest
  ath10k: fix invalid dma_addr_t token assignment
  SUNRPC: Partial revert of commit 6f9f17287e78
  PCI: Add PCI_EXP_DEVCTL_PAYLOAD_* macros
  powerpc/powernv/prd: Unregister OPAL_MSG_PRD2 notifier during module unload
  s390/cio: make ccw_device_dma_* more robust
  s390/tape: fix timer initialization in tape_std_assign()
  s390/cio: check the subchannel validity for dev_busid
  video: backlight: Drop maximum brightness override for brightness zero
  mm, oom: do not trigger out_of_memory from the #PF
  mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks
  powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
  powerpc/security: Add a helper to query stf_barrier type
  powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
  powerpc/bpf: Validate branch ranges
  powerpc/lib: Add helper to check if offset is within conditional branch range
  ovl: fix deadlock in splice write
  9p/net: fix missing error check in p9_check_errors
  net, neigh: Enable state migration between NUD_PERMANENT and NTF_USE
  f2fs: should use GFP_NOFS for directory inodes
  irqchip/sifive-plic: Fixup EOI failed when masked
  parisc: Fix set_fixmap() on PA1.x CPUs
  parisc: Fix backtrace to always include init funtion names
  ARM: 9156/1: drop cc-option fallbacks for architecture selection
  ARM: 9155/1: fix early early_iounmap()
  selftests/net: udpgso_bench_rx: fix port argument
  cxgb4: fix eeprom len when diagnostics not implemented
  net/smc: fix sk_refcnt underflow on linkdown and fallback
  vsock: prevent unnecessary refcnt inc for nonblocking connect
  net: hns3: allow configure ETS bandwidth of all TCs
  net/sched: sch_taprio: fix undefined behavior in ktime_mono_to_any
  bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding
  arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions
  nfc: pn533: Fix double free when pn533_fill_fragment_skbs() fails
  llc: fix out-of-bound array index in llc_sk_dev_hash()
  perf bpf: Add missing free to bpf_event__print_bpf_prog_info()
  zram: off by one in read_block_state()
  mm/zsmalloc.c: close race window between zs_pool_dec_isolated() and zs_unregister_migration()
  bonding: Fix a use-after-free problem when bond_sysfs_slave_add() failed
  ACPI: PMIC: Fix intel_pmic_regs_handler() read accesses
  net: vlan: fix a UAF in vlan_dev_real_dev()
  net: davinci_emac: Fix interrupt pacing disable
  xen-pciback: Fix return in pm_ctrl_init()
  i2c: xlr: Fix a resource leak in the error handling path of 'xlr_i2c_probe()'
  NFSv4: Fix a regression in nfs_set_open_stateid_locked()
  scsi: qla2xxx: Turn off target reset during issue_lip
  scsi: qla2xxx: Fix gnl list corruption
  ar7: fix kernel builds for compiler test
  watchdog: f71808e_wdt: fix inaccurate report in WDIOC_GETTIMEOUT
  m68k: set a default value for MEMORY_RESERVE
  signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
  dmaengine: dmaengine_desc_callback_valid(): Check for `callback_result`
  netfilter: nfnetlink_queue: fix OOB when mac header was cleared
  soc: fsl: dpaa2-console: free buffer before returning from dpaa2_console_read
  auxdisplay: ht16k33: Fix frame buffer device blanking
  auxdisplay: ht16k33: Connect backlight to fbdev
  auxdisplay: img-ascii-lcd: Fix lock-up when displaying empty string
  dmaengine: at_xdmac: fix AT_XDMAC_CC_PERID() macro
  mtd: core: don't remove debugfs directory if device is in use
  mtd: spi-nor: hisi-sfc: Remove excessive clk_disable_unprepare()
  fs: orangefs: fix error return code of orangefs_revalidate_lookup()
  NFS: Fix deadlocks in nfs_scan_commit_list()
  opp: Fix return in _opp_add_static_v2()
  PCI: aardvark: Fix preserving PCI_EXP_RTCTL_CRSSVE flag on emulated bridge
  PCI: aardvark: Don't spam about PIO Response Status
  drm/plane-helper: fix uninitialized variable reference
  pnfs/flexfiles: Fix misplaced barrier in nfs4_ff_layout_prepare_ds
  rpmsg: Fix rpmsg_create_ept return when RPMSG config is not defined
  apparmor: fix error check
  power: supply: bq27xxx: Fix kernel crash on IRQ handler register error
  mips: cm: Convert to bitfield API to fix out-of-bounds access
  powerpc/44x/fsp2: add missing of_node_put
  HID: u2fzero: properly handle timeouts in usb_submit_urb
  HID: u2fzero: clarify error check and length calculations
  serial: xilinx_uartps: Fix race condition causing stuck TX
  phy: qcom-qusb2: Fix a memory leak on probe
  ASoC: cs42l42: Defer probe if request_threaded_irq() returns EPROBE_DEFER
  ASoC: cs42l42: Correct some register default values
  ARM: dts: stm32: fix SAI sub nodes register range
  staging: ks7010: select CRYPTO_HASH/CRYPTO_MICHAEL_MIC
  RDMA/mlx4: Return missed an error if device doesn't support steering
  scsi: csiostor: Uninitialized data in csio_ln_vnp_read_cbfn()
  power: supply: rt5033_battery: Change voltage values to µV
  usb: gadget: hid: fix error code in do_config()
  serial: 8250_dw: Drop wrong use of ACPI_PTR()
  video: fbdev: chipsfb: use memset_io() instead of memset()
  clk: at91: check pmc node status before registering syscore ops
  memory: fsl_ifc: fix leak of irq and nand_irq in fsl_ifc_ctrl_probe
  soc/tegra: Fix an error handling path in tegra_powergate_power_up()
  arm: dts: omap3-gta04a4: accelerometer irq fix
  ALSA: hda: Reduce udelay() at SKL+ position reporting
  JFS: fix memleak in jfs_mount
  MIPS: loongson64: make CPU_LOONGSON64 depends on MIPS_FP_SUPPORT
  scsi: dc395: Fix error case unwinding
  ARM: dts: at91: tse850: the emac<->phy interface is rmii
  arm64: dts: meson-g12a: Fix the pwm regulator supply properties
  RDMA/bnxt_re: Fix query SRQ failure
  ARM: dts: qcom: msm8974: Add xo_board reference clock to DSI0 PHY
  arm64: dts: rockchip: Fix GPU register width for RK3328
  ARM: s3c: irq-s3c24xx: Fix return value check for s3c24xx_init_intc()
  clk: mvebu: ap-cpu-clk: Fix a memory leak in error handling paths
  RDMA/rxe: Fix wrong port_cap_flags
  ibmvnic: Process crqs after enabling interrupts
  ibmvnic: don't stop queue in xmit
  udp6: allow SO_MARK ctrl msg to affect routing
  selftests/bpf: Fix fclose/pclose mismatch in test_progs
  crypto: pcrypt - Delay write to padata->info
  net: phylink: avoid mvneta warning when setting pause parameters
  net: amd-xgbe: Toggle PLL settings during rate change
  drm/amdgpu/gmc6: fix DMA mask from 44 to 40 bits
  wcn36xx: add proper DMA memory barriers in rx path
  libertas: Fix possible memory leak in probe and disconnect
  libertas_tf: Fix possible memory leak in probe and disconnect
  KVM: s390: Fix handle_sske page fault handling
  samples/kretprobes: Fix return value if register_kretprobe() failed
  tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()
  irq: mips: avoid nested irq_enter()
  s390/gmap: don't unconditionally call pte_unmap_unlock() in __gmap_zap()
  libbpf: Fix BTF data layout checks and allow empty BTF
  smackfs: use netlbl_cfg_cipsov4_del() for deleting cipso_v4_doi
  drm/msm: Fix potential NULL dereference in DPU SSPP
  clocksource/drivers/timer-ti-dm: Select TIMER_OF
  PM: hibernate: fix sparse warnings
  nvme-rdma: fix error code in nvme_rdma_setup_ctrl
  phy: micrel: ksz8041nl: do not use power down mode
  mwifiex: Send DELBA requests according to spec
  rsi: stop thread firstly in rsi_91x_init() error handling
  mt76: mt76x02: fix endianness warnings in mt76x02_mac.c
  platform/x86: thinkpad_acpi: Fix bitwise vs. logical warning
  block: ataflop: fix breakage introduced at blk-mq refactoring
  mmc: mxs-mmc: disable regulator on error and in the remove function
  net: stream: don't purge sk_error_queue in sk_stream_kill_queues()
  drm/msm: uninitialized variable in msm_gem_import()
  ath10k: fix max antenna gain unit
  hwmon: (pmbus/lm25066) Let compiler determine outer dimension of lm25066_coeff
  hwmon: Fix possible memleak in __hwmon_device_register()
  net, neigh: Fix NTF_EXT_LEARNED in combination with NTF_USE
  memstick: jmb38x_ms: use appropriate free function in jmb38x_ms_alloc_host()
  memstick: avoid out-of-range warning
  mmc: sdhci-omap: Fix NULL pointer exception if regulator is not configured
  b43: fix a lower bounds test
  b43legacy: fix a lower bounds test
  hwrng: mtk - Force runtime pm ops for sleep ops
  crypto: qat - disregard spurious PFVF interrupts
  crypto: qat - detect PFVF collision after ACK
  media: dvb-frontends: mn88443x: Handle errors of clk_prepare_enable()
  netfilter: nft_dynset: relax superfluous check on set updates
  EDAC/amd64: Handle three rank interleaving mode
  ath9k: Fix potential interrupt storm on queue reset
  media: em28xx: Don't use ops->suspend if it is NULL
  cpuidle: Fix kobject memory leaks in error paths
  crypto: ecc - fix CRYPTO_DEFAULT_RNG dependency
  kprobes: Do not use local variable when creating debugfs file
  media: cx23885: Fix snd_card_free call on null card pointer
  media: tm6000: Avoid card name truncation
  media: si470x: Avoid card name truncation
  media: radio-wl1273: Avoid card name truncation
  media: mtk-vpu: Fix a resource leak in the error handling path of 'mtk_vpu_probe()'
  media: TDA1997x: handle short reads of hdmi info frame.
  media: dvb-usb: fix ununit-value in az6027_rc_query
  media: cxd2880-spi: Fix a null pointer dereference on error handling path
  media: em28xx: add missing em28xx_close_extension
  drm/amdgpu: fix warning for overflow check
  ath10k: Fix missing frame timestamp for beacon/probe-resp
  net: dsa: rtl8366rb: Fix off-by-one bug
  rxrpc: Fix _usecs_to_jiffies() by using usecs_to_jiffies()
  crypto: caam - disable pkc for non-E SoCs
  Bluetooth: btmtkuart: fix a memleak in mtk_hci_wmt_sync
  wilc1000: fix possible memory leak in cfg_scan_result()
  cgroup: Make rebind_subsystems() disable v2 controllers all at once
  net: net_namespace: Fix undefined member in key_remove_domain()
  virtio-gpu: fix possible memory allocation failure
  drm/v3d: fix wait for TMU write combiner flush
  rcu: Fix existing exp request check in sync_sched_exp_online_cleanup()
  Bluetooth: fix init and cleanup of sco_conn.timeout_work
  selftests/bpf: Fix strobemeta selftest regression
  netfilter: conntrack: set on IPS_ASSURED if flows enters internal stream state
  parisc/kgdb: add kgdb_roundup() to make kgdb work with idle polling
  parisc/unwind: fix unwinder when CONFIG_64BIT is enabled
  task_stack: Fix end_of_stack() for architectures with upwards-growing stack
  parisc: fix warning in flush_tlb_all
  x86/hyperv: Protect set_hv_tscchange_cb() against getting preempted
  spi: bcm-qspi: Fix missing clk_disable_unprepare() on error in bcm_qspi_probe()
  btrfs: do not take the uuid_mutex in btrfs_rm_device
  net: annotate data-race in neigh_output()
  vrf: run conntrack only in context of lower/physdev for locally generated packets
  ARM: 9136/1: ARMv7-M uses BE-8, not BE-32
  gre/sit: Don't generate link-local addr if addr_gen_mode is IN6_ADDR_GEN_MODE_NONE
  ARM: clang: Do not rely on lr register for stacktrace
  smackfs: use __GFP_NOFAIL for smk_cipso_doi()
  iwlwifi: mvm: disable RX-diversity in powersave
  selftests: kvm: fix mismatched fclose() after popen()
  PM: hibernate: Get block device exclusively in swsusp_check()
  nvme: drop scan_lock and always kick requeue list when removing namespaces
  nvmet-tcp: fix use-after-free when a port is removed
  nvmet: fix use-after-free when a port is removed
  block: remove inaccurate requeue check
  mwl8k: Fix use-after-free in mwl8k_fw_state_machine()
  tracing/cfi: Fix cmp_entries_* functions signature mismatch
  workqueue: make sysfs of unbound kworker cpumask more clever
  lib/xz: Validate the value before assigning it to an enum variable
  lib/xz: Avoid overlapping memcpy() with invalid input with in-place decompression
  memstick: r592: Fix a UAF bug when removing the driver
  leaking_addresses: Always print a trailing newline
  ACPI: battery: Accept charges over the design capacity as full
  iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
  ath: dfs_pattern_detector: Fix possible null-pointer dereference in channel_detector_create()
  tracefs: Have tracefs directories not set OTH permission bits by default
  net-sysfs: try not to restart the syscall if it will fail eventually
  media: usb: dvd-usb: fix uninit-value bug in dibusb_read_eeprom_byte()
  media: ipu3-imgu: VIDIOC_QUERYCAP: Fix bus_info
  media: ipu3-imgu: imgu_fmt: Handle properly try
  ACPICA: Avoid evaluating methods too early during system resume
  ipmi: Disable some operations during a panic
  media: rcar-csi2: Add checking to rcsi2_start_receiver()
  brcmfmac: Add DMI nvram filename quirk for Cyberbook T116 tablet
  ia64: don't do IA64_CMPXCHG_DEBUG without CONFIG_PRINTK
  media: mceusb: return without resubmitting URB in case of -EPROTO error.
  media: imx: set a media_device bus_info string
  media: s5p-mfc: Add checking to s5p_mfc_probe().
  media: s5p-mfc: fix possible null-pointer dereference in s5p_mfc_probe()
  media: uvcvideo: Set unique vdev name based in type
  media: uvcvideo: Return -EIO for control errors
  media: uvcvideo: Set capability in s_param
  media: stm32: Potential NULL pointer dereference in dcmi_irq_thread()
  media: netup_unidvb: handle interrupt properly according to the firmware
  media: mt9p031: Fix corrupted frame after restarting stream
  ath10k: high latency fixes for beacon buffer
  mwifiex: Properly initialize private structure on interface type changes
  mwifiex: Run SET_BSS_MODE when changing from P2P to STATION vif-type
  x86: Increase exception stack sizes
  smackfs: Fix use-after-free in netlbl_catmap_walk()
  net: sched: update default qdisc visibility after Tx queue cnt changes
  locking/lockdep: Avoid RCU-induced noinstr fail
  MIPS: lantiq: dma: reset correct number of channel
  MIPS: lantiq: dma: add small delay after reset
  platform/x86: wmi: do not fail if disabling fails
  drm/panel-orientation-quirks: add Valve Steam Deck
  Bluetooth: fix use-after-free error in lock_sock_nested()
  Bluetooth: sco: Fix lock_sock() blockage by memcpy_from_msg()
  drm: panel-orientation-quirks: Add quirk for the Samsung Galaxy Book 10.6
  drm: panel-orientation-quirks: Add quirk for KD Kurio Smart C15200 2-in-1
  drm: panel-orientation-quirks: Update the Lenovo Ideapad D330 quirk (v2)
  dma-buf: WARN on dmabuf release with pending attachments
  USB: chipidea: fix interrupt deadlock
  USB: iowarrior: fix control-message timeouts
  USB: serial: keyspan: fix memleak on probe errors
  iio: dac: ad5446: Fix ad5622_write() return value
  pinctrl: core: fix possible memory leak in pinctrl_enable()
  quota: correct error number in free_dqentry()
  quota: check block number when reading the block in quota file
  PCI: aardvark: Read all 16-bits from PCIE_MSI_PAYLOAD_REG
  PCI: aardvark: Fix return value of MSI domain .alloc() method
  PCI: aardvark: Fix reporting Data Link Layer Link Active
  PCI: aardvark: Do not unmask unused interrupts
  PCI: aardvark: Fix checking for link up via LTSSM state
  PCI: aardvark: Do not clear status bits of masked interrupts
  PCI: pci-bridge-emul: Fix emulation of W1C bits
  xen/balloon: add late_initcall_sync() for initial ballooning done
  ALSA: mixer: fix deadlock in snd_mixer_oss_set_volume
  ALSA: mixer: oss: Fix racy access to slots
  serial: core: Fix initializing and restoring termios speed
  powerpc/85xx: Fix oops when mpc85xx_smp_guts_ids node cannot be found
  can: j1939: j1939_can_recv(): ignore messages with invalid source address
  can: j1939: j1939_tp_cmd_recv(): ignore abort message in the BAM transport
  KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use
  power: supply: max17042_battery: use VFSOC for capacity when no rsns
  power: supply: max17042_battery: Prevent int underflow in set_soc_threshold
  signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
  signal: Remove the bogus sigkill_pending in ptrace_stop
  RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
  rsi: Fix module dev_oper_mode parameter description
  rsi: fix rate mask set leading to P2P failure
  rsi: fix key enabled check causing unwanted encryption for vap_id > 0
  rsi: fix occasional initialisation failure with BT coex
  wcn36xx: handle connection loss indication
  libata: fix checking of DMA state
  mwifiex: Read a PCI register after writing the TX ring write pointer
  wcn36xx: Fix HT40 capability for 2Ghz band
  evm: mark evm_fixmode as __ro_after_init
  rtl8187: fix control-message timeouts
  PCI: Mark Atheros QCA6174 to avoid bus reset
  ath10k: fix division by zero in send path
  ath10k: fix control-message timeout
  ath6kl: fix control-message timeout
  ath6kl: fix division by zero in send path
  mwifiex: fix division by zero in fw download path
  EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/Haswell
  regulator: dt-bindings: samsung,s5m8767: correct s5m8767,pmic-buck-default-dvs-idx property
  regulator: s5m8767: do not use reset value as DVS voltage if GPIO DVS is disabled
  hwmon: (pmbus/lm25066) Add offset coefficients
  ia64: kprobes: Fix to pass correct trampoline address to the handler
  btrfs: call btrfs_check_rw_degradable only if there is a missing device
  btrfs: fix lost error handling when replaying directory deletes
  btrfs: clear MISSING device status bit in btrfs_close_one_device
  net/smc: Correct spelling mistake to TCPF_SYN_RECV
  nfp: bpf: relax prog rejection for mtu check through max_pkt_offset
  vmxnet3: do not stop tx queues after netif_device_detach()
  r8169: Add device 10ec:8162 to driver r8169
  nvmet-tcp: fix header digest verification
  drm: panel-orientation-quirks: Add quirk for GPD Win3
  watchdog: Fix OMAP watchdog early handling
  net: multicast: calculate csum of looped-back and forwarded packets
  spi: spl022: fix Microwire full duplex mode
  nvmet-tcp: fix a memory leak when releasing a queue
  xen/netfront: stop tx queues during live migration
  bpf: Prevent increasing bpf_jit_limit above max
  bpf: Define bpf_jit_alloc_exec_limit for arm64 JIT
  drm: panel-orientation-quirks: Add quirk for Aya Neo 2021
  mmc: winbond: don't build on M68K
  reset: socfpga: add empty driver allowing consumers to probe
  ARM: dts: sun7i: A20-olinuxino-lime2: Fix ethernet phy-mode
  hyperv/vmbus: include linux/bitops.h
  sfc: Don't use netif_info before net_device setup
  cavium: Fix return values of the probe function
  scsi: qla2xxx: Fix unmap of already freed sgl
  scsi: qla2xxx: Return -ENOMEM if kzalloc() fails
  cavium: Return negative value when pci_alloc_irq_vectors() fails
  x86/irq: Ensure PI wakeup handler is unregistered before module unload
  x86/cpu: Fix migration safety with X86_BUG_NULL_SEL
  x86/sme: Use #define USE_EARLY_PGTABLE_L5 in mem_encrypt_identity.c
  fuse: fix page stealing
  ALSA: timer: Unconditionally unlink slave instances, too
  ALSA: timer: Fix use-after-free problem
  ALSA: synth: missing check for possible NULL after the call to kstrdup
  ALSA: usb-audio: Add registration quirk for JBL Quantum 400
  ALSA: line6: fix control and interrupt message timeouts
  ALSA: 6fire: fix control and bulk message timeouts
  ALSA: ua101: fix division by zero at probe
  ALSA: hda/realtek: Add quirk for HP EliteBook 840 G7 mute LED
  ALSA: hda/realtek: Add quirk for ASUS UX550VE
  ALSA: hda/realtek: Add a quirk for Acer Spin SP513-54N
  ALSA: hda/realtek: Add quirk for Clevo PC70HS
  media: v4l2-ioctl: Fix check_ext_ctrls
  media: ir-kbd-i2c: improve responsiveness of hauppauge zilog receivers
  media: ite-cir: IR receiver stop working after receive overflow
  crypto: s5p-sss - Add error handling in s5p_aes_probe()
  firmware/psci: fix application of sizeof to pointer
  tpm: Check for integer overflow in tpm2_map_response_body()
  parisc: Fix ptrace check on syscall return
  mmc: dw_mmc: Dont wait for DRTO on Write RSP error
  scsi: qla2xxx: Fix use after free in eh_abort path
  scsi: qla2xxx: Fix kernel crash when accessing port_speed sysfs file
  ocfs2: fix data corruption on truncate
  libata: fix read log timeout value
  Input: i8042 - Add quirk for Fujitsu Lifebook T725
  Input: elantench - fix misreporting trackpoint coordinates
  Input: iforce - fix control-message timeout
  binder: use cred instead of task for getsecid
  binder: use cred instead of task for selinux checks
  binder: use euid from cred instead of using task
  usb: xhci: Enable runtime-pm by default on AMD Yellow Carp platform
  xhci: Fix USB 3.1 enumeration issues by increasing roothub power-on-good delay
  Linux 5.4.159
  rsi: fix control-message timeout
  media: staging/intel-ipu3: css: Fix wrong size comparison imgu_css_fw_init
  staging: rtl8192u: fix control-message timeouts
  staging: r8712u: fix control-message timeout
  comedi: vmk80xx: fix bulk and interrupt message timeouts
  comedi: vmk80xx: fix bulk-buffer overflow
  comedi: vmk80xx: fix transfer-buffer overflows
  comedi: ni_usb6501: fix NULL-deref in command paths
  comedi: dt9812: fix DMA buffers on stack
  isofs: Fix out of bound access for corrupted isofs image
  printk/console: Allow to disable console output by using console="" or console=null
  binder: don't detect sender/target during buffer cleanup
  usb-storage: Add compatibility quirk flags for iODD 2531/2541
  usb: musb: Balance list entry in musb_gadget_queue
  usb: gadget: Mark USB_FSL_QE broken on 64-bit
  usb: ehci: handshake CMD_RUN instead of STS_HALT
  Revert "x86/kvm: fix vcpu-id indexed array sizes"
  Linux 5.4.158
  ARM: 9120/1: Revert "amba: make use of -1 IRQs warn"
  Revert "drm/ttm: fix memleak in ttm_transfered_destroy"
  sfc: Fix reading non-legacy supported link modes
  Revert "usb: core: hcd: Add support for deferring roothub registration"
  Revert "xhci: Set HCD flag to defer primary roothub registration"
  media: firewire: firedtv-avc: fix a buffer overflow in avc_ca_pmt()
  net: ethernet: microchip: lan743x: Fix skb allocation failure
  vrf: Revert "Reset skb conntrack connection..."
  scsi: core: Put LLD module refcnt after SCSI device is released
  Linux 5.4.157
  perf script: Check session->header.env.arch before using it
  KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpu
  KVM: s390: clear kicked_mask before sleeping again
  cfg80211: correct bridge/4addr mode check
  net: use netif_is_bridge_port() to check for IFF_BRIDGE_PORT
  sctp: add vtag check in sctp_sf_ootb
  sctp: add vtag check in sctp_sf_do_8_5_1_E_sa
  sctp: add vtag check in sctp_sf_violation
  sctp: fix the processing for COOKIE_ECHO chunk
  sctp: fix the processing for INIT_ACK chunk
  sctp: use init_tag from inithdr for ABORT chunk
  phy: phy_start_aneg: Add an unlocked version
  phy: phy_ethtool_ksettings_get: Lock the phy for consistency
  net/tls: Fix flipped sign in async_wait.err assignment
  net: nxp: lpc_eth.c: avoid hang when bringing interface down
  net: ethernet: microchip: lan743x: Fix dma allocation failure by using dma_set_mask_and_coherent
  net: ethernet: microchip: lan743x: Fix driver crash when lan743x_pm_resume fails
  nios2: Make NIOS2_DTB_SOURCE_BOOL depend on !COMPILE_TEST
  RDMA/sa_query: Use strscpy_pad instead of memcpy to copy a string
  net: Prevent infinite while loop in skb_tx_hash()
  net: batman-adv: fix error handling
  regmap: Fix possible double-free in regcache_rbtree_exit()
  arm64: dts: allwinner: h5: NanoPI Neo 2: Fix ethernet node
  RDMA/mlx5: Set user priority for DCT
  nvme-tcp: fix data digest pointer calculation
  nvmet-tcp: fix data digest pointer calculation
  IB/hfi1: Fix abba locking issue with sc_disable()
  IB/qib: Protect from buffer overflow in struct qib_user_sdma_pkt fields
  tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function
  drm/ttm: fix memleak in ttm_transfered_destroy
  net: lan78xx: fix division by zero in send path
  cfg80211: scan: fix RCU in cfg80211_add_nontrans_list()
  mmc: sdhci-esdhc-imx: clear the buffer_read_ready to reset standard tuning circuit
  mmc: sdhci: Map more voltage level to SDHCI_POWER_330
  mmc: dw_mmc: exynos: fix the finding clock sample value
  mmc: cqhci: clear HALT state after CQE enable
  mmc: vub300: fix control-message timeouts
  net/tls: Fix flipped sign in tls_err_abort() calls
  Revert "net: mdiobus: Fix memory leak in __mdiobus_register"
  nfc: port100: fix using -ERRNO as command type mask
  ata: sata_mv: Fix the error handling of mv_chip_id()
  Revert "pinctrl: bcm: ns: support updated DT binding as syscon subnode"
  usbnet: fix error return code in usbnet_probe()
  usbnet: sanity check for maxpacket
  ipv4: use siphash instead of Jenkins in fnhe_hashfun()
  ipv6: use siphash in rt6_exception_hash()
  powerpc/bpf: Fix BPF_MOD when imm == 1
  ARM: 9141/1: only warn about XIP address when not compile testing
  ARM: 9139/1: kprobes: fix arch_init_kprobes() prototype
  ARM: 9134/1: remove duplicate memcpy() definition
  ARM: 9133/1: mm: proc-macros: ensure *_tlb_fns are 4B aligned
  Linux 5.4.156
  pinctrl: stm32: use valid pin identifier in stm32_pinctrl_resume()
  ARM: 9122/1: select HAVE_FUTEX_CMPXCHG
  tracing: Have all levels of checks prevent recursion
  net: mdiobus: Fix memory leak in __mdiobus_register
  scsi: core: Fix shost->cmd_per_lun calculation in scsi_add_host_with_dma()
  Input: snvs_pwrkey - add clk handling
  ALSA: hda: avoid write to STATESTS if controller is in reset
  platform/x86: intel_scu_ipc: Update timeout value in comment
  isdn: mISDN: Fix sleeping function called from invalid context
  ARM: dts: spear3xx: Fix gmac node
  net: stmmac: add support for dwmac 3.40a
  btrfs: deal with errors when checking if a dir entry exists during log replay
  gcc-plugins/structleak: add makefile var for disabling structleak
  selftests: netfilter: remove stray bash debug line
  netfilter: Kconfig: use 'default y' instead of 'm' for bool config option
  isdn: cpai: check ctr->cnr to avoid array index out of bound
  nfc: nci: fix the UAF of rf_conn_info object
  mm, slub: fix potential memoryleak in kmem_cache_open()
  mm, slub: fix mismatch between reconstructed freelist depth and cnt
  powerpc/idle: Don't corrupt back chain when going idle
  KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest
  KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()
  powerpc64/idle: Fix SP offsets when saving GPRs
  audit: fix possible null-pointer dereference in audit_filter_rules
  ASoC: DAPM: Fix missing kctl change notifications
  ALSA: hda/realtek: Add quirk for Clevo PC50HS
  ALSA: usb-audio: Provide quirk for Sennheiser GSP670 Headset
  vfs: check fd has read access in kernel_read_file_from_fd()
  elfcore: correct reference to CONFIG_UML
  ocfs2: mount fails with buffer overflow in strlen
  ocfs2: fix data corruption after conversion from inline format
  ceph: fix handling of "meta" errors
  can: j1939: j1939_xtp_rx_rts_session_new(): abort TP less than 9 bytes
  can: j1939: j1939_xtp_rx_dat_one(): cancel session if receive TP.DT with error length
  can: j1939: j1939_netdev_start(): fix UAF for rx_kref of j1939_priv
  can: j1939: j1939_tp_rxtimer(): fix errant alert in j1939_tp_rxtimer
  can: peak_pci: peak_pci_remove(): fix UAF
  can: peak_usb: pcan_usb_fd_decode_status(): fix back to ERROR_ACTIVE state notification
  can: rcar_can: fix suspend/resume
  net: enetc: fix ethtool counter name for PM0_TERR
  net: stmmac: Fix E2E delay mechanism
  net: hns3: disable sriov before unload hclge layer
  net: hns3: add limit ets dwrr bandwidth cannot be 0
  net: hns3: reset DWRR of unused tc to zero
  NIOS2: irqflags: rename a redefined register name
  net: dsa: lantiq_gswip: fix register definition
  lan78xx: select CRC32
  netfilter: ipvs: make global sysctl readonly in non-init netns
  ASoC: wm8960: Fix clock configuration on slave mode
  dma-debug: fix sg checks in debug_dma_map_sg()
  NFSD: Keep existing listeners on portlist error
  xtensa: xtfpga: Try software restart before simulating CPU reset
  xtensa: xtfpga: use CONFIG_USE_OF instead of CONFIG_OF
  ARM: dts: at91: sama5d2_som1_ek: disable ISC node by default
  tee: optee: Fix missing devices unregister during optee_remove
  net: switchdev: do not propagate bridge updates across bridges
  parisc: math-emu: Fix fall-through warnings
  Linux 5.4.155
  ionic: don't remove netdev->dev_addr when syncing uc list
  r8152: select CRC32 and CRYPTO/CRYPTO_HASH/CRYPTO_SHA256
  qed: Fix missing error code in qed_slowpath_start()
  mqprio: Correct stats in mqprio_dump_class_stats().
  acpi/arm64: fix next_platform_timer() section mismatch error
  drm/msm/dsi: fix off by one in dsi_bus_clk_enable error handling
  drm/msm/dsi: Fix an error code in msm_dsi_modeset_init()
  drm/msm: Fix null pointer dereference on pointer edp
  drm/panel: olimex-lcd-olinuxino: select CRC32
  platform/mellanox: mlxreg-io: Fix argument base in kstrtou32() call
  mlxsw: thermal: Fix out-of-bounds memory accesses
  ata: ahci_platform: fix null-ptr-deref in ahci_platform_enable_regulators()
  pata_legacy: fix a couple uninitialized variable bugs
  NFC: digital: fix possible memory leak in digital_in_send_sdd_req()
  NFC: digital: fix possible memory leak in digital_tg_listen_mdaa()
  nfc: fix error handling of nfc_proto_register()
  ethernet: s2io: fix setting mac address during resume
  net: encx24j600: check error in devm_regmap_init_encx24j600
  net: stmmac: fix get_hw_feature() on old hardware
  net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp
  net: korina: select CRC32
  net: arc: select CRC32
  gpio: pca953x: Improve bias setting
  sctp: account stream padding length for reconf chunk
  iio: dac: ti-dac5571: fix an error code in probe()
  iio: ssp_sensors: fix error code in ssp_print_mcu_debug()
  iio: ssp_sensors: add more range checking in ssp_parse_dataframe()
  iio: light: opt3001: Fixed timeout error when 0 lux
  iio: mtk-auxadc: fix case IIO_CHAN_INFO_PROCESSED
  iio: adc128s052: Fix the error handling path of 'adc128_probe()'
  iio: adc: aspeed: set driver data when adc probe.
  powerpc/xive: Discard disabled interrupts in get_irqchip_state()
  x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically
  nvmem: Fix shift-out-of-bound (UBSAN) with byte size cells
  EDAC/armada-xp: Fix output of uncorrectable error counter
  virtio: write back F_VERSION_1 before validate
  USB: serial: option: add prod. id for Quectel EG91
  USB: serial: option: add Telit LE910Cx composition 0x1204
  USB: serial: option: add Quectel EC200S-CN module support
  USB: serial: qcserial: add EM9191 QDL support
  Input: xpad - add support for another USB ID of Nacon GC-100
  usb: musb: dsps: Fix the probe error path
  efi: Change down_interruptible() in virt_efi_reset_system() to down_trylock()
  efi/cper: use stack buffer for error record decoding
  cb710: avoid NULL pointer subtraction
  xhci: Enable trust tx length quirk for Fresco FL11 USB controller
  xhci: Fix command ring pointer corruption while aborting a command
  xhci: guard accesses to ep_state in xhci_endpoint_reset()
  mei: me: add Ice Lake-N device id.
  x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails
  watchdog: orion: use 0 for unset heartbeat
  btrfs: check for error when looking up inode during dir entry replay
  btrfs: deal with errors when adding inode reference during log replay
  btrfs: deal with errors when replaying dir entry during log replay
  btrfs: unlock newly allocated extent buffer after error
  csky: Fixup regs.sr broken in ptrace
  csky: don't let sigreturn play with priveleged bits of status register
  s390: fix strrchr() implementation
  nds32/ftrace: Fix Error: invalid operands (*UND* and *UND* sections) for `^'
  ALSA: hda/realtek: Fix the mic type detection issue for ASUS G551JW
  ALSA: hda/realtek - ALC236 headset MIC recording issue
  ALSA: hda/realtek: Add quirk for Clevo X170KM-G
  ALSA: hda/realtek: Complete partial device name to avoid ambiguity
  ALSA: seq: Fix a potential UAF by wrong private_free call order
  ALSA: usb-audio: Add quirk for VF0770
  ovl: simplify file splice
  Linux 5.4.154
  sched: Always inline is_percpu_thread()
  scsi: virtio_scsi: Fix spelling mistake "Unsupport" -> "Unsupported"
  scsi: ses: Fix unsigned comparison with less than zero
  drm/amdgpu: fix gart.bo pin_count leak
  net: sun: SUNVNET_COMMON should depend on INET
  mac80211: check return value of rhashtable_init
  net: prevent user from passing illegal stab size
  m68k: Handle arrivals of multiple signals correctly
  mac80211: Drop frames from invalid MAC address in ad-hoc mode
  netfilter: nf_nat_masquerade: defer conntrack walk to work queue
  netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic
  HID: wacom: Add new Intuos BT (CTL-4100WL/CTL-6100WL) device IDs
  netfilter: ip6_tables: zero-initialize fragment offset
  HID: apple: Fix logical maximum and usage maximum of Magic Keyboard JIS
  ext4: correct the error path of ext4_write_inline_data_end()
  net: phy: bcm7xxx: Fixed indirect MMD operations
  UPSTREAM: ovl: simplify file splice
  Linux 5.4.153
  x86/Kconfig: Correct reference to MWINCHIP3D
  x86/hpet: Use another crystalball to evaluate HPET usability
  x86/platform/olpc: Correct ifdef symbol to intended CONFIG_OLPC_XO15_SCI
  RISC-V: Include clone3() on rv32
  bpf, s390: Fix potential memory leak about jit_data
  i2c: acpi: fix resource leak in reconfiguration device addition
  net: prefer socket bound to interface when not in VRF
  i40e: Fix freeing of uninitialized misc IRQ vector
  i40e: fix endless loop under rtnl
  gve: fix gve_get_stats()
  rtnetlink: fix if_nlmsg_stats_size() under estimation
  gve: Correct available tx qpl check
  drm/nouveau/debugfs: fix file release memory leak
  video: fbdev: gbefb: Only instantiate device when built for IP32
  bus: ti-sysc: Use CLKDM_NOAUTO for dra7 dcan1 for errata i893
  netlink: annotate data races around nlk->bound
  net: sfp: Fix typo in state machine debug string
  net/sched: sch_taprio: properly cancel timer from taprio_destroy()
  net: bridge: use nla_total_size_64bit() in br_get_linkxstats_size()
  ARM: imx6: disable the GIC CPU interface before calling stby-poweroff sequence
  arm64: dts: ls1028a: add missing CAN nodes
  arm64: dts: freescale: Fix SP805 clock-names
  ptp_pch: Load module automatically if ID matches
  powerpc/fsl/dts: Fix phy-connection-type for fm1mac3
  net_sched: fix NULL deref in fifo_set_limit()
  phy: mdio: fix memory leak
  bpf: Fix integer overflow in prealloc_elems_and_freelist()
  bpf, arm: Fix register clobbering in div/mod implementation
  xtensa: call irqchip_init only when CONFIG_USE_OF is selected
  xtensa: use CONFIG_USE_OF instead of CONFIG_OF
  xtensa: move XCHAL_KIO_* definitions to kmem_layout.h
  arm64: dts: qcom: pm8150: use qcom,pm8998-pon binding
  ARM: dts: imx: Fix USB host power regulator polarity on M53Menlo
  ARM: dts: imx: Add missing pinctrl-names for panel on M53Menlo
  soc: qcom: mdt_loader: Drop PT_LOAD check on hash segment
  ARM: dts: qcom: apq8064: Use 27MHz PXO clock as DSI PLL reference
  soc: qcom: socinfo: Fixed argument passed to platform_set_data()
  bpf, mips: Validate conditional branch offsets
  MIPS: BPF: Restore MIPS32 cBPF JIT
  ARM: dts: qcom: apq8064: use compatible which contains chipid
  ARM: dts: omap3430-sdp: Fix NAND device node
  xen/balloon: fix cancelled balloon action
  nfsd4: Handle the NFSv4 READDIR 'dircount' hint being zero
  nfsd: fix error handling of register_pernet_subsys() in init_nfsd()
  ovl: fix missing negative dentry check in ovl_rename()
  mmc: meson-gx: do not use memcpy_to/fromio for dram-access-quirk
  xen/privcmd: fix error handling in mmap-resource processing
  usb: typec: tcpm: handle SRC_STARTUP state if cc changes
  USB: cdc-acm: fix break reporting
  USB: cdc-acm: fix racy tty buffer accesses
  Partially revert "usb: Kconfig: using select for USB_COMMON dependency"
  ANDROID: Different fix for KABI breakage in 5.4.151 in struct sock
  Linux 5.4.152
  libata: Add ATA_HORKAGE_NO_NCQ_ON_ATI for Samsung 860 and 870 SSD.
  silence nfscache allocation warnings with kvzalloc
  perf/x86: Reset destroy callback on event init failure
  kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[]
  KVM: do not shrink halt_poll_ns below grow_start
  tools/vm/page-types: remove dependency on opt_file for idle page tracking
  scsi: ses: Retry failed Send/Receive Diagnostic commands
  selftests:kvm: fix get_warnings_count() ignoring fscanf() return warn
  selftests: be sure to make khdr before other targets
  usb: dwc2: check return value after calling platform_get_resource()
  usb: testusb: Fix for showing the connection speed
  scsi: sd: Free scsi_disk device via put_device()
  ext2: fix sleeping in atomic bugs on error
  sparc64: fix pci_iounmap() when CONFIG_PCI is not set
  xen-netback: correct success/error reporting for the SKB-with-fraglist case
  net: mdio: introduce a shutdown method to mdio device drivers
  ANDROID: Fix up KABI breakage in 5.4.151 in struct sock
  Linux 5.4.151
  HID: usbhid: free raw_report buffers in usbhid_stop
  netfilter: ipset: Fix oversized kvmalloc() calls
  HID: betop: fix slab-out-of-bounds Write in betop_probe
  crypto: ccp - fix resource leaks in ccp_run_aes_gcm_cmd()
  usb: hso: remove the bailout parameter
  usb: hso: fix error handling code of hso_create_net_device
  hso: fix bailout in error case of probe
  libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind
  PCI: Fix pci_host_bridge struct device release/free handling
  net: stmmac: don't attach interface until resume finishes
  net: udp: annotate data race around udp_sk(sk)->corkflag
  HID: u2fzero: ignore incomplete packets without data
  ext4: fix potential infinite loop in ext4_dx_readdir()
  ext4: fix reserved space counter leakage
  ext4: fix loff_t overflow in ext4_max_bitmap_size()
  ipack: ipoctal: fix module reference leak
  ipack: ipoctal: fix missing allocation-failure check
  ipack: ipoctal: fix tty-registration error handling
  ipack: ipoctal: fix tty registration race
  ipack: ipoctal: fix stack information leak
  debugfs: debugfs_create_file_size(): use IS_ERR to check for error
  elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings
  perf/x86/intel: Update event constraints for ICX
  af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
  net: sched: flower: protect fl_walk() with rcu
  net: hns3: do not allow call hns3_nic_net_open repeatedly
  scsi: csiostor: Add module softdep on cxgb4
  Revert "block, bfq: honor already-setup queue merges"
  selftests, bpf: test_lwt_ip_encap: Really disable rp_filter
  e100: fix buffer overrun in e100_get_regs
  e100: fix length calculation in e100_get_regs_len
  net: ipv4: Fix rtnexthop len when RTA_FLOW is present
  hwmon: (tmp421) fix rounding for negative values
  hwmon: (tmp421) report /PVLD condition as fault
  sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
  mac80211-hwsim: fix late beacon hrtimer handling
  mac80211: mesh: fix potentially unaligned access
  mac80211: limit injected vht mcs/nss in ieee80211_parse_tx_radiotap
  mac80211: Fix ieee80211_amsdu_aggregate frag_tail bug
  hwmon: (mlxreg-fan) Return non-zero value when fan current state is enforced from sysfs
  ipvs: check that ip_vs_conn_tab_bits is between 8 and 20
  drm/amd/display: Pass PCI deviceid into DC
  x86/kvmclock: Move this_cpu_pvti into kvmclock.h
  mac80211: fix use-after-free in CCMP/GCMP RX
  scsi: ufs: Fix illegal offset in UPIU event trace
  hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary structure field
  hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary structure field
  hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary structure field
  fs-verity: fix signed integer overflow with i_size near S64_MAX
  usb: cdns3: fix race condition before setting doorbell
  cpufreq: schedutil: Destroy mutex before kobject_put() frees the memory
  cpufreq: schedutil: Use kobject release() method to free sugov_tunables
  tty: Fix out-of-bound vmalloc access in imageblit
  Revert "crypto: public_key: fix overflow during implicit conversion"
  Linux 5.4.150
  qnx4: work around gcc false positive warning bug
  xen/balloon: fix balloon kthread freezing
  arm64: dts: marvell: armada-37xx: Extend PCIe MEM space
  thermal/drivers/int340x: Do not set a wrong tcc offset on resume
  EDAC/synopsys: Fix wrong value type assignment for edac_mode
  spi: Fix tegra20 build with CONFIG_PM=n
  net: 6pack: Fix tx timeout and slot time
  alpha: Declare virt_to_phys and virt_to_bus parameter as pointer to volatile
  arm64: Mark __stack_chk_guard as __ro_after_init
  parisc: Use absolute_pointer() to define PAGE0
  qnx4: avoid stringop-overread errors
  sparc: avoid stringop-overread errors
  net: i825xx: Use absolute_pointer for memcpy from fixed memory location
  compiler.h: Introduce absolute_pointer macro
  blk-cgroup: fix UAF by grabbing blkcg lock before destroying blkg pd
  sparc32: page align size in arch_dma_alloc
  nvme-multipath: fix ANA state updates when a namespace is not present
  xen/balloon: use a kernel thread instead a workqueue
  bpf: Add oversize check before call kvcalloc()
  ipv6: delay fib6_sernum increase in fib6_add
  m68k: Double cast io functions to unsigned long
  net: stmmac: allow CSR clock of 300MHz
  net: macb: fix use after free on rmmod
  blktrace: Fix uaf in blk_trace access after removing by sysfs
  md: fix a lock order reversal in md_alloc
  irqchip/gic-v3-its: Fix potential VPE leak on error
  irqchip/goldfish-pic: Select GENERIC_IRQ_CHIP to fix build
  scsi: lpfc: Use correct scnprintf() limit
  scsi: qla2xxx: Restore initiator in dual mode
  cifs: fix a sign extension bug
  thermal/core: Potential buffer overflow in thermal_build_list_of_policies()
  fpga: machxo2-spi: Fix missing error code in machxo2_write_complete()
  fpga: machxo2-spi: Return an error on failure
  tty: synclink_gt: rename a conflicting function name
  tty: synclink_gt, drop unneeded forward declarations
  scsi: iscsi: Adjust iface sysfs attr detection
  net/mlx4_en: Don't allow aRFS for encapsulated packets
  qed: rdma - don't wait for resources under hw error recovery flow
  gpio: uniphier: Fix void functions to remove return value
  net/smc: add missing error check in smc_clc_prfx_set()
  bnxt_en: Fix TX timeout when TX ring size is set to the smallest
  enetc: Fix illegal access when reading affinity_hint
  platform/x86/intel: punit_ipc: Drop wrong use of ACPI_PTR()
  afs: Fix incorrect triggering of sillyrename on 3rd-party invalidation
  net: hso: fix muxed tty registration
  serial: mvebu-uart: fix driver's tx_empty callback
  xhci: Set HCD flag to defer primary roothub registration
  btrfs: prevent __btrfs_dump_space_info() to underflow its free space
  erofs: fix up erofs_lookup tracepoint
  mcb: fix error handling in mcb_alloc_bus()
  USB: serial: option: add device id for Foxconn T99W265
  USB: serial: option: remove duplicate USB device ID
  USB: serial: option: add Telit LN920 compositions
  USB: serial: mos7840: remove duplicated 0xac24 device ID
  usb: core: hcd: Add support for deferring roothub registration
  Re-enable UAS for LaCie Rugged USB3-FW with fk quirk
  staging: greybus: uart: fix tty use after free
  binder: make sure fd closes complete
  USB: cdc-acm: fix minor-number release
  USB: serial: cp210x: add ID for GW Instek GDM-834x Digital Multimeter
  usb-storage: Add quirk for ScanLogic SL11R-IDE older than 2.6c
  xen/x86: fix PV trap handling on secondary processors
  cifs: fix incorrect check for null pointer in header_assemble
  usb: musb: tusb6010: uninitialized data in tusb_fifo_write_unaligned()
  usb: dwc2: gadget: Fix ISOC transfer complete handling for DDMA
  usb: dwc2: gadget: Fix ISOC flow for BDMA and Slave
  usb: gadget: r8a66597: fix a loop in set_feature()
  ocfs2: drop acl cache for directories too
  Linux 5.4.149
  drm/nouveau/nvkm: Replace -ENOSYS with -ENODEV
  rtc: rx8010: select REGMAP_I2C
  blk-throttle: fix UAF by deleteing timer in blk_throtl_exit()
  pwm: stm32-lp: Don't modify HW state in .remove() callback
  pwm: rockchip: Don't modify HW state in .remove() callback
  pwm: img: Don't modify HW state in .remove() callback
  nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
  nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
  nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
  nilfs2: fix NULL pointer in nilfs_##name##_attr_release
  nilfs2: fix memory leak in nilfs_sysfs_create_device_group
  btrfs: fix lockdep warning while mounting sprout fs
  ceph: lockdep annotations for try_nonblocking_invalidate
  ceph: request Fw caps before updating the mtime in ceph_write_iter
  dmaengine: xilinx_dma: Set DMA mask for coherent APIs
  dmaengine: ioat: depends on !UML
  dmaengine: sprd: Add missing MODULE_DEVICE_TABLE
  parisc: Move pci_dev_is_behind_card_dino to where it is used
  drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()
  thermal/core: Fix thermal_cooling_device_register() prototype
  Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
  net: stmmac: reset Tx desc base address before restarting Tx
  phy: avoid unnecessary link-up delay in polling mode
  pwm: lpc32xx: Don't modify HW state in .probe() after the PWM chip was registered
  profiling: fix shift-out-of-bounds bugs
  nilfs2: use refcount_dec_and_lock() to fix potential UAF
  prctl: allow to setup brk for et_dyn executables
  9p/trans_virtio: Remove sysfs file on probe failure
  thermal/drivers/exynos: Fix an error code in exynos_tmu_probe()
  dmaengine: acpi: Avoid comparison GSI with Linux vIRQ
  um: virtio_uml: fix memory leak on init failures
  staging: rtl8192u: Fix bitwise vs logical operator in TranslateRxSignalStuff819xUsb()
  sctp: add param size validation for SCTP_PARAM_SET_PRIMARY
  sctp: validate chunk size in __rcv_asconf_lookup
  ARM: 9098/1: ftrace: MODULE_PLT: Fix build problem without DYNAMIC_FTRACE
  ARM: 9079/1: ftrace: Add MODULE_PLTS support
  ARM: 9078/1: Add warn suppress parameter to arm_gen_branch_link()
  ARM: 9077/1: PLT: Move struct plt_entries definition to header
  apparmor: remove duplicate macro list_entry_is_head()
  ARM: Qualify enabling of swiotlb_init()
  s390/pci_mmio: fully validate the VMA before calling follow_pte()
  console: consume APC, DM, DCS
  KVM: remember position in kvm->vcpus array
  PCI/ACPI: Add Ampere Altra SOC MCFG quirk
  PCI: aardvark: Fix reporting CRS value
  PCI: pci-bridge-emul: Add PCIe Root Capabilities Register
  PCI: aardvark: Indicate error in 'val' when config read fails
  PCI: pci-bridge-emul: Fix big-endian support
  Linux 5.4.148
  s390/bpf: Fix 64-bit subtraction of the -0x80000000 constant
  s390/bpf: Fix optimizing out zero-extensions
  net: renesas: sh_eth: Fix freeing wrong tx descriptor
  ip_gre: validate csum_start only on pull
  qlcnic: Remove redundant unlock in qlcnic_pinit_from_rom
  fq_codel: reject silly quantum parameters
  netfilter: socket: icmp6: fix use-after-scope
  net: dsa: b53: Fix calculating number of switch ports
  perf unwind: Do not overwrite FEATURE_CHECK_LDFLAGS-libunwind-{x86,aarch64}
  ARC: export clear_user_page() for modules
  mtd: rawnand: cafe: Fix a resource leak in the error handling path of 'cafe_nand_probe()'
  PCI: Sync __pci_register_driver() stub for CONFIG_PCI=n
  KVM: arm64: Handle PSCI resets before userspace touches vCPU state
  mfd: tqmx86: Clear GPIO IRQ resource when no IRQ is set
  PCI: Fix pci_dev_str_match_path() alloc while atomic bug
  mfd: axp20x: Update AXP288 volatile ranges
  NTB: perf: Fix an error code in perf_setup_inbuf()
  NTB: Fix an error code in ntb_msit_probe()
  ethtool: Fix an error code in cxgb2.c
  PCI: ibmphp: Fix double unmap of io_mem
  block, bfq: honor already-setup queue merges
  net: usb: cdc_mbim: avoid altsetting toggling for Telit LN920
  Set fc_nlinfo in nh_create_ipv4, nh_create_ipv6
  PCI: Add ACS quirks for Cavium multi-function devices
  tracing/probes: Reject events which have the same name of existing one
  mfd: Don't use irq_create_mapping() to resolve a mapping
  fuse: fix use after free in fuse_read_interrupt()
  PCI: Add ACS quirks for NXP LX2xx0 and LX2xx2 platforms
  mfd: db8500-prcmu: Adjust map to reality
  dt-bindings: mtd: gpmc: Fix the ECC bytes vs. OOB bytes equation
  mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()
  net: hns3: fix the timing issue of VF clearing interrupt sources
  net: hns3: disable mac in flr process
  net: hns3: change affinity_mask to numa node range
  net: hns3: pad the short tunnel frame before sending to hardware
  KVM: PPC: Book3S HV: Tolerate treclaim. in fake-suspend mode changing registers
  ibmvnic: check failover_pending in login response
  dt-bindings: arm: Fix Toradex compatible typo
  qed: Handle management FW error
  tcp: fix tp->undo_retrans accounting in tcp_sacktag_one()
  net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
  net/af_unix: fix a data-race in unix_dgram_poll
  vhost_net: fix OoB on sendmsg() failure.
  events: Reuse value read using READ_ONCE instead of re-reading it
  net/mlx5: Fix potential sleeping in atomic context
  net/mlx5: FWTrace, cancel work on alloc pd error flow
  perf machine: Initialize srcline string member in add_location struct
  tipc: increase timeout in tipc_sk_enqueue()
  r6040: Restore MDIO clock frequency after MAC reset
  net/l2tp: Fix reference count leak in l2tp_udp_recv_core
  dccp: don't duplicate ccid when cloning dccp sock
  ptp: dp83640: don't define PAGE0
  net-caif: avoid user-triggerable WARN_ON(1)
  tipc: fix an use-after-free issue in tipc_recvmsg
  x86/mm: Fix kern_addr_valid() to cope with existing but not present entries
  s390/sclp: fix Secure-IPL facility detection
  drm/etnaviv: add missing MMU context put when reaping MMU mapping
  drm/etnaviv: reference MMU context when setting up hardware state
  drm/etnaviv: fix MMU context leak on GPU reset
  drm/etnaviv: exec and MMU state is lost when resetting the GPU
  drm/etnaviv: keep MMU context across runtime suspend/resume
  drm/etnaviv: stop abusing mmu_context as FE running marker
  drm/etnaviv: put submit prev MMU context when it exists
  drm/etnaviv: return context from etnaviv_iommu_context_get
  drm/amd/amdgpu: Increase HWIP_MAX_INSTANCE to 10
  PCI: Add AMD GPU multi-function power dependencies
  PM: base: power: don't try to use non-existing RTC for storing data
  arm64/sve: Use correct size when reinitialising SVE state
  bnx2x: Fix enabling network interfaces without VFs
  xen: reset legacy rtc flag for PV domU
  btrfs: fix upper limit for max_inline for page size 64K
  drm/panfrost: Clamp lock region to Bifrost minimum
  drm/panfrost: Use u64 for size in lock_region
  drm/panfrost: Simplify lock_region calculation
  drm/amdgpu: Fix BUG_ON assert
  drm/msi/mdp4: populate priv->kms in mdp4_kms_init
  net: dsa: lantiq_gswip: fix maximum frame length
  lib/test_stackinit: Fix static initializer test
  platform/chrome: cros_ec_proto: Send command again when timeout occurs
  memcg: enable accounting for pids in nested pid namespaces
  mm,vmscan: fix divide by zero in get_scan_count
  mm/hugetlb: initialize hugetlb_usage in mm_init
  s390/pv: fix the forcing of the swiotlb
  cpufreq: powernv: Fix init_chip_info initialization in numa=off
  scsi: qla2xxx: Sync queue idx with queue_pair_map idx
  scsi: qla2xxx: Changes to support kdump kernel
  scsi: BusLogic: Fix missing pr_cont() use
  ovl: fix BUG_ON() in may_delete() when called from ovl_cleanup()
  parisc: fix crash with signals and alloca
  net: w5100: check return value after calling platform_get_resource()
  fix array-index-out-of-bounds in taprio_change
  net: fix NULL pointer reference in cipso_v4_doi_free
  ath9k: fix sleeping in atomic context
  ath9k: fix OOB read ar9300_eeprom_restore_internal
  parport: remove non-zero check on count
  net/mlx5: DR, Enable QP retransmission
  iwlwifi: mvm: fix access to BSS elements
  iwlwifi: mvm: avoid static queue number aliasing
  iwlwifi: mvm: fix a memory leak in iwl_mvm_mac_ctxt_beacon_changed
  drm/amdkfd: Account for SH/SE count when setting up cu masks.
  ASoC: rockchip: i2s: Fixup config for DAIFMT_DSP_A/B
  ASoC: rockchip: i2s: Fix regmap_ops hang
  usbip:vhci_hcd USB port can get stuck in the disabled state
  usbip: give back URBs for unsent unlink requests during cleanup
  usb: musb: musb_dsps: request_irq() after initializing musb
  Revert "USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set"
  cifs: fix wrong release in sess_alloc_buffer() failed path
  mmc: core: Return correct emmc response in case of ioctl error
  selftests/bpf: Enlarge select() timeout for test_maps
  mmc: rtsx_pci: Fix long reads when clock is prescaled
  mmc: sdhci-of-arasan: Check return value of non-void funtions
  of: Don't allow __of_attached_node_sysfs() without CONFIG_SYSFS
  ASoC: Intel: Skylake: Fix passing loadable flag for module
  ASoC: Intel: Skylake: Fix module configuration for KPB and MIXER
  btrfs: tree-log: check btrfs_lookup_data_extent return value
  m68knommu: only set CONFIG_ISA_DMA_API for ColdFire sub-arch
  drm/exynos: Always initialize mapping in exynos_drm_register_dma()
  lockd: lockd server-side shouldn't set fl_ops
  usb: chipidea: host: fix port index underflow and UBSAN complains
  gfs2: Don't call dlm after protocol is unmounted
  staging: rts5208: Fix get_ms_information() heap buffer size
  rpc: fix gss_svc_init cleanup on failure
  tcp: enable data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
  serial: sh-sci: fix break handling for sysrq
  opp: Don't print an error if required-opps is missing
  Bluetooth: Fix handling of LE Enhanced Connection Complete
  nvme-tcp: don't check blk_mq_tag_to_rq when receiving pdu data
  arm64: dts: ls1046a: fix eeprom entries
  arm64: tegra: Fix compatible string for Tegra132 CPUs
  ARM: tegra: tamonten: Fix UART pad setting
  mac80211: Fix monitor MTU limit so that A-MSDUs get through
  drm/display: fix possible null-pointer dereference in dcn10_set_clock()
  gpu: drm: amd: amdgpu: amdgpu_i2c: fix possible uninitialized-variable access in amdgpu_i2c_router_select_ddc_port()
  net/mlx5: Fix variable type to match 64bit
  Bluetooth: avoid circular locks in sco_sock_connect
  Bluetooth: schedule SCO timeouts with delayed_work
  selftests/bpf: Fix xdp_tx.c prog section name
  drm/msm: mdp4: drop vblank get/put from prepare/complete_commit
  net: ethernet: stmmac: Do not use unreachable() in ipq806x_gmac_probe()
  arm64: dts: qcom: sdm660: use reg value for memory node
  ARM: dts: imx53-ppd: Fix ACHC entry
  media: tegra-cec: Handle errors of clk_prepare_enable()
  media: TDA1997x: fix tda1997x_query_dv_timings() return value
  media: v4l2-dv-timings.c: fix wrong condition in two for-loops
  media: imx258: Limit the max analogue gain to 480
  media: imx258: Rectify mismatch of VTS value
  ASoC: Intel: bytcr_rt5640: Move "Platform Clock" routes to the maps for the matching in-/output
  arm64: tegra: Fix Tegra194 PCIe EP compatible string
  bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler()
  workqueue: Fix possible memory leaks in wq_numa_init()
  Bluetooth: skip invalid hci_sync_conn_complete_evt
  ata: sata_dwc_460ex: No need to call phy_exit() befre phy_init()
  samples: bpf: Fix tracex7 error raised on the missing argument
  staging: ks7010: Fix the initialization of the 'sleep_status' structure
  serial: 8250_pci: make setup_port() parameters explicitly unsigned
  hvsi: don't panic on tty_register_driver failure
  xtensa: ISS: don't panic in rs_init
  serial: 8250: Define RX trigger levels for OxSemi 950 devices
  s390: make PCI mio support a machine flag
  s390/jump_label: print real address in a case of a jump label bug
  flow_dissector: Fix out-of-bounds warnings
  ipv4: ip_output.c: Fix out-of-bounds warning in ip_copy_addrs()
  video: fbdev: riva: Error out if 'pixclock' equals zero
  video: fbdev: kyro: Error out if 'pixclock' equals zero
  video: fbdev: asiliantfb: Error out if 'pixclock' equals zero
  bpf/tests: Do not PASS tests without actually testing the result
  bpf/tests: Fix copy-and-paste error in double word test
  drm/amd/amdgpu: Update debugfs link_settings output link_rate field in hex
  drm/amd/display: Fix timer_per_pixel unit error
  tty: serial: jsm: hold port lock when reporting modem line changes
  staging: board: Fix uninitialized spinlock when attaching genpd
  usb: gadget: composite: Allow bMaxPower=0 if self-powered
  USB: EHCI: ehci-mv: improve error handling in mv_ehci_enable()
  usb: gadget: u_ether: fix a potential null pointer dereference
  usb: host: fotg210: fix the actual_length of an iso packet
  usb: host: fotg210: fix the endpoint's transactional opportunities calculation
  igc: Check if num of q_vectors is smaller than max before array access
  drm: avoid blocking in drm_clients_info's rcu section
  Smack: Fix wrong semantics in smk_access_entry()
  netlink: Deal with ESRCH error in nlmsg_notify()
  video: fbdev: kyro: fix a DoS bug by restricting user input
  ARM: dts: qcom: apq8064: correct clock names
  iavf: fix locking of critical sections
  iavf: do not override the adapter state in the watchdog task
  iio: dac: ad5624r: Fix incorrect handling of an optional regulator.
  tipc: keep the skb in rcv queue until the whole data is read
  PCI: Use pci_update_current_state() in pci_enable_device_flags()
  crypto: mxs-dcp - Use sg_mapping_iter to copy data
  media: dib8000: rewrite the init prbs logic
  ASoC: atmel: ATMEL drivers don't need HAS_DMA
  drm/amdgpu: Fix amdgpu_ras_eeprom_init()
  userfaultfd: prevent concurrent API initialization
  kbuild: Fix 'no symbols' warning when CONFIG_TRIM_UNUSD_KSYMS=y
  MIPS: Malta: fix alignment of the devicetree buffer
  f2fs: fix to unmap pages from userspace process in punch_hole()
  f2fs: fix unexpected ENOENT comes from f2fs_map_blocks()
  f2fs: fix to account missing .skipped_gc_rwsem
  KVM: PPC: Fix clearing never mapped TCEs in realmode
  clk: at91: clk-generated: Limit the requested rate to our range
  clk: at91: clk-generated: pass the id of changeable parent at registration
  clk: at91: sam9x60: Don't use audio PLL
  fscache: Fix cookie key hashing
  platform/x86: dell-smbios-wmi: Add missing kfree in error-exit from run_smbios_call
  KVM: PPC: Book3S HV Nested: Reflect guest PMU in-use to L0 when guest SPRs are live
  HID: i2c-hid: Fix Elan touchpad regression
  scsi: target: avoid per-loop XCOPY buffer allocations
  powerpc/config: Renable MTD_PHYSMAP_OF
  scsi: qedf: Fix error codes in qedf_alloc_global_queues()
  scsi: qedi: Fix error codes in qedi_alloc_global_queues()
  scsi: smartpqi: Fix an error code in pqi_get_raid_map()
  pinctrl: single: Fix error return code in pcs_parse_bits_in_pinctrl_entry()
  scsi: fdomain: Fix error return code in fdomain_probe()
  SUNRPC: Fix potential memory corruption
  dma-debug: fix debugfs initialization order
  openrisc: don't printk() unconditionally
  f2fs: reduce the scope of setting fsck tag when de->name_len is zero
  f2fs: show f2fs instance in printk_ratelimited
  RDMA/efa: Remove double QP type assignment
  powerpc/stacktrace: Include linux/delay.h
  vfio: Use config not menuconfig for VFIO_NOIOMMU
  pinctrl: samsung: Fix pinctrl bank pin count
  docs: Fix infiniband uverbs minor number
  RDMA/iwcm: Release resources if iw_cm module initialization fails
  IB/hfi1: Adjust pkey entry in index 0
  scsi: bsg: Remove support for SCSI_IOCTL_SEND_COMMAND
  f2fs: quota: fix potential deadlock
  HID: input: do not report stylus battery state as "full"
  PCI: aardvark: Fix masking and unmasking legacy INTx interrupts
  PCI: aardvark: Increase polling delay to 1.5s while waiting for PIO response
  PCI: aardvark: Fix checking for PIO status
  PCI: xilinx-nwl: Enable the clock through CCF
  PCI: Return ~0 data on pciconfig_read() CAP_SYS_ADMIN failure
  PCI: Restrict ASMedia ASM1062 SATA Max Payload Size Supported
  PCI/portdrv: Enable Bandwidth Notification only if port supports it
  ARM: 9105/1: atags_to_fdt: don't warn about stack size
  libata: add ATA_HORKAGE_NO_NCQ_TRIM for Samsung 860 and 870 SSDs
  dmaengine: imx-sdma: remove duplicated sdma_load_context
  Revert "dmaengine: imx-sdma: refine to load context only once"
  media: rc-loopback: return number of emitters rather than error
  media: uvc: don't do DMA on stack
  VMCI: fix NULL pointer dereference when unmapping queue pair
  dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc()
  power: supply: max17042: handle fails of reading status register
  block: bfq: fix bfq_set_next_ioprio_data()
  crypto: public_key: fix overflow during implicit conversion
  arm64: head: avoid over-mapping in map_memory
  soc: aspeed: p2a-ctrl: Fix boundary check for mmap
  soc: aspeed: lpc-ctrl: Fix boundary check for mmap
  soc: qcom: aoss: Fix the out of bound usage of cooling_devs
  pinctrl: ingenic: Fix incorrect pull up/down info
  pinctrl: stmfx: Fix hazardous u8[] to unsigned long cast
  tools/thermal/tmon: Add cross compiling support
  9p/xen: Fix end of loop tests for list_for_each_entry
  include/linux/list.h: add a macro to test if entry is pointing to the head
  xen: fix setting of max_pfn in shared_info
  powerpc/perf/hv-gpci: Fix counter value parsing
  PCI/MSI: Skip masking MSI-X on Xen PV
  blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN
  blk-zoned: allow zone management send operations without CAP_SYS_ADMIN
  btrfs: reset replace target device to allocation state on close
  btrfs: wake up async_delalloc_pages waiters after submit
  rtc: tps65910: Correct driver module alias

 Conflicts:
	Documentation/devicetree/bindings
	Documentation/devicetree/bindings/arm/tegra.yaml
	Documentation/devicetree/bindings/mtd/gpmc-nand.txt
	Documentation/devicetree/bindings/regulator/samsung,s5m8767.txt
	kernel/sched/cpufreq_schedutil.c

Change-Id: Id17c4366cdc6854cd23fba0f41d335b09fc6100e
Signed-off-by: Srinivasarao Pathipati <quic_spathi@quicinc.com>
2022-02-07 22:29:21 +05:30
David Hildenbrand
8656566821 mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()
commit 7cf209ba8a86410939a24cb1aeb279479a7e0ca6 upstream.

Patch series "mm/memory_hotplug: preparatory patches for new online policy and memory"

These are all cleanups and one fix previously sent as part of [1]:
[PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory
groups.

These patches make sense even without the other series, therefore I pulled
them out to make the other series easier to digest.

[1] https://lkml.kernel.org/r/20210607195430.48228-1-david@redhat.com

This patch (of 4):

Checkpatch complained on a follow-up patch that we are using "unsigned"
here, which defaults to "unsigned int" and checkpatch is correct.

As we will search for a fitting zone using the wrong pfn, we might end
up onlining memory to one of the special kernel zones, such as ZONE_DMA,
which can end badly as the onlined memory does not satisfy properties of
these zones.

Use "unsigned long" instead, just as we do in other places when handling
PFNs.  This can bite us once we have physical addresses in the range of
multiple TB.

Link: https://lkml.kernel.org/r/20210712124052.26491-2-david@redhat.com
Fixes: e5e6893026 ("mm, memory_hotplug: display allowed zones in the preferred ordering")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Sergei Trofimovich <slyfox@gentoo.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-22 12:26:43 +02:00
Blagovest Kolenichev
a728307820 Merge android-5.4.9 (813bf83) into msm-5.4
* refs/heads/tmp-813bf83:
  ANDROID: update abi for previous revert
  Revert "BACKPORT: perf_event: Add support for LSM and SELinux checks"
  Linux 5.4.9
  mm/hugetlb: defer freeing of huge pages if in non-task context
  hsr: fix a race condition in node list insertion and deletion
  hsr: fix error handling routine in hsr_dev_finalize()
  hsr: avoid debugfs warning message when module is remove
  net: annotate lockless accesses to sk->sk_pacing_shift
  perf/x86/intel/bts: Fix the use of page_private()
  efi: Don't attempt to map RCI2 config table if it doesn't exist
  lib/ubsan: don't serialize UBSAN report
  xen/blkback: Avoid unmapping unmapped grant pages
  mm/sparse.c: mark populate_section_memmap as __meminit
  s390/smp: fix physical to logical CPU map for SMT
  Btrfs: only associate the locked page with one async_chunk struct
  btrfs: get rid of unique workqueue helper functions
  ubifs: ubifs_tnc_start_commit: Fix OOB in layout_in_gaps
  net: add annotations on hh->hh_len lockless accesses
  xfs: periodically yield scrub threads to the scheduler
  drm/i915/execlists: Fix annotation for decoupling virtual request
  ath9k_htc: Discard undersized packets
  ath9k_htc: Modify byte order for an error message
  fix compat handling of FICLONERANGE, FIDEDUPERANGE and FS_IOC_FIEMAP
  fs: cifs: Fix atime update check vs mtime
  cifs: Fix lookup of root ses in DFS referral cache
  tty: serial: msm_serial: Fix lockup for sysrq and oops
  phy: renesas: rcar-gen3-usb2: Use platform_get_irq_optional() for optional irq
  arm64: dts: meson: odroid-c2: Disable usb_otg bus to avoid power failed warning
  dt-bindings: clock: renesas: rcar-usb2-clock-sel: Fix typo in example
  media: usb: fix memory leak in af9005_identify_state
  regulator: ab8500: Remove AB8505 USB regulator
  media: flexcop-usb: ensure -EIO is returned on error condition
  arm64: dts: meson-gxm-khadas-vim2: fix uart_A bluetooth node
  arm64: dts: meson-gxl-s905x-khadas-vim: fix uart_A bluetooth node
  Bluetooth: Fix memory leak in hci_connect_le_scan
  Bluetooth: delete a stray unlock
  Bluetooth: btusb: fix PM leak in error case of setup
  powerpc/mm: Mark get_slice_psize() & slice_addr_is_low() as notrace
  regulator: axp20x: Fix AXP22x ELDO2 regulator enable bitmask
  spi: uniphier: Fix FIFO threshold
  regulator: bd70528: Remove .set_ramp_delay for bd70528_ldo_ops
  regulator: axp20x: Fix axp20x_set_ramp_delay
  watchdog: tqmx86_wdt: Fix build error
  net, sysctl: Fix compiler warning when only cBPF is present
  netfilter: nf_queue: enqueue skbs with NULL dst
  platform/x86: pmc_atom: Add Siemens CONNECT X300 to critclk_systems DMI table
  xfs: don't check for AG deadlock for realtime files in bunmapi
  firmware: arm_scmi: Avoid double free in error flow
  cifs: Fix potential softlockups while refreshing DFS cache
  of: overlay: add_changeset_property() memory leak
  iommu/vt-d: Remove incorrect PSI capability check
  perf callchain: Fix segfault in thread__resolve_callchain_sample()
  ACPI: sysfs: Change ACPI_MASKABLE_GPE_MAX to 0x100
  kernel/module.c: wakeup processes in module_wq on module unload
  net/sched: annotate lockless accesses to qdisc->empty
  HID: i2c-hid: Reset ALPS touchpads on resume
  powerpc: Chunk calls to flush_dcache_range in arch_*_memory
  nfsd4: fix up replay_matches_cache()
  arm64: dts: qcom: msm8998-clamshell: Remove retention idle state
  sunrpc: fix crash when cache_head become valid before update
  PM / devfreq: Check NULL governor in available_governors_show
  drm/msm: include linux/sched/task.h
  spi: spi-fsl-dspi: Fix 16-bit word order in 32-bit XSPI mode
  ftrace: Avoid potential division by zero in function profiler
  arm64: Revert support for execute-only user mappings
  exit: panic before exit_mm() on global init exit
  scsi: lpfc: Fix rpi release when deleting vport
  ALSA: firewire-motu: Correct a typo in the clock proc string
  ALSA: pcm: Yet another missing check of non-cached buffer type
  ALSA: cs4236: fix error return comparison of an unsigned integer
  gen_initramfs_list.sh: fix 'bad variable name' error
  dmaengine: virt-dma: Fix access after free in vchan_complete()
  apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
  mm/gup: fix memory leak in __gup_benchmark_ioctl
  io_uring: use current task creds instead of allocating a new one
  samples/trace_printk: Wait for IRQ work to finish
  tracing: Fix endianness bug in histogram trigger
  tracing: Have the histogram compare functions convert to u64 first
  tracing: Avoid memory leak in process_system_preds()
  tracing: Fix lock inversion in trace_event_enable_tgid_record()
  rseq/selftests: Fix: Namespace gettid() for compatibility with glibc 2.30
  riscv: ftrace: correct the condition logic in function graph tracer
  clocksource: riscv: add notrace to riscv_sched_clock
  gpiolib: fix up emulated open drain outputs
  gpio: xtensa: fix driver build
  libata: Fix retrieving of active qcs
  ata: ahci_brcm: BCM7425 AHCI requires AHCI_HFLAG_DELAY_ENGINE
  ata: ahci_brcm: Add missing clock management during recovery
  ata: ahci_brcm: Fix AHCI resources management
  ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys()
  bpf: Fix precision tracking for unbounded scalars
  compat_ioctl: block: handle BLKGETZONESZ/BLKGETNRZONES
  compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE
  compat_ioctl: block: handle Persistent Reservations
  Btrfs: fix infinite loop during nocow writeback due to race
  dmaengine: dma-jz4780: Also break descriptor chains on JZ4725B
  dmaengine: Fix access to uninitialized dma_slave_caps
  selftests/seccomp: Catch garbage on SECCOMP_IOCTL_NOTIF_RECV
  samples/seccomp: Zero out members based on seccomp_notif_sizes
  seccomp: Check that seccomp_notif is zeroed out by the user
  selftests/seccomp: Zero out seccomp_notif
  locks: print unsigned ino in /proc/locks
  gcc-plugins: make it possible to disable CONFIG_GCC_PLUGINS again
  pstore/ram: Fix error-path memory leak in persistent_ram_new() callers
  pstore/ram: Write new dumps to start of recycled zones
  ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
  mm/oom: fix pgtables units mismatch in Killed process message
  mm: move_pages: return valid node id in status if the page is already on the target node
  memcg: account security cred as well to kmemcg
  mm/zsmalloc.c: fix the migrated zspage statistics.
  mm/memory_hotplug: shrink zones when offlining memory
  media: cec: check 'transmit_in_progress', not 'transmitting'
  media: cec: avoid decrementing transmit_queue_sz if it is 0
  media: cec: CEC 2.0-only bcast messages were ignored
  media: pulse8-cec: fix lost cec_transmit_attempt_done() call
  MIPS: Avoid VDSO ABI breakage due to global register variable
  MIPS: BPF: eBPF JIT: check for MIPS ISA compliance in Kconfig
  MIPS: BPF: Disable MIPS32 eBPF JIT
  drm/amdgpu/smu: add metrics table lock for vega20 (v2)
  drm/amdgpu/smu: add metrics table lock for navi (v2)
  drm/amdgpu/smu: add metrics table lock for arcturus (v2)
  drm/amdgpu/smu: add metrics table lock
  drm/sun4i: hdmi: Remove duplicate cleanup calls
  ALSA: hda/realtek - Add headset Mic no shutup for ALC283
  ALSA: hda - Apply sync-write workaround to old Intel platforms, too
  ALSA: usb-audio: set the interface format after resume on Dell WD19
  ALSA: usb-audio: fix set_format altsetting sanity check
  ALSA: ice1724: Fix sleep-in-atomic in Infrasonic Quartet support code
  mm: drop mmap_sem before calling balance_dirty_pages() in write fault
  block: add bio_truncate to fix guard_bio_eod
  netfilter: nft_tproxy: Fix port selector on Big Endian
  ALSA: hda - Downgrade error message for single-cmd fallback
  taskstats: fix data-race
  shmem: pin the file in shmem_fault() if mmap_sem is dropped
  tcp: fix data-race in tcp_recvmsg()
  ALSA: hda - fixup for the bass speaker on Lenovo Carbon X1 7th gen
  PCI: Fix missing inline for pci_pr3_present()
  ALSA: hda: Allow HDA to be runtime suspended when dGPU is not bound to a driver
  PCI: Add a helper to check Power Resource Requirements _PR3 existence
  ALSA: hda/realtek - Enable the bass speaker of ASUS UX431FLC
  ALSA: hda/realtek - Add Bass Speaker and fixed dac for bass speaker
  PM / hibernate: memory_bm_find_bit(): Tighten node optimisation
  xen/balloon: fix ballooned page accounting without hotplug enabled
  xen-blkback: prevent premature module unload
  IB/mlx5: Fix steering rule of drop and count
  IB/mlx4: Follow mirror sequence of device add during device removal
  RDMA/counter: Prevent auto-binding a QP which are not tracked with res
  s390/cpum_sf: Avoid SBD overflow condition in irq handler
  s390/cpum_sf: Adjust sampling interval to avoid hitting sample limits
  md: raid1: check rdev before reference in raid1_sync_request func
  raid5: need to set STRIPE_HANDLE for batch head
  afs: Fix creation calls in the dynamic root to fail with EOPNOTSUPP
  afs: Fix mountpoint parsing
  net: make socket read/write_iter() honor IOCB_NOWAIT
  usb: gadget: fix wrong endpoint desc
  drm/nouveau/kms/nv50-: fix panel scaling
  drm/nouveau: Fix drm-core using atomic code-paths on pre-nv50 hardware
  drm/nouveau: Move the declaration of struct nouveau_conn_atom up a bit
  staging/wlan-ng: add CRC32 dependency in Kconfig
  scsi: iscsi: Avoid potential deadlock in iscsi_if_rx func
  scsi: libsas: stop discovering if oob mode is disconnected
  scsi: iscsi: qla4xxx: fix double free in probe
  scsi: qla2xxx: Ignore PORT UPDATE after N2N PLOGI
  scsi: qla2xxx: Don't defer relogin unconditonally
  scsi: qla2xxx: Send Notify ACK after N2N PLOGI
  scsi: qla2xxx: Configure local loop for N2N target
  scsi: qla2xxx: Fix PLOGI payload and ELS IOCB dump length
  scsi: qla2xxx: Don't call qlt_async_event twice
  scsi: qla2xxx: Drop superfluous INIT_WORK of del_work
  scsi: qla2xxx: Use explicit LOGO in target mode
  scsi: lpfc: Fix memory leak on lpfc_bsg_write_ebuf_set func
  rxe: correctly calculate iCRC for unaligned payloads
  RDMA/cma: add missed unregister_pernet_subsys in init failure
  afs: Fix SELinux setting security label on /afs
  afs: Fix afs_find_server lookups for ipv4 peers
  PM / devfreq: Don't fail devfreq_dev_release if not in list
  PM / devfreq: Set scaling_max_freq to max on OPP notifier error
  PM / devfreq: Fix devfreq_notifier_call returning errno
  iio: adc: max9611: Fix too short conversion time delay
  iio: st_accel: Fix unused variable warning
  nvme/pci: Fix read queue count
  nvme/pci: Fix write and poll queue types
  drm/amd/display: update dispclk and dppclk vco frequency
  drm/amd/display: Reset steer fifo before unblanking the stream
  drm/amd/display: Change the delay time before enabling FEC
  drm/amd/display: Fixed kernel panic when booting with DP-to-HDMI dongle
  drm/amd/display: Map DSC resources 1-to-1 if numbers of OPPs and DSCs are equal
  drm/amdgpu: add cache flush workaround to gfx8 emit_fence
  drm/amdgpu: add header line for power profile on Arcturus
  drm/amdgpu: add check before enabling/disabling broadcast mode
  nvme-fc: fix double-free scenarios on hw queues
  nvme_fc: add module to ops template to allow module references
  drm/mcde: dsi: Fix invalid pointer dereference if panel cannot be found
  ANDROID: update kernel ABI representation
  BACKPORT: perf_event: Add support for LSM and SELinux checks
  ANDROID: Update ABI representation
  ANDROID: GKI: clk: Don't disable unused clocks with sync state support
  ANDROID: GKI: clk: Add support for clock providers with sync state
  ANDROID: GKI: driver core: Add dev_has_sync_state()
  ANDROID: sdcardfs: fix -ENOENT lookup race issue
  CHROMIUM: cgroups: relax permissions on moving tasks between cgroups
  UPSTREAM: selinux: sidtab reverse lookup hash table
  ANDROID: update abi for 5.4.8 release

Conflicts:
	Documentation/devicetree/bindings
	Documentation/devicetree/bindings/clock/renesas,rcar-usb2-clock-sel.txt
	arch/arm64/mm/mmu.c
	include/linux/clk-provider.h

Change-Id: I668e3fd58b4ad5db037f700b66f89cdf845094b5
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2020-03-07 10:01:47 -08:00
David Hildenbrand
e84c5b7617 mm/memory_hotplug: shrink zones when offlining memory
commit feee6b2989165631b17ac6d4ccdbf6759254e85a upstream.

We currently try to shrink a single zone when removing memory.  We use
the zone of the first page of the memory we are removing.  If that
memmap was never initialized (e.g., memory was never onlined), we will
read garbage and can trigger kernel BUGs (due to a stale pointer):

    BUG: unable to handle page fault for address: 000000000000353d
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 0 P4D 0
    Oops: 0002 [#1] SMP PTI
    CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
    Workqueue: kacpi_hotplug acpi_hotplug_work_fn
    RIP: 0010:clear_zone_contiguous+0x5/0x10
    Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
    RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
    RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
    RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
    R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
    FS:  0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     __remove_pages+0x4b/0x640
     arch_remove_memory+0x63/0x8d
     try_remove_memory+0xdb/0x130
     __remove_memory+0xa/0x11
     acpi_memory_device_remove+0x70/0x100
     acpi_bus_trim+0x55/0x90
     acpi_device_hotplug+0x227/0x3a0
     acpi_hotplug_work_fn+0x1a/0x30
     process_one_work+0x221/0x550
     worker_thread+0x50/0x3b0
     kthread+0x105/0x140
     ret_from_fork+0x3a/0x50
    Modules linked in:
    CR2: 000000000000353d

Instead, shrink the zones when offlining memory or when onlining failed.
Introduce and use remove_pfn_range_from_zone(() for that.  We now
properly shrink the zones, even if we have DIMMs whereby

 - Some memory blocks fall into no zone (never onlined)

 - Some memory blocks fall into multiple zones (offlined+re-onlined)

 - Multiple memory blocks that fall into different zones

Drop the zone parameter (with a potential dubious value) from
__remove_pages() and __remove_section().

Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e86b]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: <stable@vger.kernel.org>	[5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09 10:19:56 +01:00
Liam Mark
c5d41d1162 mm, oom: Try to online memory block before killing
Before killing a process first try to see if there are any offlined
memory blocks, if there are try to online one of them.

Change-Id: Ie97c8b69f0ba173c202a891a38a5c914869ddaae
Signed-off-by: Liam Mark <lmark@codeaurora.org>
[isaacm@codeaurora.org: Update walk_memory_range to walk_memory_blocks]
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
2020-01-02 22:15:25 -08:00
Dan Williams
ba72b4c8cf mm/sparsemem: support sub-section hotplug
The libnvdimm sub-system has suffered a series of hacks and broken
workarounds for the memory-hotplug implementation's awkward
section-aligned (128MB) granularity.

For example the following backtrace is emitted when attempting
arch_add_memory() with physical address ranges that intersect 'System
RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:

    # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory
    100000000-1ffffffff : System RAM
    200000000-303ffffff : Persistent Memory (legacy)
    304000000-43fffffff : System RAM
    440000000-23ffffffff : Persistent Memory
    2400000000-43bfffffff : Persistent Memory
      2400000000-43bfffffff : namespace2.0

    WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60
    [..]
    RIP: 0010:add_pages+0x5c/0x60
    [..]
    Call Trace:
     devm_memremap_pages+0x460/0x6e0
     pmem_attach_disk+0x29e/0x680 [nd_pmem]
     ? nd_dax_probe+0xfc/0x120 [libnvdimm]
     nvdimm_bus_probe+0x66/0x160 [libnvdimm]

It was discovered that the problem goes beyond RAM vs PMEM collisions as
some platform produce PMEM vs PMEM collisions within a given section.
The libnvdimm workaround for that case revealed that the libnvdimm
section-alignment-padding implementation has been broken for a long
while.

A fix for that long-standing breakage introduces as many problems as it
solves as it would require a backward-incompatible change to the
namespace metadata interpretation.  Instead of that dubious route [1],
address the root problem in the memory-hotplug implementation.

Note that EEXIST is no longer treated as success as that is how
sparse_add_section() reports subsection collisions, it was also obviated
by recent changes to perform the request_region() for 'System RAM'
before arch_add_memory() in the add_memory() sequence.

[1] https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com

[osalvador@suse.de: fix deactivate_section for early sections]
  Link: http://lkml.kernel.org/r/20190715081549.32577-2-osalvador@suse.de
Link: http://lkml.kernel.org/r/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:07 -07:00
Dan Williams
7ea6216049 mm/sparsemem: prepare for sub-section ranges
Prepare the memory hot-{add,remove} paths for handling sub-section
ranges by plumbing the starting page frame and number of pages being
handled through arch_{add,remove}_memory() to
sparse_{add,remove}_one_section().

This is simply plumbing, small cleanups, and some identifier renames.
No intended functional changes.

Link: http://lkml.kernel.org/r/156092353780.979959.9713046515562743194.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:07 -07:00
David Hildenbrand
ea8846411a mm/memory_hotplug: move and simplify walk_memory_blocks()
Let's move walk_memory_blocks() to the place where memory block logic
resides and simplify it.  While at it, add a type for the callback
function.

Link: http://lkml.kernel.org/r/20190614100114.311-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:06 -07:00
David Hildenbrand
fbcf73ce65 mm/memory_hotplug: rename walk_memory_range() and pass start+size instead of pfns
walk_memory_range() was once used to iterate over sections.  Now, it
iterates over memory blocks.  Rename the function, fixup the
documentation.

Also, pass start+size instead of PFNs, which is what most callers
already have at hand.  (we'll rework link_mem_sections() most probably
soon)

Follow-up patches will rework, simplify, and move walk_memory_blocks()
to drivers/base/memory.c.

Note: walk_memory_blocks() only works correctly right now if the
start_pfn is aligned to a section start.  This is the case right now,
but we'll generalize the function in a follow up patch so the semantics
match the documentation.

[akpm@linux-foundation.org: remove unused variable]
Link: http://lkml.kernel.org/r/20190614100114.311-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:06 -07:00
David Hildenbrand
b9bf8d342d mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_section
The parameter is unused, so let's drop it.  Memory removal paths should
never care about zones.  This is the job of memory offlining and will
require more refactorings.

Link: http://lkml.kernel.org/r/20190527111152.16324-12-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:06 -07:00
David Hildenbrand
05f800a0bd mm/memory_hotplug: drop MHP_MEMBLOCK_API
No longer needed, the callers of arch_add_memory() can handle this
manually.

Link: http://lkml.kernel.org/r/20190527111152.16324-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:06 -07:00
David Hildenbrand
80ec922dbd mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVE
We want to improve error handling while adding memory by allowing to use
arch_remove_memory() and __remove_pages() even if
CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:

	arch_add_memory()
	rc = do_something();
	if (rc) {
		arch_remove_memory();
	}

We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
quite some dependencies for memory offlining.

Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chintan Pandya <cpandya@codeaurora.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jun Yao <yaojun8558363@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-18 17:08:06 -07:00
Pavel Tatashin
eca499ab37 mm/hotplug: make remove_memory() interface usable
Presently the remove_memory() interface is inherently broken.  It tries
to remove memory but panics if some memory is not offline.  The problem
is that it is impossible to ensure that all memory blocks are offline as
this function also takes lock_device_hotplug that is required to change
memory state via sysfs.

So, between calling this function and offlining all memory blocks there
is always a window when lock_device_hotplug is released, and therefore,
there is always a chance for a panic during this window.

Make this interface to return an error if memory removal fails.  This
way it is safe to call this function without panicking machine, and also
makes it symmetric to add_memory() which already returns an error.

Link: http://lkml.kernel.org/r/20190517215438.6487-3-pasha.tatashin@soleen.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:24 -07:00
David Hildenbrand
ac5c942645 mm/memory_hotplug: make __remove_pages() and arch_remove_memory() never fail
All callers of arch_remove_memory() ignore errors.  And we should really
try to remove any errors from the memory removal path.  No more errors are
reported from __remove_pages().  BUG() in s390x code in case
arch_remove_memory() is triggered.  We may implement that properly later.
WARN in case powerpc code failed to remove the section mapping, which is
better than ignoring the error completely right now.

Link: http://lkml.kernel.org/r/20190409100148.24703-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:50 -07:00
Michal Hocko
940519f0c8 mm, memory_hotplug: provide a more generic restrictions for memory hotplug
arch_add_memory, __add_pages take a want_memblock which controls whether
the newly added memory should get the sysfs memblock user API (e.g.
ZONE_DEVICE users do not want/need this interface).  Some callers even
want to control where do we allocate the memmap from by configuring
altmap.

Add a more generic hotplug context for arch_add_memory and __add_pages.
struct mhp_restrictions contains flags which contains additional features
to be enabled by the memory hotplug (MHP_MEMBLOCK_API currently) and
altmap for alternative memmap allocator.

This patch shouldn't introduce any functional change.

[akpm@linux-foundation.org: build fix]
Link: http://lkml.kernel.org/r/20190408082633.2864-3-osalvador@suse.de
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:49 -07:00
Michal Hocko
5557c766ab mm, memory_hotplug: cleanup memory offline path
check_pages_isolated_cb currently accounts the whole pfn range as being
offlined if test_pages_isolated suceeds on the range.  This is based on
the assumption that all pages in the range are freed which is currently
the case in most cases but it won't be with later changes, as pages marked
as vmemmap won't be isolated.

Move the offlined pages counting to offline_isolated_pages_cb and rely on
__offline_isolated_pages to return the correct value.
check_pages_isolated_cb will still do it's primary job and check the pfn
range.

While we are at it remove check_pages_isolated and offline_isolated_pages
and use directly walk_system_ram_range as do in online_pages.

Link: http://lkml.kernel.org/r/20190408082633.2864-2-osalvador@suse.de
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:49 -07:00
Linus Torvalds
d14d7f14f1 Merge tag 'for-linus-5.1a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
 "xen fixes and features:

   - remove fallback code for very old Xen hypervisors

   - three patches for fixing Xen dom0 boot regressions

   - an old patch for Xen PCI passthrough which was never applied for
     unknown reasons

   - some more minor fixes and cleanup patches"

* tag 'for-linus-5.1a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: fix dom0 boot on huge systems
  xen, cpu_hotplug: Prevent an out of bounds access
  xen: remove pre-xen3 fallback handlers
  xen/ACPI: Switch to bitmap_zalloc()
  x86/xen: dont add memory above max allowed allocation
  x86: respect memory size limiting via mem= parameter
  xen/gntdev: Check and release imported dma-bufs on close
  xen/gntdev: Do not destroy context while dma-bufs are in use
  xen/pciback: Don't disable PCI_COMMAND on PCI device reset.
  xen-scsiback: mark expected switch fall-through
  xen: mark expected switch fall-through
2019-03-11 17:08:14 -07:00
Arun KS
a9cd410a3d mm/page_alloc.c: memory hotplug: free pages as higher order
When freeing pages are done with higher order, time spent on coalescing
pages by buddy allocator can be reduced.  With section size of 256MB,
hot add latency of a single section shows improvement from 50-60 ms to
less than 1 ms, hence improving the hot add latency by 60 times.  Modify
external providers of online callback to align with the change.

[arunks@codeaurora.org: v11]
  Link: http://lkml.kernel.org/r/1547792588-18032-1-git-send-email-arunks@codeaurora.org
[akpm@linux-foundation.org: remove unused local, per Arun]
[akpm@linux-foundation.org: avoid return of void-returning __free_pages_core(), per Oscar]
[akpm@linux-foundation.org: fix it for mm-convert-totalram_pages-and-totalhigh_pages-variables-to-atomic.patch]
[arunks@codeaurora.org: v8]
  Link: http://lkml.kernel.org/r/1547032395-24582-1-git-send-email-arunks@codeaurora.org
[arunks@codeaurora.org: v9]
  Link: http://lkml.kernel.org/r/1547098543-26452-1-git-send-email-arunks@codeaurora.org
Link: http://lkml.kernel.org/r/1538727006-5727-1-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:14 -08:00
Juergen Gross
357b4da50a x86: respect memory size limiting via mem= parameter
When limiting memory size via kernel parameter "mem=" this should be
respected even in case of memory made accessible via a PCI card.

Today this kind of memory won't be made usable in initial memory
setup as the memory won't be visible in E820 map, but it might be
added when adding PCI devices due to corresponding ACPI table entries.

Not respecting "mem=" can be corrected by adding a global max_mem_size
variable set by parse_memopt() which will result in rejecting adding
memory areas resulting in a memory size above the allowed limit.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2019-02-18 06:50:34 +01:00
Qian Cai
b13bc35193 mm/hotplug: invalid PFNs from pfn_to_online_page()
On an arm64 ThunderX2 server, the first kmemleak scan would crash [1]
with CONFIG_DEBUG_VM_PGFLAGS=y due to page_to_nid() found a pfn that is
not directly mapped (MEMBLOCK_NOMAP).  Hence, the page->flags is
uninitialized.

This is due to the commit 9f1eb38e0e ("mm, kmemleak: little
optimization while scanning") starts to use pfn_to_online_page() instead
of pfn_valid().  However, in the CONFIG_MEMORY_HOTPLUG=y case,
pfn_to_online_page() does not call memblock_is_map_memory() while
pfn_valid() does.

Historically, the commit 68709f4538 ("arm64: only consider memblocks
with NOMAP cleared for linear mapping") causes pages marked as nomap
being no long reassigned to the new zone in memmap_init_zone() by
calling __init_single_page().

Since the commit 2d070eab2e ("mm: consider zone which is not fully
populated to have holes") introduced pfn_to_online_page() and was
designed to return a valid pfn only, but it is clearly broken on arm64.

Therefore, let pfn_to_online_page() call pfn_valid_within(), so it can
handle nomap thanks to the commit f52bb98f5a ("arm64: mm: always
enable CONFIG_HOLES_IN_ZONE"), while it will be optimized away on
architectures where have no HOLES_IN_ZONE.

[1]
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000006
  Mem abort info:
    ESR = 0x96000005
    Exception class = DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
  Data abort info:
    ISV = 0, ISS = 0x00000005
    CM = 0, WnR = 0
  Internal error: Oops: 96000005 [#1] SMP
  CPU: 60 PID: 1408 Comm: kmemleak Not tainted 5.0.0-rc2+ #8
  pstate: 60400009 (nZCv daif +PAN -UAO)
  pc : page_mapping+0x24/0x144
  lr : __dump_page+0x34/0x3dc
  sp : ffff00003a5cfd10
  x29: ffff00003a5cfd10 x28: 000000000000802f
  x27: 0000000000000000 x26: 0000000000277d00
  x25: ffff000010791f56 x24: ffff7fe000000000
  x23: ffff000010772f8b x22: ffff00001125f670
  x21: ffff000011311000 x20: ffff000010772f8b
  x19: fffffffffffffffe x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000
  x15: 0000000000000000 x14: ffff802698b19600
  x13: ffff802698b1a200 x12: ffff802698b16f00
  x11: ffff802698b1a400 x10: 0000000000001400
  x9 : 0000000000000001 x8 : ffff00001121a000
  x7 : 0000000000000000 x6 : ffff0000102c53b8
  x5 : 0000000000000000 x4 : 0000000000000003
  x3 : 0000000000000100 x2 : 0000000000000000
  x1 : ffff000010772f8b x0 : ffffffffffffffff
  Process kmemleak (pid: 1408, stack limit = 0x(____ptrval____))
  Call trace:
   page_mapping+0x24/0x144
   __dump_page+0x34/0x3dc
   dump_page+0x28/0x4c
   kmemleak_scan+0x4ac/0x680
   kmemleak_scan_thread+0xb4/0xdc
   kthread+0x12c/0x13c
   ret_from_fork+0x10/0x18
  Code: d503201f f9400660 36000040 d1000413 (f9400661)
  ---[ end trace 4d4bd7f573490c8e ]---
  Kernel panic - not syncing: Fatal exception
  SMP: stopping secondary CPUs
  Kernel Offset: disabled
  CPU features: 0x002,20000c38
  Memory Limit: none
  ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/20190122132916.28360-1-cai@lca.pw
Fixes: 9f1eb38e0e ("mm, kmemleak: little optimization while scanning")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Wei Yang
0614ce9776 include/linux/memory_hotplug.h: remove duplicate declaration of offline_pages()
offline_pages() is already declared in this file.

Just remove the duplicated one.

Link: http://lkml.kernel.org/r/20181205031357.24769-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:50 -08:00
Wei Yang
4e0d2e7ef1 mm, sparse: pass nid instead of pgdat to sparse_add_one_section()
Since the information needed in sparse_add_one_section() is node id to
allocate proper memory, it is not necessary to pass its pgdat.

This patch changes the prototype of sparse_add_one_section() to pass node
id directly.  This is intended to reduce misleading that
sparse_add_one_section() would touch pgdat.

Link: http://lkml.kernel.org/r/20181204085657.20472-2-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:49 -08:00
Oscar Salvador
2c2a5af6fe mm, memory_hotplug: add nid parameter to arch_remove_memory
Patch series "Do not touch pages in hot-remove path", v2.

This patchset aims for two things:

 1) A better definition about offline and hot-remove stage
 2) Solving bugs where we can access non-initialized pages
    during hot-remove operations [2] [3].

This is achieved by moving all page/zone handling to the offline
stage, so we do not need to access pages when hot-removing memory.

[1] https://patchwork.kernel.org/cover/10691415/
[2] https://patchwork.kernel.org/patch/10547445/
[3] https://www.spinics.net/lists/linux-mm/msg161316.html

This patch (of 5):

This is a preparation for the following-up patches.  The idea of passing
the nid is that it will allow us to get rid of the zone parameter
afterwards.

Link: http://lkml.kernel.org/r/20181127162005.15833-2-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:49 -08:00
David Hildenbrand
f29d8e9c01 mm/memory_hotplug: drop "online" parameter from add_memory_resource()
Userspace should always be in charge of how to online memory and if memory
should be onlined automatically in the kernel.  Let's drop the parameter
to overwrite this - XEN passes memhp_auto_online, just like add_memory(),
so we can directly use that instead internally.

Link: http://lkml.kernel.org/r/20181123123740.27652-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:48 -08:00
David Hildenbrand
8df1d0e4a2 mm/memory_hotplug: make add_memory() take the device_hotplug_lock
add_memory() currently does not take the device_hotplug_lock, however
is aleady called under the lock from
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline request (e.g.  from user space) - which
already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3b ("mm, hotplug: fix concurrent memory
hot-add deadlock").  add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
	drivers/xen/balloon.c
	arch/powerpc/platforms/powernv/memtrace.c
	drivers/s390/char/sclp_cmd.c
	drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.

Don't export add_memory_resource(), as it once was exported to be used by
XEN, which is never built as a module.  If somebody requires it, we also
have to export a locked variant (as device_hotplug_lock is never
exported).

Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:17 -07:00
David Hildenbrand
d15e59260f mm/memory_hotplug: make remove_memory() take the device_hotplug_lock
Patch series "mm: online/offline_pages called w.o. mem_hotplug_lock", v3.

Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock.  And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.

While e.g.
	echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
	echo 1 > /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.

E.g.  via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock.  So we can
have concurrent callers in online_pages().  We e.g.  touch in
online_pages() basically unprotected zone->present_pages then.

Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible.  We
would e.g.  have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.

Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.

I propose the general rules (documentation added in patch 6):

1. add_memory/add_memory_resource() must only be called with
   device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
   already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
   device_hotplug_lock. This is already documented and true for now in core
   code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
   online_pages/offline_pages.

To me, this looks way cleaner than what we have right now (and easier to
verify).  And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.

This patch (of 6):

remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported.  So let's provide a variant
that takes the lock and only export that one.

The lock is already held in
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
	arch/powerpc/platforms/powernv/memtrace.c

Apart from that, there are not other users in the tree.

Link: http://lkml.kernel.org/r/20180925091457.28651-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:17 -07:00
Oscar Salvador
03e85f9d5f mm/page_alloc: Introduce free_area_init_core_hotplug
Currently, whenever a new node is created/re-used from the memhotplug
path, we call free_area_init_node()->free_area_init_core().  But there is
some code that we do not really need to run when we are coming from such
path.

free_area_init_core() performs the following actions:

1) Initializes pgdat internals, such as spinlock, waitqueues and more.
2) Account # nr_all_pages and # nr_kernel_pages. These values are used later on
   when creating hash tables.
3) Account number of managed_pages per zone, substracting dma_reserved and
   memmap pages.
4) Initializes some fields of the zone structure data
5) Calls init_currently_empty_zone to initialize all the freelists
6) Calls memmap_init to initialize all pages belonging to certain zone

When called from memhotplug path, free_area_init_core() only performs
actions #1 and #4.

Action #2 is pointless as the zones do not have any pages since either the
node was freed, or we are re-using it, eitherway all zones belonging to
this node should have 0 pages.  For the same reason, action #3 results
always in manages_pages being 0.

Action #5 and #6 are performed later on when onlining the pages:
 online_pages()->move_pfn_range_to_zone()->init_currently_empty_zone()
 online_pages()->move_pfn_range_to_zone()->memmap_init_zone()

This patch does two things:

First, moves the node/zone initializtion to their own function, so it
allows us to create a small version of free_area_init_core, where we only
perform:

1) Initialization of pgdat internals, such as spinlock, waitqueues and more
4) Initialization of some fields of the zone structure data

These two functions are: pgdat_init_internals() and zone_init_internals().

The second thing this patch does, is to introduce
free_area_init_core_hotplug(), the memhotplug version of
free_area_init_core():

Currently, we call free_area_init_node() from the memhotplug path.  In
there, we set some pgdat's fields, and call calculate_node_totalpages().
calculate_node_totalpages() calculates the # of pages the node has.

Since the node is either new, or we are re-using it, the zones belonging
to this node should not have any pages, so there is no point to calculate
this now.

Actually, we re-set these values to 0 later on with the calls to:

reset_node_managed_pages()
reset_node_present_pages()

The # of pages per node and the # of pages per zone will be calculated when
onlining the pages:

online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_zone_range()
online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_pgdat_range()

Also, since free_area_init_core/free_area_init_node will now only get called during early init, let us replace
__paginginit with __init, so their code gets freed up.

[osalvador@techadventures.net: fix section usage]
  Link: http://lkml.kernel.org/r/20180731101752.GA473@techadventures.net
[osalvador@suse.de: v6]
  Link: http://lkml.kernel.org/r/20180801122348.21588-6-osalvador@techadventures.net
Link: http://lkml.kernel.org/r/20180730101757.28058-5-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:45 -07:00
Mathieu Malaterre
fb52bbaee5 mm: move is_pageblock_removable_nolock() to mm/memory_hotplug.c
is_pageblock_removable_nolock() is not used outside of
mm/memory_hotplug.c.  Move it next to unique caller
is_mem_section_removable() and make it static.

Remove prototype in <linux/memory_hotplug.h> to silence gcc warning (W=1):

  mm/page_alloc.c:7704:6: warning: no previous prototype for `is_pageblock_removable_nolock' [-Wmissing-prototypes]

Link: http://lkml.kernel.org/r/20180509190001.24789-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:36 -07:00
Joonsoo Kim
d883c6cf3b Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE"
This reverts the following commits that change CMA design in MM.

 3d2054ad8c ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y")

 1d47a3ec09 ("mm/cma: remove ALLOC_CMA")

 bad8c6c0b1 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")

Ville reported a following error on i386.

  Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
  microcode: microcode updated early to revision 0x4, date = 2013-06-28
  Initializing CPU#0
  Initializing HighMem for node 0 (000377fe:00118000)
  Initializing Movable for node 0 (00000001:00118000)
  BUG: Bad page state in process swapper  pfn:377fe
  page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0
  flags: 0x80000000()
  raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001
  page dumped because: nonzero mapcount
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145
  Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013
  Call Trace:
   dump_stack+0x60/0x96
   bad_page+0x9a/0x100
   free_pages_check_bad+0x3f/0x60
   free_pcppages_bulk+0x29d/0x5b0
   free_unref_page_commit+0x84/0xb0
   free_unref_page+0x3e/0x70
   __free_pages+0x1d/0x20
   free_highmem_page+0x19/0x40
   add_highpages_with_active_regions+0xab/0xeb
   set_highmem_pages_init+0x66/0x73
   mem_init+0x1b/0x1d7
   start_kernel+0x17a/0x363
   i386_start_kernel+0x95/0x99
   startup_32_smp+0x164/0x168

The reason for this error is that the span of MOVABLE_ZONE is extended
to whole node span for future CMA initialization, and, normal memory is
wrongly freed here.  I submitted the fix and it seems to work, but,
another problem happened.

It's so late time to fix the later problem so I decide to reverting the
series.

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-24 10:07:50 -07:00
Joonsoo Kim
bad8c6c0b1 mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE
Patch series "mm/cma: manage the memory of the CMA area by using the
ZONE_MOVABLE", v2.

0. History

This patchset is the follow-up of the discussion about the "Introduce
ZONE_CMA (v7)" [1].  Please reference it if more information is needed.

1. What does this patch do?

This patch changes the management way for the memory of the CMA area in
the MM subsystem.  Currently the memory of the CMA area is managed by
the zone where their pfn is belong to.  However, this approach has some
problems since MM subsystem doesn't have enough logic to handle the
situation that different characteristic memories are in a single zone.
To solve this issue, this patch try to manage all the memory of the CMA
area by using the MOVABLE zone.  In MM subsystem's point of view,
characteristic of the memory on the MOVABLE zone and the memory of the
CMA area are the same.  So, managing the memory of the CMA area by using
the MOVABLE zone will not have any problem.

2. Motivation

There are some problems with current approach.  See following.  Although
these problem would not be inherent and it could be fixed without this
conception change, it requires many hooks addition in various code path
and it would be intrusive to core MM and would be really error-prone.
Therefore, I try to solve them with this new approach.  Anyway,
following is the problems of the current implementation.

o CMA memory utilization

First, following is the freepage calculation logic in MM.

 - For movable allocation: freepage = total freepage
 - For unmovable allocation: freepage = total freepage - CMA freepage

Freepages on the CMA area is used after the normal freepages in the zone
where the memory of the CMA area is belong to are exhausted.  At that
moment that the number of the normal freepages is zero, so

 - For movable allocation: freepage = total freepage = CMA freepage
 - For unmovable allocation: freepage = 0

If unmovable allocation comes at this moment, allocation request would
fail to pass the watermark check and reclaim is started.  After reclaim,
there would exist the normal freepages so freepages on the CMA areas
would not be used.

FYI, there is another attempt [2] trying to solve this problem in lkml.
And, as far as I know, Qualcomm also has out-of-tree solution for this
problem.

Useless reclaim:

There is no logic to distinguish CMA pages in the reclaim path.  Hence,
CMA page is reclaimed even if the system just needs the page that can be
usable for the kernel allocation.

Atomic allocation failure:

This is also related to the fallback allocation policy for the memory of
the CMA area.  Consider the situation that the number of the normal
freepages is *zero* since the bunch of the movable allocation requests
come.  Kswapd would not be woken up due to following freepage
calculation logic.

- For movable allocation: freepage = total freepage = CMA freepage

If atomic unmovable allocation request comes at this moment, it would
fails due to following logic.

- For unmovable allocation: freepage = total freepage - CMA freepage = 0

It was reported by Aneesh [3].

Useless compaction:

Usual high-order allocation request is unmovable allocation request and
it cannot be served from the memory of the CMA area.  In compaction,
migration scanner try to migrate the page in the CMA area and make
high-order page there.  As mentioned above, it cannot be usable for the
unmovable allocation request so it's just waste.

3. Current approach and new approach

Current approach is that the memory of the CMA area is managed by the
zone where their pfn is belong to.  However, these memory should be
distinguishable since they have a strong limitation.  So, they are
marked as MIGRATE_CMA in pageblock flag and handled specially.  However,
as mentioned in section 2, the MM subsystem doesn't have enough logic to
deal with this special pageblock so many problems raised.

New approach is that the memory of the CMA area is managed by the
MOVABLE zone.  MM already have enough logic to deal with special zone
like as HIGHMEM and MOVABLE zone.  So, managing the memory of the CMA
area by the MOVABLE zone just naturally work well because constraints
for the memory of the CMA area that the memory should always be
migratable is the same with the constraint for the MOVABLE zone.

There is one side-effect for the usability of the memory of the CMA
area.  The use of MOVABLE zone is only allowed for a request with
GFP_HIGHMEM && GFP_MOVABLE so now the memory of the CMA area is also
only allowed for this gfp flag.  Before this patchset, a request with
GFP_MOVABLE can use them.  IMO, It would not be a big issue since most
of GFP_MOVABLE request also has GFP_HIGHMEM flag.  For example, file
cache page and anonymous page.  However, file cache page for blockdev
file is an exception.  Request for it has no GFP_HIGHMEM flag.  There is
pros and cons on this exception.  In my experience, blockdev file cache
pages are one of the top reason that causes cma_alloc() to fail
temporarily.  So, we can get more guarantee of cma_alloc() success by
discarding this case.

Note that there is no change in admin POV since this patchset is just
for internal implementation change in MM subsystem.  Just one minor
difference for admin is that the memory stat for CMA area will be
printed in the MOVABLE zone.  That's all.

4. Result

Following is the experimental result related to utilization problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

<Before>
  CMA area:               0 MB            512 MB
  Elapsed-time:           92.4		186.5
  pswpin:                 82		18647
  pswpout:                160		69839

<After>
  CMA        :            0 MB            512 MB
  Elapsed-time:           93.1		93.4
  pswpin:                 84		46
  pswpout:                183		92

akpm: "kernel test robot" reported a 26% improvement in
vm-scalability.throughput:
http://lkml.kernel.org/r/20180330012721.GA3845@yexl-desktop

[1]: lkml.kernel.org/r/1491880640-9944-1-git-send-email-iamjoonsoo.kim@lge.com
[2]: https://lkml.org/lkml/2014/10/15/623
[3]: http://www.spinics.net/lists/linux-mm/msg100562.html

Link: http://lkml.kernel.org/r/1512114786-5085-2-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:32 -07:00
Pavel Tatashin
3a2d7fa8a3 mm: disable interrupts while initializing deferred pages
Vlastimil Babka reported about a window issue during which when deferred
pages are initialized, and the current version of on-demand
initialization is finished, allocations may fail.  While this is highly
unlikely scenario, since this kind of allocation request must be large,
and must come from interrupt handler, we still want to cover it.

We solve this by initializing deferred pages with interrupts disabled,
and holding node_size_lock spin lock while pages in the node are being
initialized.  The on-demand deferred page initialization that comes
later will use the same lock, and thus synchronize with
deferred_init_memmap().

It is unlikely for threads that initialize deferred pages to be
interrupted.  They run soon after smp_init(), but before modules are
initialized, and long before user space programs.  This is why there is
no adverse effect of having these threads running with interrupts
disabled.

[pasha.tatashin@oracle.com: v6]
  Link: http://lkml.kernel.org/r/20180313182355.17669-2-pasha.tatashin@oracle.com
Link: http://lkml.kernel.org/r/20180309220807.24961-2-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:24 -07:00
Christoph Hellwig
a99583e780 mm: pass the vmem_altmap to memmap_init_zone
Pass the vmem_altmap two levels down instead of needing a lookup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-08 11:46:23 -08:00
Christoph Hellwig
24b6d41643 mm: pass the vmem_altmap to vmemmap_free
We can just pass this on instead of having to do a radix tree lookup
without proper locking a few levels into the callchain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-08 11:46:23 -08:00
Christoph Hellwig
da024512a1 mm: pass the vmem_altmap to arch_remove_memory and __remove_pages
We can just pass this on instead of having to do a radix tree lookup
without proper locking 2 levels into the callchain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-08 11:46:23 -08:00
Christoph Hellwig
7b73d978a5 mm: pass the vmem_altmap to vmemmap_populate
We can just pass this on instead of having to do a radix tree lookup
without proper locking a few levels into the callchain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-08 11:46:23 -08:00
Christoph Hellwig
24e6d5a59a mm: pass the vmem_altmap to arch_add_memory and __add_pages
We can just pass this on instead of having to do a radix tree lookup
without proper locking 2 levels into the callchain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-08 11:46:23 -08:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Michal Hocko
3072e413e3 mm/memory_hotplug: introduce add_pages
There are new users of memory hotplug emerging.  Some of them require
different subset of arch_add_memory.  There are some which only require
allocation of struct pages without mapping those pages to the kernel
address space.  We currently have __add_pages for that purpose.  But this
is rather lowlevel and not very suitable for the code outside of the
memory hotplug.  E.g.  x86_64 wants to update max_pfn which should be done
by the caller.  Introduce add_pages() which should care about those
details if they are needed.  Each architecture should define its
implementation and select CONFIG_ARCH_HAS_ADD_PAGES.  All others use the
currently existing __add_pages.

Link: http://lkml.kernel.org/r/20170817000548.32038-7-jglisse@redhat.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:46 -07:00
Michal Hocko
e5e6893026 mm, memory_hotplug: display allowed zones in the preferred ordering
Prior to commit f1dd2cd13c ("mm, memory_hotplug: do not associate
hotadded memory to zones until online") we used to allow to change the
valid zone types of a memory block if it is adjacent to a different zone
type.

This fact was reflected in memoryNN/valid_zones by the ordering of
printed zones.  The first one was default (echo online > memoryNN/state)
and the other one could be onlined explicitly by online_{movable,kernel}.

This behavior was removed by the said patch and as such the ordering was
not all that important.  In most cases a kernel zone would be default
anyway.  The only exception is movable_node handled by "mm,
memory_hotplug: support movable_node for hotpluggable nodes".

Let's reintroduce this behavior again because later patch will remove
the zone overlap restriction and so user will be allowed to online
kernel resp.  movable block regardless of its placement.  Original
behavior will then become significant again because it would be
non-trivial for users to see what is the default zone to online into.

Implementation is really simple.  Pull out zone selection out of
move_pfn_range into zone_for_pfn_range helper and use it in
show_valid_zones to display the zone for default onlining and then both
kernel and movable if they are allowed.  Default online zone is not
duplicated.

Link: http://lkml.kernel.org/r/20170714121233.16861-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: <slaoub@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:25 -07:00
Michal Hocko
4932381ee2 mm, memory_hotplug: move movable_node to the hotplug proper
movable_node_is_enabled is defined in memblock proper while it is
initialized from the memory hotplug proper.  This is quite messy and it
makes a dependency between the two so move movable_node along with the
helper functions to memory_hotplug.

To make it more entertaining the kernel parameter is ignored unless
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y because we do not have the node
information for each memblock otherwise.  So let's warn when the option
is disabled.

Link: http://lkml.kernel.org/r/20170529114141.536-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kani Toshimitsu <toshi.kani@hpe.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:35 -07:00
Michal Hocko
559bfc7d1b mm, memory_hotplug: remove unused cruft after memory hotplug rework
zone_for_memory doesn't have any user anymore as well as the whole zone
shifting infrastructure so drop them all.

This shouldn't introduce any functional changes.

Link: http://lkml.kernel.org/r/20170515085827.16474-15-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Michal Hocko
3d79a728f9 mm, memory_hotplug: replace for_device by want_memblock in arch_add_memory
arch_add_memory gets for_device argument which then controls whether we
want to create memblocks for created memory sections.  Simplify the
logic by telling whether we want memblocks directly rather than going
through pointless negation.  This also makes the api easier to
understand because it is clear what we want rather than nothing telling
for_device which can mean anything.

This shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170515085827.16474-13-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Michal Hocko
c246a213f5 mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone
Heiko Carstens has noticed that he can generate overlapping zones for
ZONE_DMA and ZONE_NORMAL:

  DMA      [mem 0x0000000000000000-0x000000007fffffff]
  Normal   [mem 0x0000000080000000-0x000000017fffffff]

  $ cat /sys/devices/system/memory/block_size_bytes
  10000000
  $ cat /sys/devices/system/memory/memory5/valid_zones
  DMA
  $ echo 0 > /sys/devices/system/memory/memory5/online
  $ cat /sys/devices/system/memory/memory5/valid_zones
  Normal
  $ echo 1 > /sys/devices/system/memory/memory5/online
  Normal

  $ cat /proc/zoneinfo
  Node 0, zone      DMA
  spanned  524288        <-----
  present  458752
  managed  455078
  start_pfn:           0 <-----

  Node 0, zone   Normal
  spanned  720896
  present  589824
  managed  571648
  start_pfn:           327680 <-----

The reason is that we assume that the default zone for kernel onlining
is ZONE_NORMAL.  This was a simplification introduced by the memory
hotplug rework and it is easily fixable by checking the range overlap in
the zone order and considering the first matching zone as the default
one.  If there is no such zone then assume ZONE_NORMAL as we have been
doing so far.

Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
Link: http://lkml.kernel.org/r/20170601083746.4924-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Michal Hocko
f1dd2cd13c mm, memory_hotplug: do not associate hotadded memory to zones until online
The current memory hotplug implementation relies on having all the
struct pages associate with a zone/node during the physical hotplug
phase (arch_add_memory->__add_pages->__add_section->__add_zone).  In the
vast majority of cases this means that they are added to ZONE_NORMAL.
This has been so since 9d99aaa31f ("[PATCH] x86_64: Support memory
hotadd without sparsemem") and it wasn't a big deal back then because
movable onlining didn't exist yet.

Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
onlining 511c2aba8f ("mm, memory-hotplug: dynamic configure movable
memory and portion memory") and then things got more complicated.
Rather than reconsidering the zone association which was no longer
needed (because the memory hotplug already depended on SPARSEMEM) a
convoluted semantic of zone shifting has been developed.  Only the
currently last memblock or the one adjacent to the zone_movable can be
onlined movable.  This essentially means that the online type changes as
the new memblocks are added.

Let's simulate memory hot online manually
  $ echo 0x100000000 > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory32/valid_zones
  Normal Movable

  $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable

  $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal
  /sys/devices/system/memory/memory34/valid_zones:Normal Movable

  $ echo online_movable > /sys/devices/system/memory/memory34/state
  $ grep . /sys/devices/system/memory/memory3?/valid_zones
  /sys/devices/system/memory/memory32/valid_zones:Normal
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Movable Normal

This is an awkward semantic because an udev event is sent as soon as the
block is onlined and an udev handler might want to online it based on
some policy (e.g.  association with a node) but it will inherently race
with new blocks showing up.

This patch changes the physical online phase to not associate pages with
any zone at all.  All the pages are just marked reserved and wait for
the onlining phase to be associated with the zone as per the online
request.  There are only two requirements

	- existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap

	- ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses

the latter one is not an inherent requirement and can be changed in the
future.  It preserves the current behavior and made the code slightly
simpler.  This is subject to change in future.

This means that the same physical online steps as above will lead to the
following state: Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Normal Movable

  /sys/devices/system/memory/memory32/valid_zones:Normal Movable
  /sys/devices/system/memory/memory33/valid_zones:Normal Movable
  /sys/devices/system/memory/memory34/valid_zones:Movable

Implementation:
The current move_pfn_range is reimplemented to check the above
requirements (allow_online_pfn_range) and then updates the respective
zone (move_pfn_range_to_zone), the pgdat and links all the pages in the
pfn range with the zone/node.  __add_pages is updated to not require the
zone and only initializes sections in the range.  This allowed to
simplify the arch_add_memory code (s390 could get rid of quite some of
code).

devm_memremap_pages is the only user of arch_add_memory which relies on
the zone association because it only hooks into the memory hotplug only
half way.  It uses it to associate the new memory with ZONE_DEVICE but
doesn't allow it to be {on,off}lined via sysfs.  This means that this
particular code path has to call move_pfn_range_to_zone explicitly.

The original zone shifting code is kept in place and will be removed in
the follow up patch for an easier review.

Please note that this patch also changes the original behavior when
offlining a memory block adjacent to another zone (Normal vs.  Movable)
used to allow to change its movable type.  This will be handled later.

[richard.weiyang@gmail.com: simplify zone_intersects()]
  Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com
[richard.weiyang@gmail.com: remove duplicate call for set_page_links]
  Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com
[akpm@linux-foundation.org: remove unused local `i']
Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # For s390 bits
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Michal Hocko
2d070eab2e mm: consider zone which is not fully populated to have holes
__pageblock_pfn_to_page has two users currently, set_zone_contiguous
which checks whether the given zone contains holes and
pageblock_pfn_to_page which then carefully returns a first valid page
from the given pfn range for the given zone.  This doesn't handle zones
which are not fully populated though.  Memory pageblocks can be offlined
or might not have been onlined yet.  In such a case the zone should be
considered to have holes otherwise pfn walkers can touch and play with
offline pages.

Current callers of pageblock_pfn_to_page in compaction seem to work
properly right now because they only isolate PageBuddy
(isolate_freepages_block) or PageLRU resp.  __PageMovable
(isolate_migratepages_block) which will be always false for these pages.
It would be safer to skip these pages altogether, though.

In order to do this patch adds a new memory section state
(SECTION_IS_ONLINE) which is set in memory_present (during boot time) or
in online_pages_range during the memory hotplug.  Similarly
offline_mem_sections clears the bit and it is called when the memory
range is offlined.

pfn_to_online_page helper is then added which check the mem section and
only returns a page if it is onlined already.

Use the new helper in __pageblock_pfn_to_page and skip the whole page
block in such a case.

[mhocko@suse.com: check valid section number in pfn_to_online_page (Vlastimil),
 mark sections online after all struct pages are initialized in
 online_pages_range (Vlastimil)]
  Link: http://lkml.kernel.org/r/20170518164210.GD18333@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170515085827.16474-8-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Michal Hocko
1b862aecfb mm, memory_hotplug: get rid of is_zone_device_section
Device memory hotplug hooks into regular memory hotplug only half way.
It needs memory sections to track struct pages but there is no
need/desire to associate those sections with memory blocks and export
them to the userspace via sysfs because they cannot be onlined anyway.

This is currently expressed by for_device argument to arch_add_memory
which then makes sure to associate the given memory range with
ZONE_DEVICE.  register_new_memory then relies on is_zone_device_section
to distinguish special memory hotplug from the regular one.  While this
works now, later patches in this series want to move __add_zone outside
of arch_add_memory path so we have to come up with something else.

Add want_memblock down the __add_pages path and use it to control
whether the section->memblock association should be done.
arch_add_memory then just trivially want memblock for everything but
for_device hotplug.

remove_memory_section doesn't need is_zone_device_section either.  We
can simply skip all the memblock specific cleanup if there is no
memblock for the given section.

This shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170515085827.16474-5-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Tobias Regnery <tobias.regnery@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:32 -07:00
Toshi Kani
a96dfddbcc base/memory, hotplug: fix a kernel oops in show_valid_zones()
Reading a sysfs "memoryN/valid_zones" file leads to the following oops
when the first page of a range is not backed by struct page.
show_valid_zones() assumes that 'start_pfn' is always valid for
page_zone().

 BUG: unable to handle kernel paging request at ffffea017a000000
 IP: show_valid_zones+0x6f/0x160

This issue may happen on x86-64 systems with 64GiB or more memory since
their memory block size is bumped up to 2GiB.  [1] An example of such
systems is desribed below.  0x3240000000 is only aligned by 1GiB and
this memory block starts from 0x3200000000, which is not backed by
struct page.

 BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable

Since test_pages_in_a_zone() already checks holes, fix this issue by
extending this function to return 'valid_start' and 'valid_end' for a
given range.  show_valid_zones() then proceeds with the valid range.

[1] 'Commit bdee237c03 ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>	[4.4+]

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 14:13:19 -08:00
Yasuaki Ishimatsu
8a1f780e7f memory_hotplug: make zone_can_shift() return a boolean value
online_{kernel|movable} is used to change the memory zone to
ZONE_{NORMAL|MOVABLE} and online the memory.

To check that memory zone can be changed, zone_can_shift() is used.
Currently the function returns minus integer value, plus integer
value and 0. When the function returns minus or plus integer value,
it means that the memory zone can be changed to ZONE_{NORNAL|MOVABLE}.

But when the function returns 0, there are two meanings.

One of the meanings is that the memory zone does not need to be changed.
For example, when memory is in ZONE_NORMAL and onlined by online_kernel
the memory zone does not need to be changed.

Another meaning is that the memory zone cannot be changed. When memory
is in ZONE_NORMAL and onlined by online_movable, the memory zone may
not be changed to ZONE_MOVALBE due to memory online limitation(see
Documentation/memory-hotplug.txt). In this case, memory must not be
onlined.

The patch changes the return type of zone_can_shift() so that memory
online operation fails when memory zone cannot be changed as follows:

Before applying patch:
   # grep -A 35 "Node 2" /proc/zoneinfo
   Node 2, zone   Normal
   <snip>
      node_scanned  0
           spanned  8388608
           present  7864320
           managed  7864320
   # echo online_movable > memory4097/state
   # grep -A 35 "Node 2" /proc/zoneinfo
   Node 2, zone   Normal
   <snip>
      node_scanned  0
           spanned  8388608
           present  8388608
           managed  8388608

   online_movable operation succeeded. But memory is onlined as
   ZONE_NORMAL, not ZONE_MOVABLE.

After applying patch:
   # grep -A 35 "Node 2" /proc/zoneinfo
   Node 2, zone   Normal
   <snip>
      node_scanned  0
           spanned  8388608
           present  7864320
           managed  7864320
   # echo online_movable > memory4097/state
   bash: echo: write error: Invalid argument
   # grep -A 35 "Node 2" /proc/zoneinfo
   Node 2, zone   Normal
   <snip>
      node_scanned  0
           spanned  8388608
           present  7864320
           managed  7864320

   online_movable operation failed because of failure of changing
   the memory zone from ZONE_NORMAL to ZONE_MOVABLE

Fixes: df429ac039 ("memory-hotplug: more general validation of zone during online")
Link: http://lkml.kernel.org/r/2f9c3837-33d7-b6e5-59c0-6ca4372b2d84@gmail.com
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24 16:26:14 -08:00
Reza Arbab
df429ac039 memory-hotplug: more general validation of zone during online
When memory is onlined, we are only able to rezone from ZONE_MOVABLE to
ZONE_KERNEL, or from (ZONE_MOVABLE - 1) to ZONE_MOVABLE.

To be more flexible, use the following criteria instead; to online
memory from zone X into zone Y,

* Any zones between X and Y must be unused.
* If X is lower than Y, the onlined memory must lie at the end of X.
* If X is higher than Y, the onlined memory must lie at the start of X.

Add zone_can_shift() to make this determination.

Link: http://lkml.kernel.org/r/1462816419-4479-3-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Reviewd-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Linus Torvalds
7ded384a12 mm: fix section mismatch warning
The register_page_bootmem_info_node() function needs to be marked __init
in order to avoid a new warning introduced by commit f65e91df25 ("mm:
use early_pfn_to_nid in register_page_bootmem_info_node").

Otherwise you'll get a warning about how a non-init function calls
early_pfn_to_nid (which is __meminit)

Cc: Yang Shi <yang.shi@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 15:23:32 -07:00