Commit Graph

277 Commits

Author SHA1 Message Date
Minchan Kim
d3e755f0bd mm: introduce deactivate_page
perprocess reclaims needs to deactivate file pages from active LRU
when echo file > /proc/<pid>/reclaim.
Add deactivate_file pages.

Bug: 131016077
Bug: 153444106
(cherry picked from b07eab27085611203ad359b7f4eecd138d7d771a)
Change-Id: I06fed20103671e4ca6fb8663d5029736442162a5
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
2020-05-30 02:20:49 +08:00
Martin Liu
9a12974ec9 Revert "mm: introduce __lru_cache_add_active_or_unevictable"
This reverts commit 808c47e1d5.

Reason for revert: remove SPF non upstream code
Bug: 140544941
Test: boot
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: Ib3db1c8395580f73a38e2fb43cfbefedbdd553d4
2020-05-30 02:20:34 +08:00
Martin Liu
ef36c60fe7 mm: Revert previous mm revert list
This commit reverts e799c1b10c54...cfb042c6c5d1

Reason for revert: unblock GKI
Bug: 140544941
Test: boot
Change-Id: I4ebe6c01918788cdc2468ceabf101ef7c3e3c452
Signed-off-by: Martin Liu <liumartin@google.com>
2020-05-30 02:20:27 +08:00
Minchan Kim
9661924262 Revert "mm: introduce __lru_cache_add_active_or_unevictable"
This reverts commit 808c47e1d5.

Reason for revert: revet customized code
Bug: 140544941
Test: boot
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: I535daca8bb48226a7f7c15f30d703455ba47abe6
2020-05-30 02:20:16 +08:00
Ivaylo Georgiev
cbbfa7615c Merge android-4.19-q.88 (47d86d5) into msm-4.19
* refs/heads/tmp-47d86d5:
  Linux 4.19.88
  net: fec: fix clock count mis-match
  platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size
  platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer
  dmaengine: stm32-dma: check whether length is aligned on FIFO threshold
  ASoC: stm32: sai: add missing put_device()
  ASoC: stm32: i2s: fix IRQ clearing
  ASoC: stm32: i2s: fix 16 bit format support
  ASoC: stm32: i2s: fix dma configuration
  pinctrl: stm32: fix memory leak issue
  mailbox: mailbox-test: fix null pointer if no mmio
  clk: stm32mp1: parent clocks update
  clk: stm32mp1: add CLK_SET_RATE_NO_REPARENT to Kernel clocks
  clk: stm32mp1: fix mcu divider table
  clk: stm32mp1: fix HSI divider flag
  hwrng: stm32 - fix unbalanced pm_runtime_enable
  media: stm32-dcmi: fix check of pm_runtime_get_sync return value
  media: stm32-dcmi: fix DMA corruption when stopping streaming
  crypto: stm32/hash - Fix hmac issue more than 256 bytes
  HID: core: check whether Usage Page item is after Usage ID items
  tcp: exit if nothing to retransmit on RTO timeout
  mailbox: stm32_ipcc: add spinlock to fix channels concurrent access
  drm/atmel-hlcdc: revert shift by 8
  mtd: spi-nor: cast to u64 to avoid uint overflows
  mtd: rawnand: atmel: fix possible object reference leak
  mtd: rawnand: atmel: Fix spelling mistake in error message
  net: macb driver, check for SKBTX_HW_TSTAMP
  net: macb: Fix SUBNS increment and increase resolution
  watchdog: sama5d4: fix WDD value to be always set to max
  ext4: add more paranoia checking in ext4_expand_extra_isize handling
  net: macb: add missed tasklet_kill
  net: sched: fix `tc -s class show` no bstats on class with nolock subqueues
  sctp: cache netns in sctp_ep_common
  tipc: fix link name length check
  selftests: bpf: test_sockmap: handle file creation failures gracefully
  openvswitch: remove another BUG_ON()
  openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
  slip: Fix use-after-free Read in slip_open
  sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook
  openvswitch: fix flow command message size
  net: psample: fix skb_over_panic
  macvlan: schedule bc_work even if error
  media: atmel: atmel-isc: fix INIT_WORK misplacement
  media: atmel: atmel-isc: fix asd memory allocation
  pwm: Clear chip_data in pwm_put()
  net: macb: fix error format in dev_err()
  media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE
  xfrm: Fix memleak on xfrm state destroy
  thunderbolt: Power cycle the router if NVM authentication fails
  mei: me: add comet point V device id
  mei: bus: prefix device names on bus with the bus name
  USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P
  staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids
  staging: rtl8723bs: Drop ACPI device ids
  staging: rtl8192e: fix potential use after free
  usb: dwc2: use a longer core rest timeout in dwc2_core_reset()
  clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated()
  clk: at91: fix update bit maps on CFG_MOR write
  mm, gup: add missing refcount overflow checks on s390
  mtd: Remove a debug trace in mtdpart.c
  xdp: fix cpumap redirect SKB creation bug
  powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property()
  ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board
  RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
  RDMA/hns: Fix the state of rereg mr
  RDMA/hns: Bugfix for the scene without receiver queue
  RDMA/hns: Fix the bug with updating rq head pointer when flush cqe
  scsi: libsas: Check SMP PHY control function result
  scsi: hisi_sas: shutdown axi bus to avoid exception CQ returned
  ACPI / APEI: Switch estatus pool to use vmalloc memory
  ACPI / APEI: Don't wait to serialise with oops messages when panic()ing
  scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery
  apparmor: delete the dentry in aafs_remove() to avoid a leak
  iommu/amd: Fix NULL dereference bug in match_hid_uid
  net: hns3: fix an issue for hns3_update_new_int_gl
  net: hns3: fix an issue for hclgevf_ae_get_hdev
  net: hns3: fix PFC not setting problem for DCB module
  net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED
  bpf: drop refcount if bpf_map_new_fd() fails in map_create()
  kvm: properly check debugfs dentry before using it
  net: dev: Use unsigned integer as an argument to left-shift
  mmc: core: align max segment size with logical block size
  bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id()
  sctp: don't compare hb_timer expire date before starting it
  net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap
  net: ip_gre: do not report erspan_ver for gre or gretap
  net: fix possible overflow in __sk_mem_raise_allocated()
  geneve: change NET_UDP_TUNNEL dependency to select
  sfc: initialise found bitmap in efx_ef10_mtd_probe
  ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI
  tipc: fix skb may be leaky in tipc_link_input
  net/smc: fix byte_order for rx_curs_confirmed
  blktrace: Show requests without sector
  net/smc: fix sender_free computation
  xfs: end sync buffer I/O properly on shutdown error
  mm/hotplug: invalid PFNs from pfn_to_online_page()
  net/smc: don't wait for send buffer space when data was already sent
  net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
  decnet: fix DN_IFREQ_SIZE
  ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
  sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
  gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change
  serial: 8250: Fix serial8250 initialization crash
  net/core/neighbour: fix kmemleak minimal reference count for hash tables
  PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()
  ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs
  net/core/neighbour: tell kmemleak about hash tables
  tipc: fix memory leak in tipc_nl_compat_publ_dump
  mtd: Check add_mtd_device() ret code
  lib/genalloc.c: include vmalloc.h
  drivers/base/platform.c: kmemleak ignore a known leak
  fork: fix some -Wmissing-prototypes warnings
  lib/genalloc.c: use vzalloc_node() to allocate the bitmap
  lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk
  firmware: arm_sdei: Fix DT platform device creation
  firmware: arm_sdei: fix wrong of_node_put() in init function
  infiniband/qedr: Potential null ptr dereference of qp
  infiniband: bnxt_re: qplib: Check the return value of send_message
  xprtrdma: Prevent leak of rpcrdma_rep objects
  netfilter: nf_tables: fix a missing check of nla_put_failure
  tools/vm/page-types.c: fix "kpagecount returned fewer pages than expected" failures
  mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free()
  mm/page_alloc.c: use a single function to free page
  mm/page_alloc.c: free order-0 pages through PCP in page_frag_free()
  vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n
  ocfs2: clear journal dirty flag after shutdown journal
  net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe()
  net: marvell: fix a missing check of acpi_match_device
  tipc: fix a missing check of genlmsg_put
  atl1e: checking the status of atl1e_write_phy_reg
  net: dsa: bcm_sf2: Propagate error value from mdio_write
  net: stmicro: fix a missing check of clk_prepare
  net: (cpts) fix a missing check of clk_prepare
  um: Make GCOV depend on !KCOV
  um: Include sys/uio.h to have writev()
  f2fs: fix to dirty inode synchronously
  f2fs: fix block address for __check_sit_bitmap
  net/net_namespace: Check the return value of register_pernet_subsys()
  net/netlink_compat: Fix a missing check of nla_parse_nested
  pwm: clps711x: Fix period calculation
  crypto: mxc-scc - fix build warnings on ARM64
  powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y
  powerpc/pseries: Fix node leak in update_lmb_associativity_index()
  powerpc/83xx: handle machine check caused by watchdog timer
  regulator: tps65910: fix a missing check of return value
  bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't
  IB/rxe: Make counters thread safe
  drbd: fix print_st_err()'s prototype to match the definition
  drbd: do not block when adjusting "disk-options" while IO is frozen
  drbd: reject attach of unsuitable uuids even if connected
  drbd: ignore "all zero" peer volume sizes in handshake
  powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
  vfio/spapr_tce: Get rid of possible infinite loop
  powerpc/44x/bamboo: Fix PCI range
  powerpc/mm: Make NULL pointer deferences explicit on bad page faults.
  powerpc/prom: fix early DEBUG messages
  powerpc/32: Avoid unsupported flags with clang
  powerpc/perf: Fix unit_sel/cache_sel checks
  ath6kl: Fix off by one error in scan completion
  ath6kl: Only use match sets when firmware supports it
  brcmfmac: Fix access point mode
  scsi: csiostor: fix incorrect dma device in case of vport
  scsi: qla2xxx: deadlock by configfs_depend_item
  RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer
  openrisc: Fix broken paths to arch/or32
  serial: max310x: Fix tx_empty() callback
  Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading
  drivers/regulator: fix a missing check of return value
  powerpc/xmon: fix dump_segments()
  powerpc/book3s/32: fix number of bats in p/v_block_mapped()
  vxlan: Fix error path in __vxlan_dev_create()
  clocksource/drivers/fttmr010: Fix invalid interrupt register access
  IB/qib: Fix an error code in qib_sdma_verbs_send()
  xfs: Fix bulkstat compat ioctls on x32 userspace.
  xfs: Align compat attrlist_by_handle with native implementation.
  dm raid: fix false -EBUSY when handling check/repair message
  gfs2: take jdata unstuff into account in do_grow
  dm flakey: Properly corrupt multi-page bios.
  HID: doc: fix wrong data structure reference for UHID_OUTPUT
  pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10
  pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration
  pinctrl: sh-pfc: r8a77990: Fix MOD_SEL0 SEL_I2C1 field width
  KVM: s390: unregister debug feature on failing arch init
  bnxt_en: query force speeds before disabling autoneg mode.
  bnxt_en: Save ring statistics before reset.
  bnxt_en: Return linux standard errors in bnxt_ethtool.c
  exofs_mount(): fix leaks on failure exits
  netfilter: nf_nat_sip: fix RTP/RTCP source port translations
  net/mlx5: Continue driver initialization despite debugfs failure
  pinctrl: xway: fix gpio-hog related boot issues
  memory: omap-gpmc: Get the header of the enum
  vfio-mdev/samples: Use u8 instead of char for handle functions
  kprobes/x86: Show x86-64 specific blacklisted symbols correctly
  kprobes: Blacklist symbols in arch-defined prohibited area
  xen/pciback: Check dev_data before using it
  kprobes/x86/xen: blacklist non-attachable xen interrupt functions
  serial: 8250: Rate limit serial port rx interrupts during input overruns
  gpio: raspberrypi-exp: decrease refcount on firmware dt node
  HID: intel-ish-hid: fixes incorrect error handling
  serial: sh-sci: Fix crash in rx_timer_fn() on PIO fallback
  btrfs: only track ref_heads in delayed_ref_updates
  Btrfs: allow clear_extent_dirty() to receive a cached extent state record
  btrfs: dev-replace: set result code of cancel by status of scrub
  btrfs: fix ncopies raid_attr for RAID56
  btrfs: Check for missing device before bio submission in btrfs_map_bio
  usb: ehci-omap: Fix deferred probe for phy handling
  mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET
  mmc: meson-gx: make sure the descriptor is stopped on errors
  VSOCK: bind to random port for VMADDR_PORT_ANY
  crypto/chelsio/chtls: listen fails with multiadapt
  Revert "KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()"
  Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS"
  kvm: vmx: Set IA32_TSC_AUX for legacy mode guests
  gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB
  gpio: pca953x: Fix AI overflow on PCAL6524
  iwlwifi: pcie: set cmd_len in the correct place
  iwlwifi: pcie: fix erroneous print
  iwlwifi: mvm: force TCM re-evaluation on TCM resume
  iwlwifi: move iwl_nvm_check_version() into dvm
  microblaze: fix multiple bugs in arch/microblaze/boot/Makefile
  microblaze: move "... is ready" messages to arch/microblaze/Makefile
  microblaze: adjust the help to the real behavior
  ubi: Do not drop UBI device reference before using
  ubi: Put MTD device after it is not used
  ubifs: Fix default compression selection in ubifs
  nvme: fix kernel paging oops
  xfs: require both realtime inodes to mount
  bcache: do not mark writeback_running too early
  bcache: do not check if debug dentry is ERR or NULL explicitly on remove
  rtl818x: fix potential use after free
  brcmfmac: set SDIO F1 MesBusyCtrl for CYW4373
  brcmfmac: set F2 watermark to 256 for 4373
  mwifiex: debugfs: correct histogram spacing, formatting
  mwifiex: fix potential NULL dereference and use after free
  arm64: dts: renesas: draak: Fix CVBS input
  crypto: user - support incremental algorithm dumps
  s390/zcrypt: make sysfs reset attribute trigger queue reset
  nvme: provide fallback for discard alloc failure
  scsi: qla2xxx: Fix for FC-NVMe discovery for NPIV port
  scsi: qla2xxx: Fix NPIV handling for FC-NVMe
  scsi: lpfc: Enable Management features for IF_TYPE=6
  ACPI / LPSS: Ignore acpi_device_fix_up_power() return value
  ARM: ks8695: fix section mismatch warning
  xfs: zero length symlinks are not valid
  PM / AVS: SmartReflex: NULL check before some freeing functions is not needed
  RDMA/vmw_pvrdma: Use atomic memory allocation in create AH
  arm64: preempt: Fix big-endian when checking preempt count in assembly
  RDMA/hns: Fix the bug while use multi-hop of pbl
  ARM: OMAP1: fix USB configuration for device-only setups
  platform/x86: mlx-platform: Fix LED configuration
  bus: ti-sysc: Check for no-reset and no-idle flags at the child level
  arm64: smp: Handle errors reported by the firmware
  arm64: mm: Prevent mismatched 52-bit VA support
  ARM: dts: Fix hsi gdd range for omap4
  parisc: Fix HP SDC hpa address output
  parisc: Fix serio address output
  ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication
  ARM: dts: imx25: Fix memory node duplication
  ARM: dts: imx27: Fix memory node duplication
  ARM: dts: imx1: Fix memory node duplication
  ARM: dts: imx23: Fix memory node duplication
  ARM: dts: imx50: Fix memory node duplication
  ARM: dts: imx6sl: Fix memory node duplication
  ARM: dts: imx6sx: Fix memory node duplication
  ARM: dts: imx6ul: Fix memory node duplication
  ARM: dts: imx7: Fix memory node duplication
  ARM: dts: imx35: Fix memory node duplication
  ARM: dts: imx31: Fix memory node duplication
  ARM: dts: imx53: Fix memory node duplication
  ARM: dts: imx51: Fix memory node duplication
  ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed
  tracing: Lock event_mutex before synth_event_mutex
  ARM: dts: Fix up SQ201 flash access
  scsi: lpfc: Fix dif and first burst use in write commands
  scsi: lpfc: Fix kernel Oops due to null pring pointers
  scsi: target/tcmu: Fix queue_cmd_ring() declaration
  pwm: bcm-iproc: Prevent unloading the driver module while in use
  block: drbd: remove a stray unlock in __drbd_send_protocol()
  mac80211: fix station inactive_time shortly after boot
  net/fq_impl: Switch to kvmalloc() for memory allocation
  ceph: return -EINVAL if given fsc mount option on kernel w/o support
  net: mscc: ocelot: fix __ocelot_rmw_ix prototype
  net: bcmgenet: reapply manual settings to the PHY
  net: bcmgenet: use RGMII loopback for MAC reset
  scripts/gdb: fix debugging modules compiled with hot/cold partitioning
  ASoC: stm32: sai: add restriction on mmap support
  watchdog: meson: Fix the wrong value of left time
  can: mcp251x: mcp251x_restart_work_handler(): Fix potential force_quit race condition
  can: flexcan: increase error counters if skb enqueueing via can_rx_offload_queue_sorted() fails
  can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error
  can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error
  can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors
  can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM
  can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max
  can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak
  can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open
  can: peak_usb: report bus recovery as well
  bridge: ebtables: don't crash when using dnat target in output chains
  net: fec: add missed clk_disable_unprepare in remove
  clk: ti: clkctrl: Fix failed to enable error with double udelay timeout
  clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call
  x86/resctrl: Prevent NULL pointer dereference when reading mondata
  idr: Fix idr_alloc_u32 on 32-bit systems
  idr: Fix integer overflow in idr_for_each_entry
  powerpc/bpf: Fix tail call implementation
  samples/bpf: fix build by setting HAVE_ATTR_TEST to zero
  ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend
  clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18
  clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup
  clk: at91: avoid sleeping early
  reset: fix reset_control_ops kerneldoc comment
  ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts
  pinctrl: cherryview: Allocate IRQ chip dynamic
  clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume
  ASoC: kirkwood: fix device remove ordering
  ASoC: kirkwood: fix external clock probe defer
  clk: samsung: exynos5433: Fix error paths
  reset: Fix memory leak in reset_control_array_put()
  ASoC: compress: fix unsigned integer overflow check
  ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX
  clocksource/drivers/mediatek: Fix error handling
  clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate

Conflicts:
	drivers/mmc/core/queue.c

Change-Id: I17f5829400ece4e39be4c9f6223fff206c710d06
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2020-01-28 03:20:43 -08:00
Greg Kroah-Hartman
47d86d5c55 Merge 4.19.88 into android-4.19-q
Changes in 4.19.88
	clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate
	clocksource/drivers/mediatek: Fix error handling
	ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX
	ASoC: compress: fix unsigned integer overflow check
	reset: Fix memory leak in reset_control_array_put()
	clk: samsung: exynos5433: Fix error paths
	ASoC: kirkwood: fix external clock probe defer
	ASoC: kirkwood: fix device remove ordering
	clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume
	pinctrl: cherryview: Allocate IRQ chip dynamic
	ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts
	reset: fix reset_control_ops kerneldoc comment
	clk: at91: avoid sleeping early
	clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup
	clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18
	ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend
	samples/bpf: fix build by setting HAVE_ATTR_TEST to zero
	powerpc/bpf: Fix tail call implementation
	idr: Fix integer overflow in idr_for_each_entry
	idr: Fix idr_alloc_u32 on 32-bit systems
	x86/resctrl: Prevent NULL pointer dereference when reading mondata
	clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call
	clk: ti: clkctrl: Fix failed to enable error with double udelay timeout
	net: fec: add missed clk_disable_unprepare in remove
	bridge: ebtables: don't crash when using dnat target in output chains
	can: peak_usb: report bus recovery as well
	can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open
	can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak
	can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max
	can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM
	can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors
	can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error
	can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error
	can: flexcan: increase error counters if skb enqueueing via can_rx_offload_queue_sorted() fails
	can: mcp251x: mcp251x_restart_work_handler(): Fix potential force_quit race condition
	watchdog: meson: Fix the wrong value of left time
	ASoC: stm32: sai: add restriction on mmap support
	scripts/gdb: fix debugging modules compiled with hot/cold partitioning
	net: bcmgenet: use RGMII loopback for MAC reset
	net: bcmgenet: reapply manual settings to the PHY
	net: mscc: ocelot: fix __ocelot_rmw_ix prototype
	ceph: return -EINVAL if given fsc mount option on kernel w/o support
	net/fq_impl: Switch to kvmalloc() for memory allocation
	mac80211: fix station inactive_time shortly after boot
	block: drbd: remove a stray unlock in __drbd_send_protocol()
	pwm: bcm-iproc: Prevent unloading the driver module while in use
	scsi: target/tcmu: Fix queue_cmd_ring() declaration
	scsi: lpfc: Fix kernel Oops due to null pring pointers
	scsi: lpfc: Fix dif and first burst use in write commands
	ARM: dts: Fix up SQ201 flash access
	tracing: Lock event_mutex before synth_event_mutex
	ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed
	ARM: dts: imx51: Fix memory node duplication
	ARM: dts: imx53: Fix memory node duplication
	ARM: dts: imx31: Fix memory node duplication
	ARM: dts: imx35: Fix memory node duplication
	ARM: dts: imx7: Fix memory node duplication
	ARM: dts: imx6ul: Fix memory node duplication
	ARM: dts: imx6sx: Fix memory node duplication
	ARM: dts: imx6sl: Fix memory node duplication
	ARM: dts: imx50: Fix memory node duplication
	ARM: dts: imx23: Fix memory node duplication
	ARM: dts: imx1: Fix memory node duplication
	ARM: dts: imx27: Fix memory node duplication
	ARM: dts: imx25: Fix memory node duplication
	ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication
	parisc: Fix serio address output
	parisc: Fix HP SDC hpa address output
	ARM: dts: Fix hsi gdd range for omap4
	arm64: mm: Prevent mismatched 52-bit VA support
	arm64: smp: Handle errors reported by the firmware
	bus: ti-sysc: Check for no-reset and no-idle flags at the child level
	platform/x86: mlx-platform: Fix LED configuration
	ARM: OMAP1: fix USB configuration for device-only setups
	RDMA/hns: Fix the bug while use multi-hop of pbl
	arm64: preempt: Fix big-endian when checking preempt count in assembly
	RDMA/vmw_pvrdma: Use atomic memory allocation in create AH
	PM / AVS: SmartReflex: NULL check before some freeing functions is not needed
	xfs: zero length symlinks are not valid
	ARM: ks8695: fix section mismatch warning
	ACPI / LPSS: Ignore acpi_device_fix_up_power() return value
	scsi: lpfc: Enable Management features for IF_TYPE=6
	scsi: qla2xxx: Fix NPIV handling for FC-NVMe
	scsi: qla2xxx: Fix for FC-NVMe discovery for NPIV port
	nvme: provide fallback for discard alloc failure
	s390/zcrypt: make sysfs reset attribute trigger queue reset
	crypto: user - support incremental algorithm dumps
	arm64: dts: renesas: draak: Fix CVBS input
	mwifiex: fix potential NULL dereference and use after free
	mwifiex: debugfs: correct histogram spacing, formatting
	brcmfmac: set F2 watermark to 256 for 4373
	brcmfmac: set SDIO F1 MesBusyCtrl for CYW4373
	rtl818x: fix potential use after free
	bcache: do not check if debug dentry is ERR or NULL explicitly on remove
	bcache: do not mark writeback_running too early
	xfs: require both realtime inodes to mount
	nvme: fix kernel paging oops
	ubifs: Fix default compression selection in ubifs
	ubi: Put MTD device after it is not used
	ubi: Do not drop UBI device reference before using
	microblaze: adjust the help to the real behavior
	microblaze: move "... is ready" messages to arch/microblaze/Makefile
	microblaze: fix multiple bugs in arch/microblaze/boot/Makefile
	iwlwifi: move iwl_nvm_check_version() into dvm
	iwlwifi: mvm: force TCM re-evaluation on TCM resume
	iwlwifi: pcie: fix erroneous print
	iwlwifi: pcie: set cmd_len in the correct place
	gpio: pca953x: Fix AI overflow on PCAL6524
	gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB
	kvm: vmx: Set IA32_TSC_AUX for legacy mode guests
	Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS"
	Revert "KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()"
	crypto/chelsio/chtls: listen fails with multiadapt
	VSOCK: bind to random port for VMADDR_PORT_ANY
	mmc: meson-gx: make sure the descriptor is stopped on errors
	mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET
	usb: ehci-omap: Fix deferred probe for phy handling
	btrfs: Check for missing device before bio submission in btrfs_map_bio
	btrfs: fix ncopies raid_attr for RAID56
	btrfs: dev-replace: set result code of cancel by status of scrub
	Btrfs: allow clear_extent_dirty() to receive a cached extent state record
	btrfs: only track ref_heads in delayed_ref_updates
	serial: sh-sci: Fix crash in rx_timer_fn() on PIO fallback
	HID: intel-ish-hid: fixes incorrect error handling
	gpio: raspberrypi-exp: decrease refcount on firmware dt node
	serial: 8250: Rate limit serial port rx interrupts during input overruns
	kprobes/x86/xen: blacklist non-attachable xen interrupt functions
	xen/pciback: Check dev_data before using it
	kprobes: Blacklist symbols in arch-defined prohibited area
	kprobes/x86: Show x86-64 specific blacklisted symbols correctly
	vfio-mdev/samples: Use u8 instead of char for handle functions
	memory: omap-gpmc: Get the header of the enum
	pinctrl: xway: fix gpio-hog related boot issues
	net/mlx5: Continue driver initialization despite debugfs failure
	netfilter: nf_nat_sip: fix RTP/RTCP source port translations
	exofs_mount(): fix leaks on failure exits
	bnxt_en: Return linux standard errors in bnxt_ethtool.c
	bnxt_en: Save ring statistics before reset.
	bnxt_en: query force speeds before disabling autoneg mode.
	KVM: s390: unregister debug feature on failing arch init
	pinctrl: sh-pfc: r8a77990: Fix MOD_SEL0 SEL_I2C1 field width
	pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration
	pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10
	HID: doc: fix wrong data structure reference for UHID_OUTPUT
	dm flakey: Properly corrupt multi-page bios.
	gfs2: take jdata unstuff into account in do_grow
	dm raid: fix false -EBUSY when handling check/repair message
	xfs: Align compat attrlist_by_handle with native implementation.
	xfs: Fix bulkstat compat ioctls on x32 userspace.
	IB/qib: Fix an error code in qib_sdma_verbs_send()
	clocksource/drivers/fttmr010: Fix invalid interrupt register access
	vxlan: Fix error path in __vxlan_dev_create()
	powerpc/book3s/32: fix number of bats in p/v_block_mapped()
	powerpc/xmon: fix dump_segments()
	drivers/regulator: fix a missing check of return value
	Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading
	serial: max310x: Fix tx_empty() callback
	openrisc: Fix broken paths to arch/or32
	RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer
	scsi: qla2xxx: deadlock by configfs_depend_item
	scsi: csiostor: fix incorrect dma device in case of vport
	brcmfmac: Fix access point mode
	ath6kl: Only use match sets when firmware supports it
	ath6kl: Fix off by one error in scan completion
	powerpc/perf: Fix unit_sel/cache_sel checks
	powerpc/32: Avoid unsupported flags with clang
	powerpc/prom: fix early DEBUG messages
	powerpc/mm: Make NULL pointer deferences explicit on bad page faults.
	powerpc/44x/bamboo: Fix PCI range
	vfio/spapr_tce: Get rid of possible infinite loop
	powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
	drbd: ignore "all zero" peer volume sizes in handshake
	drbd: reject attach of unsuitable uuids even if connected
	drbd: do not block when adjusting "disk-options" while IO is frozen
	drbd: fix print_st_err()'s prototype to match the definition
	IB/rxe: Make counters thread safe
	bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't
	regulator: tps65910: fix a missing check of return value
	powerpc/83xx: handle machine check caused by watchdog timer
	powerpc/pseries: Fix node leak in update_lmb_associativity_index()
	powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y
	crypto: mxc-scc - fix build warnings on ARM64
	pwm: clps711x: Fix period calculation
	net/netlink_compat: Fix a missing check of nla_parse_nested
	net/net_namespace: Check the return value of register_pernet_subsys()
	f2fs: fix block address for __check_sit_bitmap
	f2fs: fix to dirty inode synchronously
	um: Include sys/uio.h to have writev()
	um: Make GCOV depend on !KCOV
	net: (cpts) fix a missing check of clk_prepare
	net: stmicro: fix a missing check of clk_prepare
	net: dsa: bcm_sf2: Propagate error value from mdio_write
	atl1e: checking the status of atl1e_write_phy_reg
	tipc: fix a missing check of genlmsg_put
	net: marvell: fix a missing check of acpi_match_device
	net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe()
	ocfs2: clear journal dirty flag after shutdown journal
	vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n
	mm/page_alloc.c: free order-0 pages through PCP in page_frag_free()
	mm/page_alloc.c: use a single function to free page
	mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free()
	tools/vm/page-types.c: fix "kpagecount returned fewer pages than expected" failures
	netfilter: nf_tables: fix a missing check of nla_put_failure
	xprtrdma: Prevent leak of rpcrdma_rep objects
	infiniband: bnxt_re: qplib: Check the return value of send_message
	infiniband/qedr: Potential null ptr dereference of qp
	firmware: arm_sdei: fix wrong of_node_put() in init function
	firmware: arm_sdei: Fix DT platform device creation
	lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk
	lib/genalloc.c: use vzalloc_node() to allocate the bitmap
	fork: fix some -Wmissing-prototypes warnings
	drivers/base/platform.c: kmemleak ignore a known leak
	lib/genalloc.c: include vmalloc.h
	mtd: Check add_mtd_device() ret code
	tipc: fix memory leak in tipc_nl_compat_publ_dump
	net/core/neighbour: tell kmemleak about hash tables
	ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs
	PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity()
	net/core/neighbour: fix kmemleak minimal reference count for hash tables
	serial: 8250: Fix serial8250 initialization crash
	gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change
	sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
	ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
	decnet: fix DN_IFREQ_SIZE
	net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
	net/smc: don't wait for send buffer space when data was already sent
	mm/hotplug: invalid PFNs from pfn_to_online_page()
	xfs: end sync buffer I/O properly on shutdown error
	net/smc: fix sender_free computation
	blktrace: Show requests without sector
	net/smc: fix byte_order for rx_curs_confirmed
	tipc: fix skb may be leaky in tipc_link_input
	ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI
	sfc: initialise found bitmap in efx_ef10_mtd_probe
	geneve: change NET_UDP_TUNNEL dependency to select
	net: fix possible overflow in __sk_mem_raise_allocated()
	net: ip_gre: do not report erspan_ver for gre or gretap
	net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap
	sctp: don't compare hb_timer expire date before starting it
	bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id()
	mmc: core: align max segment size with logical block size
	net: dev: Use unsigned integer as an argument to left-shift
	kvm: properly check debugfs dentry before using it
	bpf: drop refcount if bpf_map_new_fd() fails in map_create()
	net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED
	net: hns3: fix PFC not setting problem for DCB module
	net: hns3: fix an issue for hclgevf_ae_get_hdev
	net: hns3: fix an issue for hns3_update_new_int_gl
	iommu/amd: Fix NULL dereference bug in match_hid_uid
	apparmor: delete the dentry in aafs_remove() to avoid a leak
	scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery
	ACPI / APEI: Don't wait to serialise with oops messages when panic()ing
	ACPI / APEI: Switch estatus pool to use vmalloc memory
	scsi: hisi_sas: shutdown axi bus to avoid exception CQ returned
	scsi: libsas: Check SMP PHY control function result
	RDMA/hns: Fix the bug with updating rq head pointer when flush cqe
	RDMA/hns: Bugfix for the scene without receiver queue
	RDMA/hns: Fix the state of rereg mr
	RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp
	ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board
	powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property()
	xdp: fix cpumap redirect SKB creation bug
	mtd: Remove a debug trace in mtdpart.c
	mm, gup: add missing refcount overflow checks on s390
	clk: at91: fix update bit maps on CFG_MOR write
	clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated()
	usb: dwc2: use a longer core rest timeout in dwc2_core_reset()
	staging: rtl8192e: fix potential use after free
	staging: rtl8723bs: Drop ACPI device ids
	staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids
	USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P
	mei: bus: prefix device names on bus with the bus name
	mei: me: add comet point V device id
	thunderbolt: Power cycle the router if NVM authentication fails
	xfrm: Fix memleak on xfrm state destroy
	media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE
	net: macb: fix error format in dev_err()
	pwm: Clear chip_data in pwm_put()
	media: atmel: atmel-isc: fix asd memory allocation
	media: atmel: atmel-isc: fix INIT_WORK misplacement
	macvlan: schedule bc_work even if error
	net: psample: fix skb_over_panic
	openvswitch: fix flow command message size
	sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook
	slip: Fix use-after-free Read in slip_open
	openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info()
	openvswitch: remove another BUG_ON()
	selftests: bpf: test_sockmap: handle file creation failures gracefully
	tipc: fix link name length check
	sctp: cache netns in sctp_ep_common
	net: sched: fix `tc -s class show` no bstats on class with nolock subqueues
	net: macb: add missed tasklet_kill
	ext4: add more paranoia checking in ext4_expand_extra_isize handling
	watchdog: sama5d4: fix WDD value to be always set to max
	net: macb: Fix SUBNS increment and increase resolution
	net: macb driver, check for SKBTX_HW_TSTAMP
	mtd: rawnand: atmel: Fix spelling mistake in error message
	mtd: rawnand: atmel: fix possible object reference leak
	mtd: spi-nor: cast to u64 to avoid uint overflows
	drm/atmel-hlcdc: revert shift by 8
	mailbox: stm32_ipcc: add spinlock to fix channels concurrent access
	tcp: exit if nothing to retransmit on RTO timeout
	HID: core: check whether Usage Page item is after Usage ID items
	crypto: stm32/hash - Fix hmac issue more than 256 bytes
	media: stm32-dcmi: fix DMA corruption when stopping streaming
	media: stm32-dcmi: fix check of pm_runtime_get_sync return value
	hwrng: stm32 - fix unbalanced pm_runtime_enable
	clk: stm32mp1: fix HSI divider flag
	clk: stm32mp1: fix mcu divider table
	clk: stm32mp1: add CLK_SET_RATE_NO_REPARENT to Kernel clocks
	clk: stm32mp1: parent clocks update
	mailbox: mailbox-test: fix null pointer if no mmio
	pinctrl: stm32: fix memory leak issue
	ASoC: stm32: i2s: fix dma configuration
	ASoC: stm32: i2s: fix 16 bit format support
	ASoC: stm32: i2s: fix IRQ clearing
	ASoC: stm32: sai: add missing put_device()
	dmaengine: stm32-dma: check whether length is aligned on FIFO threshold
	platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer
	platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size
	net: fec: fix clock count mis-match
	Linux 4.19.88

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib17de804b245aa3da465ddc97270b8c7bd9fba66
2019-12-05 11:57:14 +01:00
Wei Yang
78ce155fb8 vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n
[ Upstream commit 8b09549c2bfd9f3f8f4cdad74107ef4f4ff9cdd7 ]

Commit fa5e084e43 ("vmscan: do not unconditionally treat zones that
fail zone_reclaim() as full") changed the return value of
node_reclaim().  The original return value 0 means NODE_RECLAIM_SOME
after this commit.

While the return value of node_reclaim() when CONFIG_NUMA is n is not
changed.  This will leads to call zone_watermark_ok() again.

This patch fixes the return value by adjusting to NODE_RECLAIM_NOSCAN.
Since node_reclaim() is only called in page_alloc.c, move it to
mm/internal.h.

Link: http://lkml.kernel.org/r/20181113080436.22078-1-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-12-05 09:20:57 +01:00
Ivaylo Georgiev
1b74ac0833 Merge android-4.19.36 (10f41ccfc) into msm-4.19
* refs/heads/tmp-10f41ccfc:
  Linux 4.19.36
  appletalk: Fix compile regression
  mm: hide incomplete nr_indirectly_reclaimable in sysfs
  mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo
  IB/hfi1: Failed to drain send queue when QP is put into error state
  bpf: fix use after free in bpf_evict_inode
  include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
  f2fs: fix to dirty inode for i_mode recovery
  rxrpc: Fix client call connect/disconnect race
  lib/div64.c: off by one in shift
  appletalk: Fix use-after-free in atalk_proc_exit
  drm/amdkfd: use init_mqd function to allocate object for hid_mqd (CI)
  ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
  drm/nouveau/volt/gf117: fix speedo readout register
  PCI: Blacklist power management of Gigabyte X299 DESIGNARE EX PCIe ports
  coresight: cpu-debug: Support for CA73 CPUs
  Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk"
  crypto: axis - fix for recursive locking from bottom half
  drm/panel: panel-innolux: set display off in innolux_panel_unprepare
  lkdtm: Add tests for NULL pointer dereference
  lkdtm: Print real addresses
  soc/tegra: pmc: Drop locking from tegra_powergate_is_powered()
  scsi: core: Avoid that system resume triggers a kernel warning
  iommu/dmar: Fix buffer overflow during PCI bus notification
  net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version
  crypto: sha512/arm - fix crash bug in Thumb2 build
  crypto: sha256/arm - fix crash bug in Thumb2 build
  xfrm: destroy xfrm_state synchronously on net exit path
  net/rds: fix warn in rds_message_alloc_sgs
  ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle
  ALSA: hda: fix front speakers on Huawei MBXP
  drm/ttm: Fix bo_global and mem_global kfree error
  platform/x86: Add Intel AtomISP2 dummy / power-management driver
  kernel: hung_task.c: disable on suspend
  cifs: fallback to older infolevels on findfirst queryinfo retry
  net: stmmac: Set OWN bit for jumbo frames
  f2fs: cleanup dirty pages if recover failed
  netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine
  compiler.h: update definition of unreachable()
  KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail
  HID: usbhid: Add quirk for Redragon/Dragonrise Seymur 2
  ACPI / SBS: Fix GPE storm on recent MacBookPro's
  usbip: fix vhci_hcd controller counting
  ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms
  pinctrl: core: make sure strcmp() doesn't get a null parameter
  HID: i2c-hid: override HID descriptors for certain devices
  Bluetooth: Fix debugfs NULL pointer dereference
  media: au0828: cannot kfree dev before usb disconnect
  powerpc/pseries: Remove prrn_work workqueue
  serial: uartps: console_setup() can't be placed to init section
  netfilter: xt_cgroup: shrink size of v2 path
  f2fs: fix to do sanity check with current segment number
  ASoC: Fix UBSAN warning at snd_soc_get/put_volsw_sx()
  9p locks: add mount option for lock retry interval
  9p: do not trust pdu content for stat item size
  f2fs: fix to avoid NULL pointer dereference on se->discard_map
  rsi: improve kernel thread handling to fix kernel panic
  gpio: pxa: handle corner case of unprobed device
  drm/cirrus: Use drm_framebuffer_put to avoid kernel oops in clean-up
  ext4: prohibit fstrim in norecovery mode
  x86/gart: Exclude GART aperture from kcore
  fix incorrect error code mapping for OBJECTID_NOT_FOUND
  x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
  iommu/vt-d: Check capability before disabling protected memory
  drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failure
  x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
  x86/hyperv: Prevent potential NULL pointer dereference
  x86/hpet: Prevent potential NULL pointer dereference
  irqchip/mbigen: Don't clear eventid when freeing an MSI
  irqchip/stm32: Don't clear rising/falling config registers at init
  drm/exynos/mixer: fix MIXER shadow registry synchronisation code
  blk-iolatency: #include "blk.h"
  PM / Domains: Avoid a potential deadlock
  ACPI / utils: Drop reference in test for device presence
  perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
  perf tests: Fix memory leak by expr__find_other() in test__expr()
  perf tests: Fix a memory leak of cpu_map object in the openat_syscall_event_on_all_cpus test
  perf evsel: Free evsel->counts in perf_evsel__exit()
  perf hist: Add missing map__put() in error case
  perf top: Fix error handling in cmd_top()
  perf build-id: Fix memory leak in print_sdt_events()
  perf config: Fix a memory leak in collect_config()
  perf config: Fix an error in the config template documentation
  perf list: Don't forget to drop the reference to the allocated thread_map
  tools/power turbostat: return the exit status of a command
  x86/mm: Don't leak kernel addresses
  sched/core: Fix buffer overflow in cgroup2 property cpu.max
  sched/cpufreq: Fix 32-bit math overflow
  scsi: iscsi: flush running unbind operations when removing a session
  thermal/intel_powerclamp: fix truncated kthread name
  thermal/int340x_thermal: fix mode setting
  thermal/int340x_thermal: Add additional UUIDs
  thermal: bcm2835: Fix crash in bcm2835_thermal_debugfs
  thermal: samsung: Fix incorrect check after code merge
  thermal/intel_powerclamp: fix __percpu declaration of worker_data
  ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration
  mmc: davinci: remove extraneous __init annotation
  i40iw: Avoid panic when handling the inetdev event
  IB/mlx4: Fix race condition between catas error reset and aliasguid flows
  drm/udl: use drm_gem_object_put_unlocked.
  auxdisplay: hd44780: Fix memory leak on ->remove()
  ALSA: sb8: add a check for request_region
  ALSA: echoaudio: add a check for ioremap_nocache
  ext4: report real fs size after failed resize
  ext4: add missing brelse() in add_new_gdb_meta_bg()
  ext4: avoid panic during forced reboot
  perf/core: Restore mmap record type correctly
  inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch()
  arc: hsdk_defconfig: Enable CONFIG_BLK_DEV_RAM
  ARC: u-boot args: check that magic number is correct
  ANDROID: cuttlefish_defconfig: Enable L2TP/PPTP
  ANDROID: Makefile: Properly resolve 4.19.35 merge
  Make arm64 serial port config compatible with crosvm

Conflicts:
	kernel/sched/cpufreq_schedutil.c

Change-Id: I8f049eb34344f72bf2d202c5e360f448771c78f4
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2019-05-16 05:06:14 -07:00
Greg Kroah-Hartman
10f41ccfc7 Merge 4.19.36 into android-4.19
Changes in 4.19.36
	ARC: u-boot args: check that magic number is correct
	arc: hsdk_defconfig: Enable CONFIG_BLK_DEV_RAM
	inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch()
	perf/core: Restore mmap record type correctly
	ext4: avoid panic during forced reboot
	ext4: add missing brelse() in add_new_gdb_meta_bg()
	ext4: report real fs size after failed resize
	ALSA: echoaudio: add a check for ioremap_nocache
	ALSA: sb8: add a check for request_region
	auxdisplay: hd44780: Fix memory leak on ->remove()
	drm/udl: use drm_gem_object_put_unlocked.
	IB/mlx4: Fix race condition between catas error reset and aliasguid flows
	i40iw: Avoid panic when handling the inetdev event
	mmc: davinci: remove extraneous __init annotation
	ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration
	thermal/intel_powerclamp: fix __percpu declaration of worker_data
	thermal: samsung: Fix incorrect check after code merge
	thermal: bcm2835: Fix crash in bcm2835_thermal_debugfs
	thermal/int340x_thermal: Add additional UUIDs
	thermal/int340x_thermal: fix mode setting
	thermal/intel_powerclamp: fix truncated kthread name
	scsi: iscsi: flush running unbind operations when removing a session
	sched/cpufreq: Fix 32-bit math overflow
	sched/core: Fix buffer overflow in cgroup2 property cpu.max
	x86/mm: Don't leak kernel addresses
	tools/power turbostat: return the exit status of a command
	perf list: Don't forget to drop the reference to the allocated thread_map
	perf config: Fix an error in the config template documentation
	perf config: Fix a memory leak in collect_config()
	perf build-id: Fix memory leak in print_sdt_events()
	perf top: Fix error handling in cmd_top()
	perf hist: Add missing map__put() in error case
	perf evsel: Free evsel->counts in perf_evsel__exit()
	perf tests: Fix a memory leak of cpu_map object in the openat_syscall_event_on_all_cpus test
	perf tests: Fix memory leak by expr__find_other() in test__expr()
	perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
	ACPI / utils: Drop reference in test for device presence
	PM / Domains: Avoid a potential deadlock
	blk-iolatency: #include "blk.h"
	drm/exynos/mixer: fix MIXER shadow registry synchronisation code
	irqchip/stm32: Don't clear rising/falling config registers at init
	irqchip/mbigen: Don't clear eventid when freeing an MSI
	x86/hpet: Prevent potential NULL pointer dereference
	x86/hyperv: Prevent potential NULL pointer dereference
	x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
	drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failure
	iommu/vt-d: Check capability before disabling protected memory
	x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
	fix incorrect error code mapping for OBJECTID_NOT_FOUND
	x86/gart: Exclude GART aperture from kcore
	ext4: prohibit fstrim in norecovery mode
	drm/cirrus: Use drm_framebuffer_put to avoid kernel oops in clean-up
	gpio: pxa: handle corner case of unprobed device
	rsi: improve kernel thread handling to fix kernel panic
	f2fs: fix to avoid NULL pointer dereference on se->discard_map
	9p: do not trust pdu content for stat item size
	9p locks: add mount option for lock retry interval
	ASoC: Fix UBSAN warning at snd_soc_get/put_volsw_sx()
	f2fs: fix to do sanity check with current segment number
	netfilter: xt_cgroup: shrink size of v2 path
	serial: uartps: console_setup() can't be placed to init section
	powerpc/pseries: Remove prrn_work workqueue
	media: au0828: cannot kfree dev before usb disconnect
	Bluetooth: Fix debugfs NULL pointer dereference
	HID: i2c-hid: override HID descriptors for certain devices
	pinctrl: core: make sure strcmp() doesn't get a null parameter
	ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms
	usbip: fix vhci_hcd controller counting
	ACPI / SBS: Fix GPE storm on recent MacBookPro's
	HID: usbhid: Add quirk for Redragon/Dragonrise Seymur 2
	KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail
	compiler.h: update definition of unreachable()
	netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine
	f2fs: cleanup dirty pages if recover failed
	net: stmmac: Set OWN bit for jumbo frames
	cifs: fallback to older infolevels on findfirst queryinfo retry
	kernel: hung_task.c: disable on suspend
	platform/x86: Add Intel AtomISP2 dummy / power-management driver
	drm/ttm: Fix bo_global and mem_global kfree error
	ALSA: hda: fix front speakers on Huawei MBXP
	ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle
	net/rds: fix warn in rds_message_alloc_sgs
	xfrm: destroy xfrm_state synchronously on net exit path
	crypto: sha256/arm - fix crash bug in Thumb2 build
	crypto: sha512/arm - fix crash bug in Thumb2 build
	net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version
	iommu/dmar: Fix buffer overflow during PCI bus notification
	scsi: core: Avoid that system resume triggers a kernel warning
	soc/tegra: pmc: Drop locking from tegra_powergate_is_powered()
	lkdtm: Print real addresses
	lkdtm: Add tests for NULL pointer dereference
	drm/panel: panel-innolux: set display off in innolux_panel_unprepare
	crypto: axis - fix for recursive locking from bottom half
	Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk"
	coresight: cpu-debug: Support for CA73 CPUs
	PCI: Blacklist power management of Gigabyte X299 DESIGNARE EX PCIe ports
	drm/nouveau/volt/gf117: fix speedo readout register
	ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
	drm/amdkfd: use init_mqd function to allocate object for hid_mqd (CI)
	appletalk: Fix use-after-free in atalk_proc_exit
	lib/div64.c: off by one in shift
	rxrpc: Fix client call connect/disconnect race
	f2fs: fix to dirty inode for i_mode recovery
	include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
	bpf: fix use after free in bpf_evict_inode
	IB/hfi1: Failed to drain send queue when QP is put into error state
	mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo
	mm: hide incomplete nr_indirectly_reclaimable in sysfs
	appletalk: Fix compile regression
	Linux 4.19.36

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-20 15:53:36 +02:00
Greg Kroah-Hartman
9154c0c7e2 Merge 4.19.36 into android-4.19-q
Changes in 4.19.36
	ARC: u-boot args: check that magic number is correct
	arc: hsdk_defconfig: Enable CONFIG_BLK_DEV_RAM
	inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch()
	perf/core: Restore mmap record type correctly
	ext4: avoid panic during forced reboot
	ext4: add missing brelse() in add_new_gdb_meta_bg()
	ext4: report real fs size after failed resize
	ALSA: echoaudio: add a check for ioremap_nocache
	ALSA: sb8: add a check for request_region
	auxdisplay: hd44780: Fix memory leak on ->remove()
	drm/udl: use drm_gem_object_put_unlocked.
	IB/mlx4: Fix race condition between catas error reset and aliasguid flows
	i40iw: Avoid panic when handling the inetdev event
	mmc: davinci: remove extraneous __init annotation
	ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration
	thermal/intel_powerclamp: fix __percpu declaration of worker_data
	thermal: samsung: Fix incorrect check after code merge
	thermal: bcm2835: Fix crash in bcm2835_thermal_debugfs
	thermal/int340x_thermal: Add additional UUIDs
	thermal/int340x_thermal: fix mode setting
	thermal/intel_powerclamp: fix truncated kthread name
	scsi: iscsi: flush running unbind operations when removing a session
	sched/cpufreq: Fix 32-bit math overflow
	sched/core: Fix buffer overflow in cgroup2 property cpu.max
	x86/mm: Don't leak kernel addresses
	tools/power turbostat: return the exit status of a command
	perf list: Don't forget to drop the reference to the allocated thread_map
	perf config: Fix an error in the config template documentation
	perf config: Fix a memory leak in collect_config()
	perf build-id: Fix memory leak in print_sdt_events()
	perf top: Fix error handling in cmd_top()
	perf hist: Add missing map__put() in error case
	perf evsel: Free evsel->counts in perf_evsel__exit()
	perf tests: Fix a memory leak of cpu_map object in the openat_syscall_event_on_all_cpus test
	perf tests: Fix memory leak by expr__find_other() in test__expr()
	perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
	ACPI / utils: Drop reference in test for device presence
	PM / Domains: Avoid a potential deadlock
	blk-iolatency: #include "blk.h"
	drm/exynos/mixer: fix MIXER shadow registry synchronisation code
	irqchip/stm32: Don't clear rising/falling config registers at init
	irqchip/mbigen: Don't clear eventid when freeing an MSI
	x86/hpet: Prevent potential NULL pointer dereference
	x86/hyperv: Prevent potential NULL pointer dereference
	x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
	drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failure
	iommu/vt-d: Check capability before disabling protected memory
	x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
	fix incorrect error code mapping for OBJECTID_NOT_FOUND
	x86/gart: Exclude GART aperture from kcore
	ext4: prohibit fstrim in norecovery mode
	drm/cirrus: Use drm_framebuffer_put to avoid kernel oops in clean-up
	gpio: pxa: handle corner case of unprobed device
	rsi: improve kernel thread handling to fix kernel panic
	f2fs: fix to avoid NULL pointer dereference on se->discard_map
	9p: do not trust pdu content for stat item size
	9p locks: add mount option for lock retry interval
	ASoC: Fix UBSAN warning at snd_soc_get/put_volsw_sx()
	f2fs: fix to do sanity check with current segment number
	netfilter: xt_cgroup: shrink size of v2 path
	serial: uartps: console_setup() can't be placed to init section
	powerpc/pseries: Remove prrn_work workqueue
	media: au0828: cannot kfree dev before usb disconnect
	Bluetooth: Fix debugfs NULL pointer dereference
	HID: i2c-hid: override HID descriptors for certain devices
	pinctrl: core: make sure strcmp() doesn't get a null parameter
	ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms
	usbip: fix vhci_hcd controller counting
	ACPI / SBS: Fix GPE storm on recent MacBookPro's
	HID: usbhid: Add quirk for Redragon/Dragonrise Seymur 2
	KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail
	compiler.h: update definition of unreachable()
	netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine
	f2fs: cleanup dirty pages if recover failed
	net: stmmac: Set OWN bit for jumbo frames
	cifs: fallback to older infolevels on findfirst queryinfo retry
	kernel: hung_task.c: disable on suspend
	platform/x86: Add Intel AtomISP2 dummy / power-management driver
	drm/ttm: Fix bo_global and mem_global kfree error
	ALSA: hda: fix front speakers on Huawei MBXP
	ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle
	net/rds: fix warn in rds_message_alloc_sgs
	xfrm: destroy xfrm_state synchronously on net exit path
	crypto: sha256/arm - fix crash bug in Thumb2 build
	crypto: sha512/arm - fix crash bug in Thumb2 build
	net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version
	iommu/dmar: Fix buffer overflow during PCI bus notification
	scsi: core: Avoid that system resume triggers a kernel warning
	soc/tegra: pmc: Drop locking from tegra_powergate_is_powered()
	lkdtm: Print real addresses
	lkdtm: Add tests for NULL pointer dereference
	drm/panel: panel-innolux: set display off in innolux_panel_unprepare
	crypto: axis - fix for recursive locking from bottom half
	Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk"
	coresight: cpu-debug: Support for CA73 CPUs
	PCI: Blacklist power management of Gigabyte X299 DESIGNARE EX PCIe ports
	drm/nouveau/volt/gf117: fix speedo readout register
	ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
	drm/amdkfd: use init_mqd function to allocate object for hid_mqd (CI)
	appletalk: Fix use-after-free in atalk_proc_exit
	lib/div64.c: off by one in shift
	rxrpc: Fix client call connect/disconnect race
	f2fs: fix to dirty inode for i_mode recovery
	include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
	bpf: fix use after free in bpf_evict_inode
	IB/hfi1: Failed to drain send queue when QP is put into error state
	mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo
	mm: hide incomplete nr_indirectly_reclaimable in sysfs
	appletalk: Fix compile regression
	Linux 4.19.36

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-04-20 15:47:09 +02:00
Pi-Hsun Shih
40c6d718d7 include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
[ Upstream commit a4046c06be50a4f01d435aa7fe57514818e6cc82 ]

Use offsetof() to calculate offset of a field to take advantage of
compiler built-in version when possible, and avoid UBSAN warning when
compiling with Clang:

  UBSAN: Undefined behaviour in mm/swapfile.c:3010:38
  member access within null pointer of type 'union swap_header'
  CPU: 6 PID: 1833 Comm: swapon Tainted: G S                4.19.23 #43
  Call trace:
   dump_backtrace+0x0/0x194
   show_stack+0x20/0x2c
   __dump_stack+0x20/0x28
   dump_stack+0x70/0x94
   ubsan_epilogue+0x14/0x44
   ubsan_type_mismatch_common+0xf4/0xfc
   __ubsan_handle_type_mismatch_v1+0x34/0x54
   __se_sys_swapon+0x654/0x1084
   __arm64_sys_swapon+0x1c/0x24
   el0_svc_common+0xa8/0x150
   el0_svc_compat_handler+0x2c/0x38
   el0_svc_compat+0x8/0x18

Link: http://lkml.kernel.org/r/20190312081902.223764-1-pihsun@chromium.org
Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-04-20 09:16:05 +02:00
Laurent Dufour
808c47e1d5 mm: introduce __lru_cache_add_active_or_unevictable
The speculative page fault handler which is run without holding the
mmap_sem is calling lru_cache_add_active_or_unevictable() but the vm_flags
is not guaranteed to remain constant.
Introducing __lru_cache_add_active_or_unevictable() which has the vma flags
value parameter instead of the vma pointer.

Change-Id: I68decbe0f80847403127c45c97565e47512532e9
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Patch-mainline: linux-mm @ Tue, 17 Apr 2018 16:33:20
[vinmenon@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
[charante@codeaurora.org: trivial merge conflict fixes]
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
2019-03-29 03:09:25 -07:00
Johannes Weiner
b4a56abdf7 UPSTREAM: mm: workingset: tell cache transitions from workingset thrashing
Refaults happen during transitions between workingsets as well as in-place
thrashing.  Knowing the difference between the two has a range of
applications, including measuring the impact of memory shortage on the
system performance, as well as the ability to smarter balance pressure
between the filesystem cache and the swap-backed workingset.

During workingset transitions, inactive cache refaults and pushes out
established active cache.  When that active cache isn't stale, however,
and also ends up refaulting, that's bonafide thrashing.

Introduce a new page flag that tells on eviction whether the page has been
active or not in its lifetime.  This bit is then stored in the shadow
entry, to classify refaults as transitioning or thrashing.

How many page->flags does this leave us with on 32-bit?

	20 bits are always page flags

	21 if you have an MMU

	23 with the zone bits for DMA, Normal, HighMem, Movable

	29 with the sparsemem section bits

	30 if PAE is enabled

	31 with this patch.

So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA nodes.  If
that's not enough, the system can switch to discontigmem and re-gain the 6
or 7 sparsemem section bits.

Link: http://lkml.kernel.org/r/20180828172258.3185-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

(cherry picked from commit 8508cf3ffad4defa202b303e5b6379efc4cd9054)

Bug: 127712811
Test: lmkd in PSI mode
Change-Id: I71df060dce5590a3c654f9a0e8e54deeb74b64c2
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-03-21 16:25:26 -07:00
Johannes Weiner
494c0dafc7 mm: workingset: tell cache transitions from workingset thrashing
Refaults happen during transitions between workingsets as well as in-place
thrashing.  Knowing the difference between the two has a range of
applications, including measuring the impact of memory shortage on the
system performance, as well as the ability to smarter balance pressure
between the filesystem cache and the swap-backed workingset.

During workingset transitions, inactive cache refaults and pushes out
established active cache.  When that active cache isn't stale, however,
and also ends up refaulting, that's bonafide thrashing.

Introduce a new page flag that tells on eviction whether the page has been
active or not in its lifetime.  This bit is then stored in the shadow
entry, to classify refaults as transitioning or thrashing.

How many page->flags does this leave us with on 32-bit?

	20 bits are always page flags

	21 if you have an MMU

	23 with the zone bits for DMA, Normal, HighMem, Movable

	29 with the sparsemem section bits

	30 if PAE is enabled

	31 with this patch.

So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA nodes.  If
that's not enough, the system can switch to discontigmem and re-gain the 6
or 7 sparsemem section bits.

Link: http://lkml.kernel.org/r/20180828172258.3185-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Drake <drake@endlessm.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I6134c51987d401584f513f8b5a7826962009997c
Git-commit: 1899ad18c6072d689896badafb81267b0a1092a4
Git-repo: https://source.codeaurora.org/quic/la/kernel/msm-4.19
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
2019-03-07 13:59:11 -08:00
Ivaylo Georgiev
45991f3d2a Merge android-4.19.18 (26bf816) into msm-4.19
* refs/heads/tmp-26bf816:
  Linux 4.19.18
  ipmi: Don't initialize anything in the core until something uses it
  ipmi:ssif: Fix handling of multi-part return messages
  ipmi: Prevent use-after-free in deliver_response
  ipmi: msghandler: Fix potential Spectre v1 vulnerabilities
  ipmi: fix use-after-free of user->release_barrier.rda
  Bluetooth: Fix unnecessary error message for HCI request completion
  iwlwifi: mvm: Send LQ command as async when necessary
  mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps
  userfaultfd: clear flag if remap event not enabled
  mm/swap: use nr_node_ids for avail_lists in swap_info_struct
  mm/page-writeback.c: don't break integrity writeback on ->writepage() error
  ocfs2: fix panic due to unrecovered local alloc
  iomap: don't search past page end in iomap_is_partially_uptodate
  scsi: megaraid: fix out-of-bound array accesses
  scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
  ath10k: fix peer stats null pointer dereference
  scsi: smartpqi: correct lun reset issues
  scsi: mpt3sas: fix memory ordering on 64bit writes
  IB/usnic: Fix potential deadlock
  sysfs: Disable lockdep for driver bind/unbind files
  ALSA: bebob: fix model-id of unit for Apogee Ensemble
  Bluetooth: btusb: Add support for Intel bluetooth device 8087:0029
  dm: Check for device sector overflow if CONFIG_LBDAF is not set
  clocksource/drivers/integrator-ap: Add missing of_node_put()
  quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls.
  perf tools: Add missing open_memstream() prototype for systems lacking it
  perf tools: Add missing sigqueue() prototype for systems lacking it
  perf cs-etm: Correct packets swapping in cs_etm__flush()
  dm snapshot: Fix excessive memory usage and workqueue stalls
  tools lib subcmd: Don't add the kernel sources to the include path
  perf stat: Avoid segfaults caused by negated options
  dm kcopyd: Fix bug causing workqueue stalls
  dm crypt: use u64 instead of sector_t to store iv_offset
  x86/topology: Use total_cpus for max logical packages calculation
  netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine
  netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine
  netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set
  perf vendor events intel: Fix Load_Miss_Real_Latency on SKL/SKX
  perf parse-events: Fix unchecked usage of strncpy()
  perf svghelper: Fix unchecked usage of strncpy()
  perf tests ARM: Disable breakpoint tests 32-bit
  perf intel-pt: Fix error with config term "pt=0"
  tty/serial: do not free trasnmit buffer page under port lock
  btrfs: improve error handling of btrfs_add_link
  btrfs: fix use-after-free due to race between replace start and cancel
  btrfs: alloc_chunk: fix more DUP stripe size handling
  btrfs: volumes: Make sure there is no overlap of dev extents at mount time
  mmc: atmel-mci: do not assume idle after atmci_request_end
  kconfig: fix memory leak when EOF is encountered in quotation
  kconfig: fix file name and line number of warn_ignored_character()
  bpf: relax verifier restriction on BPF_MOV | BPF_ALU
  arm64: Fix minor issues with the dcache_by_line_op macro
  clk: imx6q: reset exclusive gates on init
  arm64: kasan: Increase stack size for KASAN_EXTRA
  selftests: do not macro-expand failed assertion expressions
  scsi: target/core: Make sure that target_wait_for_sess_cmds() waits long enough
  scsi: target: use consistent left-aligned ASCII INQUIRY data
  net: call sk_dst_reset when set SO_DONTROUTE
  staging: erofs: fix use-after-free of on-stack `z_erofs_vle_unzip_io'
  media: venus: core: Set dma maximum segment size
  ASoC: use dma_ops of parent device for acp_audio_dma
  media: firewire: Fix app_info parameter type in avc_ca{,_app}_info
  powerpc/pseries/cpuidle: Fix preempt warning
  powerpc/xmon: Fix invocation inside lock region
  media: uvcvideo: Refactor teardown of uvc on USB disconnect
  pstore/ram: Do not treat empty buffers as valid
  clk: imx: make mux parent strings const
  jffs2: Fix use of uninitialized delayed_work, lockdep breakage
  efi/libstub: Disable some warnings for x86{,_64}
  rxe: IB_WR_REG_MR does not capture MR's iova field
  drm/amdgpu: Reorder uvd ring init before uvd resume
  scsi: qedi: Check for session online before getting iSCSI TLV data.
  ASoC: pcm3168a: Don't disable pcm3168a when CONFIG_PM defined
  selinux: always allow mounting submounts
  fpga: altera-cvp: fix probing for multiple FPGAs on the bus
  usb: gadget: udc: renesas_usb3: add a safety connection way for forced_b_device
  samples: bpf: fix: error handling regarding kprobe_events
  clk: meson: meson8b: fix incorrect divider mapping in cpu_scale_table
  drm/atomic-helper: Complete fake_commit->flip_done potentially earlier
  arm64: perf: set suppress_bind_attrs flag to true
  crypto: ecc - regularize scalar for scalar multiplication
  MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur
  x86/mce: Fix -Wmissing-prototypes warnings
  ALSA: oxfw: add support for APOGEE duet FireWire
  bpf: Allow narrow loads with offset > 0
  serial: set suppress_bind_attrs flag only if builtin
  writeback: don't decrement wb->refcnt if !wb->bdi
  of: overlay: add missing of_node_put() after add new node to changeset
  selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
  usb: typec: tcpm: Do not disconnect link for self powered devices
  e1000e: allow non-monotonic SYSTIM readings
  platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey
  ixgbe: allow IPsec Tx offload in VEPA mode
  drm/amdkfd: fix interrupt spin lock
  drm/amd/display: Guard against null stream_state in set_crc_source
  gpio: pl061: Move irq_chip definition inside struct pl061
  netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets
  net: clear skb->tstamp in bridge forwarding path
  ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses
  r8169: Add support for new Realtek Ethernet
  qmi_wwan: add MTU default to qmap network interface
  net, skbuff: do not prefer skb allocation fails early
  net: dsa: mv88x6xxx: mv88e6390 errata
  mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion
  mlxsw: spectrum: Disable lag port TX before removing it
  ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address
  UPSTREAM: dm: do not allow readahead to limit IO size
  UPSTREAM: ppp: Move PFC decompression to PPP generic layer
  UPSTREAM: l2tp: Add protocol field decompression

Conflicts:
	include/linux/swap.h

Change-Id: I72fecd7d04341015f9397afe60d8469d548b4cb6
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
2019-02-27 02:23:25 -08:00
Aaron Lu
b0cd52e644 mm/swap: use nr_node_ids for avail_lists in swap_info_struct
[ Upstream commit 66f71da9dd38af17dc17209cdde7987d4679a699 ]

Since a2468cc9bf ("swap: choose swap device according to numa node"),
avail_lists field of swap_info_struct is changed to an array with
MAX_NUMNODES elements.  This made swap_info_struct size increased to 40KiB
and needs an order-4 page to hold it.

This is not optimal in that:
1 Most systems have way less than MAX_NUMNODES(1024) nodes so it
  is a waste of memory;
2 It could cause swapon failure if the swap device is swapped on
  after system has been running for a while, due to no order-4
  page is available as pointed out by Vasily Averin.

Solve the above two issues by using nr_node_ids(which is the actual
possible node number the running system has) for avail_lists instead of
MAX_NUMNODES.

nr_node_ids is unknown at compile time so can't be directly used when
declaring this array.  What I did here is to declare avail_lists as zero
element array and allocate space for it when allocating space for
swap_info_struct.  The reason why keep using array but not pointer is
plist_for_each_entry needs the field to be part of the struct, so pointer
will not work.

This patch is on top of Vasily Averin's fix commit.  I think the use of
kvzalloc for swap_info_struct is still needed in case nr_node_ids is
really big on some systems.

Link: http://lkml.kernel.org/r/20181115083847.GA11129@intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-01-26 09:32:43 +01:00
Vinayak Menon
ac4b2c42c8 mm: swap: swap ratio support
Add support to receive a static ratio from userspace to
divide the swap pages between ZRAM and disk based swap
devices. The existing infrastructure allows to keep
same priority for multiple swap devices, which results
in round robin distribution of pages. With this patch,
the ratio can be defined.

Change-Id: I54f54489db84cabb206569dd62d61a8a7a898991
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
[swatsrid@codeaurora.org: Fix trivial merge conflicts]
Signed-off-by: Swathi Sridhar <swatsrid@codeaurora.org>
2018-10-16 10:35:44 -07:00
Huang Ying
5d5e8f1954 mm, swap, get_swap_pages: use entry_size instead of cluster in parameter
As suggested by Matthew Wilcox, it is better to use "int entry_size"
instead of "bool cluster" as parameter to specify whether to operate for
huge or normal swap entries.  Because this improve the flexibility to
support other swap entry size.  And Dave Hansen thinks that this
improves code readability too.

So in this patch, the "bool cluster" parameter of get_swap_pages() is
replaced by "int entry_size".

And nr_swap_entries() trick is used to reduce the binary size when
!CONFIG_TRANSPARENT_HUGE_PAGE.

       text	   data	    bss	    dec	    hex	filename
base  24215	   2028	    340	  26583	   67d7	mm/swapfile.o
head  24123	   2004	    340	  26467	   6763	mm/swapfile.o

Link: http://lkml.kernel.org/r/20180720071845.17920-7-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-22 10:52:44 -07:00
Tejun Heo
2cf855837b memcontrol: schedule throttling if we are congested
Memory allocations can induce swapping via kswapd or direct reclaim.  If
we are having IO done for us by kswapd and don't actually go into direct
reclaim we may never get scheduled for throttling.  So instead check to
see if our cgroup is congested, and if so schedule the throttling.
Before we return to user space the throttling stuff will only throttle
if we actually required it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-07-09 09:07:54 -06:00
Jonathan Corbet
24844fd339 Merge branch 'mm-rst' into docs-next
Mike Rapoport says:

  These patches convert files in Documentation/vm to ReST format, add an
  initial index and link it to the top level documentation.

  There are no contents changes in the documentation, except few spelling
  fixes. The relatively large diffstat stems from the indentation and
  paragraph wrapping changes.

  I've tried to keep the formatting as consistent as possible, but I could
  miss some places that needed markup and add some markup where it was not
  necessary.

[jc: significant conflicts in vm/hmm.rst]
2018-04-16 14:25:08 -06:00
Mike Rapoport
ad56b738c5 docs/vm: rename documentation files to .rst
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2018-04-16 14:18:15 -06:00
Minchan Kim
e9e9b7ecee mm: swap: unify cluster-based and vma-based swap readahead
This patch makes do_swap_page() not need to be aware of two different
swap readahead algorithms.  Just unify cluster-based and vma-based
readahead function call.

Link: http://lkml.kernel.org/r/1509520520-32367-3-git-send-email-minchan@kernel.org
Link: http://lkml.kernel.org/r/20180220085249.151400-3-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:25 -07:00
Minchan Kim
eaf649ebc3 mm: swap: clean up swap readahead
When I see recent change of swap readahead, I am very unhappy about
current code structure which diverges two swap readahead algorithm in
do_swap_page.  This patch is to clean it up.

Main motivation is that fault handler doesn't need to be aware of
readahead algorithms but just should call swapin_readahead.

As first step, this patch cleans up a little bit but not perfect (I just
separate for review easier) so next patch will make the goal complete.

[minchan@kernel.org: do not check readahead flag with THP anon]
  Link: http://lkml.kernel.org/r/874lm83zho.fsf@yhuang-dev.intel.com
  Link: http://lkml.kernel.org/r/20180227232611.169883-1-minchan@kernel.org
Link: http://lkml.kernel.org/r/1509520520-32367-2-git-send-email-minchan@kernel.org
Link: http://lkml.kernel.org/r/20180220085249.151400-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05 21:36:25 -07:00
Shakeel Butt
9c4e6b1a70 mm, mlock, vmscan: no more skipping pagevecs
When a thread mlocks an address space backed either by file pages which
are currently not present in memory or swapped out anon pages (not in
swapcache), a new page is allocated and added to the local pagevec
(lru_add_pvec), I/O is triggered and the thread then sleeps on the page.
On I/O completion, the thread can wake on a different CPU, the mlock
syscall will then sets the PageMlocked() bit of the page but will not be
able to put that page in unevictable LRU as the page is on the pagevec
of a different CPU.  Even on drain, that page will go to evictable LRU
because the PageMlocked() bit is not checked on pagevec drain.

The page will eventually go to right LRU on reclaim but the LRU stats
will remain skewed for a long time.

This patch puts all the pages, even unevictable, to the pagevecs and on
the drain, the pages will be added on their LRUs correctly by checking
their evictability.  This resolves the mlocked pages on pagevec of other
CPUs issue because when those pagevecs will be drained, the mlocked file
pages will go to unevictable LRU.  Also this makes the race with munlock
easier to resolve because the pagevec drains happen in LRU lock.

However there is still one place which makes a page evictable and does
PageLRU check on that page without LRU lock and needs special attention.
TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock().

	#0: __pagevec_lru_add_fn	#1: clear_page_mlock

	SetPageLRU()			if (!TestClearPageMlocked())
					  return
	smp_mb() // <--required
					// inside does PageLRU
	if (!PageMlocked())		if (isolate_lru_page())
	  move to evictable LRU		  putback_lru_page()
	else
	  move to unevictable LRU

In '#1', TestClearPageMlocked() provides full memory barrier semantics
and thus the PageLRU check (inside isolate_lru_page) can not be
reordered before it.

In '#0', without explicit memory barrier, the PageMlocked() check can be
reordered before SetPageLRU().  If that happens, '#0' can put a page in
unevictable LRU and '#1' might have just cleared the Mlocked bit of that
page but fails to isolate as PageLRU fails as '#0' still hasn't set
PageLRU bit of that page.  That page will be stranded on the unevictable
LRU.

There is one (good) side effect though.  Without this patch, the pages
allocated for System V shared memory segment are added to evictable LRUs
even after shmctl(SHM_LOCK) on that segment.  This patch will correctly
put such pages to unevictable LRU.

Link: http://lkml.kernel.org/r/20171121211241.18877-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-21 15:35:42 -08:00
Jan Kara
a4ef876841 mm: remove unused pgdat_reclaimable_pages()
Remove unused function pgdat_reclaimable_pages() and
node_page_state_snapshot() which becomes unused as well.

Link: http://lkml.kernel.org/r/20171122094416.26019-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:37 -08:00
Michal Hocko
9852a72123 mm: drop hotplug lock from lru_add_drain_all()
Pulling cpu hotplug locks inside the mm core function like
lru_add_drain_all just asks for problems and the recent lockdep splat
[1] just proves this.  While the usage in that particular case might be
wrong we should avoid the locking as lru_add_drain_all() is used in many
places.  It seems that this is not all that hard to achieve actually.

We have done the same thing for drain_all_pages which is analogous by
commit a459eeb7b8 ("mm, page_alloc: do not depend on cpu hotplug locks
inside the allocator").  All we have to care about is to handle

      - the work item might be executed on a different cpu in worker from
        unbound pool so it doesn't run on pinned on the cpu

      - we have to make sure that we do not race with page_alloc_cpu_dead
        calling lru_add_drain_cpu

the first part is already handled because the worker calls lru_add_drain
which disables preemption when calling lru_add_drain_cpu on the local
cpu it is draining.  The later is true because page_alloc_cpu_dead is
called on the controlling CPU after the hotplugged CPU vanished
completely.

[1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com

[add a cpu hotplug locking interaction as per tglx]
Link: http://lkml.kernel.org/r/20171116120535.23765-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:36 -08:00
Mel Gorman
c6f92f9fbe mm: remove cold parameter for release_pages
All callers of release_pages claim the pages being released are cache
hot.  As no one cares about the hotness of pages being released to the
allocator, just ditch the parameter.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

Link: http://lkml.kernel.org/r/20171018075952.10627-7-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:06 -08:00
Mel Gorman
c7df8ad291 mm, truncate: do not check mapping for every page being truncated
During truncation, the mapping has already been checked for shmem and
dax so it's known that workingset_update_node is required.

This patch avoids the checks on mapping for each page being truncated.
In all other cases, a lookup helper is used to determine if
workingset_update_node() needs to be called.  The one danger is that the
API is slightly harder to use as calling workingset_update_node directly
without checking for dax or shmem mappings could lead to surprises.
However, the API rarely needs to be used and hopefully the comment is
enough to give people the hint.

sparsetruncate (tiny)
                              4.14.0-rc4             4.14.0-rc4
                             oneirq-v1r1        pickhelper-v1r1
Min          Time      141.00 (   0.00%)      140.00 (   0.71%)
1st-qrtle    Time      142.00 (   0.00%)      141.00 (   0.70%)
2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
Max-95%      Time      147.00 (   0.00%)      145.00 (   1.36%)
Max-99%      Time      195.00 (   0.00%)      191.00 (   2.05%)
Max          Time      230.00 (   0.00%)      205.00 (  10.87%)
Amean        Time      144.37 (   0.00%)      143.82 (   0.38%)
Stddev       Time       10.44 (   0.00%)        9.00 (  13.74%)
Coeff        Time        7.23 (   0.00%)        6.26 (  13.41%)
Best99%Amean Time      143.72 (   0.00%)      143.34 (   0.26%)
Best95%Amean Time      142.37 (   0.00%)      142.00 (   0.26%)
Best90%Amean Time      142.19 (   0.00%)      141.85 (   0.24%)
Best75%Amean Time      141.92 (   0.00%)      141.58 (   0.24%)
Best50%Amean Time      141.69 (   0.00%)      141.31 (   0.27%)
Best25%Amean Time      141.38 (   0.00%)      140.97 (   0.29%)

As you'd expect, the gain is marginal but it can be detected.  The
differences in bonnie are all within the noise which is not surprising
given the impact on the microbenchmark.

radix_tree_update_node_t is a callback for some radix operations that
optionally passes in a private field.  The only user of the callback is
workingset_update_node and as it no longer requires a mapping, the
private field is removed.

Link: http://lkml.kernel.org/r/20171018075952.10627-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:06 -08:00
Minchan Kim
aa8d22a11d mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference
When SWP_SYNCHRONOUS_IO swapped-in pages are shared by several
processes, it can cause unnecessary memory wastage by skipping swap
cache.  Because, with swapin fault by read, they could share a page if
the page were in swap cache.  Thus, it avoids allocating same content
new pages.

This patch makes the swapcache skipping work only if the swap pte is
non-sharable.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1507620825-5537-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Minchan Kim
0bcac06f27 mm, swap: skip swapcache for swapin of synchronous device
With fast swap storage, the platforms want to use swap more aggressively
and swap-in is crucial to application latency.

The rw_page() based synchronous devices like zram, pmem and btt are such
fast storage.  When I profile swapin performance with zram lz4
decompress test, S/W overhead is more than 70%.  Maybe, it would be
bigger in nvdimm.

This patch aims to reduce swap-in latency by skipping swapcache if the
swap device is synchronous device like rw_page based device.  It
enhances 45% my swapin test(5G sequential swapin, no readahead, from
2.41sec to 1.64sec).

Link: http://lkml.kernel.org/r/1505886205-9671-5-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Minchan Kim
539a6fea7f mm, swap: introduce SWP_SYNCHRONOUS_IO
If rw-page based fast storage is used for swap devices, we need to
detect it to enhance swap IO operations.  This patch is preparation for
optimizing of swap-in operation with next patch.

Link: http://lkml.kernel.org/r/1505886205-9671-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:02 -08:00
Huang Ying
2628bd6fc0 mm, swap: fix race between swap count continuation operations
One page may store a set of entries of the sis->swap_map
(swap_info_struct->swap_map) in multiple swap clusters.

If some of the entries has sis->swap_map[offset] > SWAP_MAP_MAX,
multiple pages will be used to store the set of entries of the
sis->swap_map.  And the pages are linked with page->lru.  This is called
swap count continuation.  To access the pages which store the set of
entries of the sis->swap_map simultaneously, previously, sis->lock is
used.  But to improve the scalability of __swap_duplicate(), swap
cluster lock may be used in swap_count_continued() now.  This may race
with add_swap_count_continuation() which operates on a nearby swap
cluster, in which the sis->swap_map entries are stored in the same page.

The race can cause wrong swap count in practice, thus cause unfreeable
swap entries or software lockup, etc.

To fix the race, a new spin lock called cont_lock is added to struct
swap_info_struct to protect the swap count continuation page list.  This
is a lock at the swap device level, so the scalability isn't very well.
But it is still much better than the original sis->lock, because it is
only acquired/released when swap count continuation is used.  Which is
considered rare in practice.  If it turns out that the scalability
becomes an issue for some workloads, we can split the lock into some
more fine grained locks.

Link: http://lkml.kernel.org/r/20171017081320.28133-1-ying.huang@intel.com
Fixes: 235b621767 ("mm/swap: add cluster lock")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>	[4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-03 07:39:19 -07:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Jérôme Glisse
5042db43cc mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory
HMM (heterogeneous memory management) need struct page to support
migration from system main memory to device memory.  Reasons for HMM and
migration to device memory is explained with HMM core patch.

This patch deals with device memory that is un-addressable memory (ie CPU
can not access it).  Hence we do not want those struct page to be manage
like regular memory.  That is why we extend ZONE_DEVICE to support
different types of memory.

A persistent memory type is define for existing user of ZONE_DEVICE and a
new device un-addressable type is added for the un-addressable memory
type.  There is a clear separation between what is expected from each
memory type and existing user of ZONE_DEVICE are un-affected by new
requirement and new use of the un-addressable type.  All specific code
path are protect with test against the memory type.

Because memory is un-addressable we use a new special swap type for when a
page is migrated to device memory (this reduces the number of maximum swap
file).

The main two additions beside memory type to ZONE_DEVICE is two callbacks.
First one, page_free() is call whenever page refcount reach 1 (which
means the page is free as ZONE_DEVICE page never reach a refcount of 0).
This allow device driver to manage its memory and associated struct page.

The second callback page_fault() happens when there is a CPU access to an
address that is back by a device page (which are un-addressable by the
CPU).  This callback is responsible to migrate the page back to system
main memory.  Device driver can not block migration back to system memory,
HMM make sure that such page can not be pin into device memory.

If device is in some error condition and can not migrate memory back then
a CPU page fault to device memory should end with SIGBUS.

[arnd@arndb.de: fix warning]
  Link: http://lkml.kernel.org/r/20170823133213.712917-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170817000548.32038-8-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:46 -07:00
Aaron Lu
a2468cc9bf swap: choose swap device according to numa node
If the system has more than one swap device and swap device has the node
information, we can make use of this information to decide which swap
device to use in get_swap_pages() to get better performance.

The current code uses a priority based list, swap_avail_list, to decide
which swap device to use and if multiple swap devices share the same
priority, they are used round robin.  This patch changes the previous
single global swap_avail_list into a per-numa-node list, i.e.  for each
numa node, it sees its own priority based list of available swap
devices.  Swap device's priority can be promoted on its matching node's
swap_avail_list.

The current swap device's priority is set as: user can set a >=0 value,
or the system will pick one starting from -1 then downwards.  The
priority value in the swap_avail_list is the negated value of the swap
device's due to plist being sorted from low to high.  The new policy
doesn't change the semantics for priority >=0 cases, the previous
starting from -1 then downwards now becomes starting from -2 then
downwards and -1 is reserved as the promoted value.

Take 4-node EX machine as an example, suppose 4 swap devices are
available, each sit on a different node:
swapA on node 0
swapB on node 1
swapC on node 2
swapD on node 3

After they are all swapped on in the sequence of ABCD.

Current behaviour:
their priorities will be:
swapA: -1
swapB: -2
swapC: -3
swapD: -4
And their position in the global swap_avail_list will be:
swapA   -> swapB   -> swapC   -> swapD
prio:1     prio:2     prio:3     prio:4

New behaviour:
their priorities will be(note that -1 is skipped):
swapA: -2
swapB: -3
swapC: -4
swapD: -5
And their positions in the 4 swap_avail_lists[nid] will be:
swap_avail_lists[0]: /* node 0's available swap device list */
swapA   -> swapB   -> swapC   -> swapD
prio:1     prio:3     prio:4     prio:5
swap_avali_lists[1]: /* node 1's available swap device list */
swapB   -> swapA   -> swapC   -> swapD
prio:1     prio:2     prio:4     prio:5
swap_avail_lists[2]: /* node 2's available swap device list */
swapC   -> swapA   -> swapB   -> swapD
prio:1     prio:2     prio:3     prio:5
swap_avail_lists[3]: /* node 3's available swap device list */
swapD   -> swapA   -> swapB   -> swapC
prio:1     prio:2     prio:3     prio:4

To see the effect of the patch, a test that starts N process, each mmap
a region of anonymous memory and then continually write to it at random
position to trigger both swap in and out is used.

On a 2 node Skylake EP machine with 64GiB memory, two 170GB SSD drives
are used as swap devices with each attached to a different node, the
result is:

runtime=30m/processes=32/total test size=128G/each process mmap region=4G
kernel         throughput
vanilla        13306
auto-binding   15169 +14%

runtime=30m/processes=64/total test size=128G/each process mmap region=2G
kernel         throughput
vanilla        11885
auto-binding   14879 +25%

[aaron.lu@intel.com: v2]
  Link: http://lkml.kernel.org/r/20170814053130.GD2369@aaronlu.sh.intel.com
  Link: http://lkml.kernel.org/r/20170816024439.GA10925@aaronlu.sh.intel.com
[akpm@linux-foundation.org: use kmalloc_array()]
Link: http://lkml.kernel.org/r/20170814053130.GD2369@aaronlu.sh.intel.com
Link: http://lkml.kernel.org/r/20170816024439.GA10925@aaronlu.sh.intel.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: "Chen, Tim C" <tim.c.chen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:30 -07:00
Huang Ying
81a0298bdf mm, swap: don't use VMA based swap readahead if HDD is used as swap
VMA based swap readahead will readahead the virtual pages that is
continuous in the virtual address space.  While the original swap
readahead will readahead the swap slots that is continuous in the swap
device.  Although VMA based swap readahead is more correct for the swap
slots to be readahead, it will trigger more small random readings, which
may cause the performance of HDD (hard disk) to degrade heavily, and may
finally exceed the benefit.

To avoid the issue, in this patch, if the HDD is used as swap, the VMA
based swap readahead will be disabled, and the original swap readahead
will be used instead.

Link: http://lkml.kernel.org/r/20170807054038.1843-6-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:30 -07:00
Huang Ying
ec560175c0 mm, swap: VMA based swap readahead
The swap readahead is an important mechanism to reduce the swap in
latency.  Although pure sequential memory access pattern isn't very
popular for anonymous memory, the space locality is still considered
valid.

In the original swap readahead implementation, the consecutive blocks in
swap device are readahead based on the global space locality estimation.
But the consecutive blocks in swap device just reflect the order of page
reclaiming, don't necessarily reflect the access pattern in virtual
memory.  And the different tasks in the system may have different access
patterns, which makes the global space locality estimation incorrect.

In this patch, when page fault occurs, the virtual pages near the fault
address will be readahead instead of the swap slots near the fault swap
slot in swap device.  This avoid to readahead the unrelated swap slots.
At the same time, the swap readahead is changed to work on per-VMA from
globally.  So that the different access patterns of the different VMAs
could be distinguished, and the different readahead policy could be
applied accordingly.  The original core readahead detection and scaling
algorithm is reused, because it is an effect algorithm to detect the
space locality.

The test and result is as follow,

Common test condition
=====================

Test Machine: Xeon E5 v3 (2 sockets, 72 threads, 32G RAM) Swap device:
NVMe disk

Micro-benchmark with combined access pattern
============================================

vm-scalability, sequential swap test case, 4 processes to eat 50G
virtual memory space, repeat the sequential memory writing until 300
seconds.  The first round writing will trigger swap out, the following
rounds will trigger sequential swap in and out.

At the same time, run vm-scalability random swap test case in
background, 8 processes to eat 30G virtual memory space, repeat the
random memory write until 300 seconds.  This will trigger random swap-in
in the background.

This is a combined workload with sequential and random memory accessing
at the same time.  The result (for sequential workload) is as follow,

			Base		Optimized
			----		---------
throughput		345413 KB/s	414029 KB/s (+19.9%)
latency.average		97.14 us	61.06 us (-37.1%)
latency.50th		2 us		1 us
latency.60th		2 us		1 us
latency.70th		98 us		2 us
latency.80th		160 us		2 us
latency.90th		260 us		217 us
latency.95th		346 us		369 us
latency.99th		1.34 ms		1.09 ms
ra_hit%			52.69%		99.98%

The original swap readahead algorithm is confused by the background
random access workload, so readahead hit rate is lower.  The VMA-base
readahead algorithm works much better.

Linpack
=======

The test memory size is bigger than RAM to trigger swapping.

			Base		Optimized
			----		---------
elapsed_time		393.49 s	329.88 s (-16.2%)
ra_hit%			86.21%		98.82%

The score of base and optimized kernel hasn't visible changes.  But the
elapsed time reduced and readahead hit rate improved, so the optimized
kernel runs better for startup and tear down stages.  And the absolute
value of readahead hit rate is high, shows that the space locality is
still valid in some practical workloads.

Link: http://lkml.kernel.org/r/20170807054038.1843-4-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:29 -07:00
Michal Hocko
c41f012ade mm: rename global_page_state to global_zone_page_state
global_page_state is error prone as a recent bug report pointed out [1].
It only returns proper values for zone based counters as the enum it
gets suggests.  We already have global_node_page_state so let's rename
global_page_state to global_zone_page_state to be more explicit here.
All existing users seems to be correct:

$ git grep "global_page_state(NR_" | sed 's@.*(\(NR_[A-Z_]*\)).*@\1@' | sort | uniq -c
      2 NR_BOUNCE
      2 NR_FREE_CMA_PAGES
     11 NR_FREE_PAGES
      1 NR_KERNEL_STACK_KB
      1 NR_MLOCK
      2 NR_PAGETABLE

This patch shouldn't introduce any functional change.

[1] http://lkml.kernel.org/r/201707260628.v6Q6SmaS030814@www262.sakura.ne.jp

Link: http://lkml.kernel.org/r/20170801134256.5400-2-hannes@cmpxchg.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:29 -07:00
Huang Ying
59807685a7 mm, THP, swap: support splitting THP for THP swap out
After adding swapping out support for THP (Transparent Huge Page), it is
possible that a THP in swap cache (partly swapped out) need to be split.
To split such a THP, the swap cluster backing the THP need to be split
too, that is, the CLUSTER_FLAG_HUGE flag need to be cleared for the swap
cluster.  The patch implemented this.

And because the THP swap writing needs the THP keeps as huge page during
writing.  The PageWriteback flag is checked before splitting.

Link: http://lkml.kernel.org/r/20170724051840.2309-8-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Huang Ying
ba3c4ce6de mm, THP, swap: make reuse_swap_page() works for THP swapped out
After supporting to delay THP (Transparent Huge Page) splitting after
swapped out, it is possible that some page table mappings of the THP are
turned into swap entries.  So reuse_swap_page() need to check the swap
count in addition to the map count as before.  This patch done that.

In the huge PMD write protect fault handler, in addition to the page map
count, the swap count need to be checked too, so the page lock need to
be acquired too when calling reuse_swap_page() in addition to the page
table lock.

[ying.huang@intel.com: silence a compiler warning]
  Link: http://lkml.kernel.org/r/87bmnzizjy.fsf@yhuang-dev.intel.com
Link: http://lkml.kernel.org/r/20170724051840.2309-4-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:27 -07:00
Huang Ying
e07098294a mm, THP, swap: support to reclaim swap space for THP swapped out
The normal swap slot reclaiming can be done when the swap count reaches
SWAP_HAS_CACHE.  But for the swap slot which is backing a THP, all swap
slots backing one THP must be reclaimed together, because the swap slot
may be used again when the THP is swapped out again later.  So the swap
slots backing one THP can be reclaimed together when the swap count for
all swap slots for the THP reached SWAP_HAS_CACHE.  In the patch, the
functions to check whether the swap count for all swap slots backing one
THP reached SWAP_HAS_CACHE are implemented and used when checking
whether a swap slot can be reclaimed.

To make it easier to determine whether a swap slot is backing a THP, a
new swap cluster flag named CLUSTER_FLAG_HUGE is added to mark a swap
cluster which is backing a THP (Transparent Huge Page).  Because THP
swap in as a whole isn't supported now.  After deleting the THP from the
swap cache (for example, swapping out finished), the CLUSTER_FLAG_HUGE
flag will be cleared.  So that, the normal pages inside THP can be
swapped in individually.

[ying.huang@intel.com: fix swap_page_trans_huge_swapped on HDD]
  Link: http://lkml.kernel.org/r/874ltsm0bi.fsf@yhuang-dev.intel.com
Link: http://lkml.kernel.org/r/20170724051840.2309-3-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:27 -07:00
Thomas Gleixner
a47fed5b5b mm: swap: provide lru_add_drain_all_cpuslocked()
The rework of the cpu hotplug locking unearthed potential deadlocks with
the memory hotplug locking code.

The solution for these is to rework the memory hotplug locking code as
well and take the cpu hotplug lock before the memory hotplug lock in
mem_hotplug_begin(), but this will cause a recursive locking of the cpu
hotplug lock when the memory hotplug code calls lru_add_drain_all().

Split out the inner workings of lru_add_drain_all() into
lru_add_drain_all_cpuslocked() so this function can be invoked from the
memory hotplug code with the cpu hotplug lock held.

Link: http://lkml.kernel.org/r/20170704093421.419329357@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:33 -07:00
Shaohua Li
23955622ff swap: add block io poll in swapin path
For fast flash disk, async IO could introduce overhead because of
context switch.  block-mq now supports IO poll, which improves
performance and latency a lot.  swapin is a good place to use this
technique, because the task is waiting for the swapin page to continue
execution.

In my virtual machine, directly read 4k data from a NVMe with iopoll is
about 60% better than that without poll.  With iopoll support in swapin
patch, my microbenchmark (a task does random memory write) is about
10%~25% faster.  CPU utilization increases a lot though, 2x and even 3x
CPU utilization.  This will depend on disk speed.

While iopoll in swapin isn't intended for all usage cases, it's a win
for latency sensistive workloads with high speed swap disk.  block layer
has knob to control poll in runtime.  If poll isn't enabled in block
layer, there should be no noticeable change in swapin.

I got a chance to run the same test in a NVMe with DRAM as the media.
In simple fio IO test, blkpoll boosts 50% performance in single thread
test and ~20% in 8 threads test.  So this is the base line.  In above
swap test, blkpoll boosts ~27% performance in single thread test.
blkpoll uses 2x CPU time though.

If we enable hybid polling, the performance gain has very slight drop
but CPU time is only 50% worse than that without blkpoll.  Also we can
adjust parameter of hybid poll, with it, the CPU time penality is
reduced further.  In 8 threads test, blkpoll doesn't help though.  The
performance is similar to that without blkpoll, but cpu utilization is
similar too.  There is lock contention in swap path.  The cpu time
spending on blkpoll isn't high.  So overall, blkpoll swapin isn't worse
than that without it.

The swapin readahead might read several pages in in the same time and
form a big IO request.  Since the IO will take longer time, it doesn't
make sense to do poll, so the patch only does iopoll for single page
swapin.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/070c3c3e40b711e7b1390002c991e86a-b5408f0@7511894063d3764ff01ea8111f5a004d7dd700ed078797c204a24e620ddb965c
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:30 -07:00
Minchan Kim
0f0746589e mm, THP, swap: move anonymous THP split logic to vmscan
The add_to_swap aims to allocate swap_space(ie, swap slot and swapcache)
so if it fails due to lack of space in case of THP or something(hdd swap
but tries THP swapout) *caller* rather than add_to_swap itself should
split the THP page and retry it with base page which is more natural.

Link: http://lkml.kernel.org/r/20170515112522.32457-4-ying.huang@intel.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:31 -07:00
Minchan Kim
75f6d6d29a mm, THP, swap: unify swap slot free functions to put_swap_page
Now, get_swap_page takes struct page and allocates swap space according
to page size(ie, normal or THP) so it would be more cleaner to introduce
put_swap_page which is a counter function of get_swap_page.  Then, it
calls right swap slot free function depending on page's size.

[ying.huang@intel.com: minor cleanup and fix]
Link: http://lkml.kernel.org/r/20170515112522.32457-3-ying.huang@intel.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:31 -07:00
Huang Ying
38d8b4e6bd mm, THP, swap: delay splitting THP during swap out
Patch series "THP swap: Delay splitting THP during swapping out", v11.

This patchset is to optimize the performance of Transparent Huge Page
(THP) swap.

Recently, the performance of the storage devices improved so fast that
we cannot saturate the disk bandwidth with single logical CPU when do
page swap out even on a high-end server machine.  Because the
performance of the storage device improved faster than that of single
logical CPU.  And it seems that the trend will not change in the near
future.  On the other hand, the THP becomes more and more popular
because of increased memory size.  So it becomes necessary to optimize
THP swap performance.

The advantages of the THP swap support include:

 - Batch the swap operations for the THP to reduce lock
   acquiring/releasing, including allocating/freeing the swap space,
   adding/deleting to/from the swap cache, and writing/reading the swap
   space, etc. This will help improve the performance of the THP swap.

 - The THP swap space read/write will be 2M sequential IO. It is
   particularly helpful for the swap read, which are usually 4k random
   IO. This will improve the performance of the THP swap too.

 - It will help the memory fragmentation, especially when the THP is
   heavily used by the applications. The 2M continuous pages will be
   free up after THP swapping out.

 - It will improve the THP utilization on the system with the swap
   turned on. Because the speed for khugepaged to collapse the normal
   pages into the THP is quite slow. After the THP is split during the
   swapping out, it will take quite long time for the normal pages to
   collapse back into the THP after being swapped in. The high THP
   utilization helps the efficiency of the page based memory management
   too.

There are some concerns regarding THP swap in, mainly because possible
enlarged read/write IO size (for swap in/out) may put more overhead on
the storage device.  To deal with that, the THP swap in should be turned
on only when necessary.  For example, it can be selected via
"always/never/madvise" logic, to be turned on globally, turned off
globally, or turned on only for VMA with MADV_HUGEPAGE, etc.

This patchset is the first step for the THP swap support.  The plan is
to delay splitting THP step by step, finally avoid splitting THP during
the THP swapping out and swap out/in the THP as a whole.

As the first step, in this patchset, the splitting huge page is delayed
from almost the first step of swapping out to after allocating the swap
space for the THP and adding the THP into the swap cache.  This will
reduce lock acquiring/releasing for the locks used for the swap cache
management.

With the patchset, the swap out throughput improves 15.5% (from about
3.73GB/s to about 4.31GB/s) in the vm-scalability swap-w-seq test case
with 8 processes.  The test is done on a Xeon E5 v3 system.  The swap
device used is a RAM simulated PMEM (persistent memory) device.  To test
the sequential swapping out, the test case creates 8 processes, which
sequentially allocate and write to the anonymous pages until the RAM and
part of the swap device is used up.

This patch (of 5):

In this patch, splitting huge page is delayed from almost the first step
of swapping out to after allocating the swap space for the THP
(Transparent Huge Page) and adding the THP into the swap cache.  This
will batch the corresponding operation, thus improve THP swap out
throughput.

This is the first step for the THP swap optimization.  The plan is to
delay splitting the THP step by step and avoid splitting the THP
finally.

In this patch, one swap cluster is used to hold the contents of each THP
swapped out.  So, the size of the swap cluster is changed to that of the
THP (Transparent Huge Page) on x86_64 architecture (512).  For other
architectures which want such THP swap optimization,
ARCH_USES_THP_SWAP_CLUSTER needs to be selected in the Kconfig file for
the architecture.  In effect, this will enlarge swap cluster size by 2
times on x86_64.  Which may make it harder to find a free cluster when
the swap space becomes fragmented.  So that, this may reduce the
continuous swap space allocation and sequential write in theory.  The
performance test in 0day shows no regressions caused by this.

In the future of THP swap optimization, some information of the swapped
out THP (such as compound map count) will be recorded in the
swap_cluster_info data structure.

The mem cgroup swap accounting functions are enhanced to support charge
or uncharge a swap cluster backing a THP as a whole.

The swap cluster allocate/free functions are added to allocate/free a
swap cluster for a THP.  A fair simple algorithm is used for swap
cluster allocation, that is, only the first swap device in priority list
will be tried to allocate the swap cluster.  The function will fail if
the trying is not successful, and the caller will fallback to allocate a
single swap slot instead.  This works good enough for normal cases.  If
the difference of the number of the free swap clusters among multiple
swap devices is significant, it is possible that some THPs are split
earlier than necessary.  For example, this could be caused by big size
difference among multiple swap devices.

The swap cache functions is enhanced to support add/delete THP to/from
the swap cache as a set of (HPAGE_PMD_NR) sub-pages.  This may be
enhanced in the future with multi-order radix tree.  But because we will
split the THP soon during swapping out, that optimization doesn't make
much sense for this first step.

The THP splitting functions are enhanced to support to split THP in swap
cache during swapping out.  The page lock will be held during allocating
the swap cluster, adding the THP into the swap cache and splitting the
THP.  So in the code path other than swapping out, if the THP need to be
split, the PageSwapCache(THP) will be always false.

The swap cluster is only available for SSD, so the THP swap optimization
in this patchset has no effect for HDD.

[ying.huang@intel.com: fix two issues in THP optimize patch]
  Link: http://lkml.kernel.org/r/87k25ed8zo.fsf@yhuang-dev.intel.com
[hannes@cmpxchg.org: extensive cleanups and simplifications, reduce code size]
Link: http://lkml.kernel.org/r/20170515112522.32457-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org> [for config option]
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [for changes in huge_memory.c and huge_mm.h]
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:31 -07:00
Huang Ying
df6b749980 mm, swap: remove unused function prototype
This is a code cleanup patch, no functionality changes.  There are 2
unused function prototype in swap.h, they are removed.

Link: http://lkml.kernel.org/r/20170405071017.23677-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03 15:52:11 -07:00
Shaohua Li
f7ad2a6cb9 mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
madv()'s MADV_FREE indicate pages are 'lazyfree'.  They are still
anonymous pages, but they can be freed without pageout.  To distinguish
these from normal anonymous pages, we clear their SwapBacked flag.

MADV_FREE pages could be freed without pageout, so they pretty much like
used once file pages.  For such pages, we'd like to reclaim them once
there is memory pressure.  Also it might be unfair reclaiming MADV_FREE
pages always before used once file pages and we definitively want to
reclaim the pages before other anonymous and file pages.

To speed up MADV_FREE pages reclaim, we put the pages into
LRU_INACTIVE_FILE list.  The rationale is LRU_INACTIVE_FILE list is tiny
nowadays and should be full of used once file pages.  Reclaiming
MADV_FREE pages will not have much interfere of anonymous and active
file pages.  And the inactive file pages and MADV_FREE pages will be
reclaimed according to their age, so we don't reclaim too many MADV_FREE
pages too.  Putting the MADV_FREE pages into LRU_INACTIVE_FILE_LIST also
means we can reclaim the pages without swap support.  This idea is
suggested by Johannes.

This patch doesn't move MADV_FREE pages to LRU_INACTIVE_FILE list yet to
avoid bisect failure, next patch will do it.

The patch is based on Minchan's original patch.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/2f87063c1e9354677b7618c647abde77b07561e5.1487965799.git.shli@fb.com
Signed-off-by: Shaohua Li <shli@fb.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03 15:52:08 -07:00
Tim Chen
67afa38e01 mm/swap: add cache for swap slots allocation
We add per cpu caches for swap slots that can be allocated and freed
quickly without the need to touch the swap info lock.

Two separate caches are maintained for swap slots allocated and swap
slots returned.  This is to allow the swap slots to be returned to the
global pool in a batch so they will have a chance to be coaelesced with
other slots in a cluster.  We do not reuse the slots that are returned
right away, as it may increase fragmentation of the slots.

The swap allocation cache is protected by a mutex as we may sleep when
searching for empty slots in cache.  The swap free cache is protected by
a spin lock as we cannot sleep in the free path.

We refill the swap slots cache when we run out of slots, and we disable
the swap slots cache and drain the slots if the global number of slots
fall below a low watermark threshold.  We re-enable the cache agian when
the slots available are above a high watermark.

[ying.huang@intel.com: use raw_cpu_ptr over this_cpu_ptr for swap slots access]
[tim.c.chen@linux.intel.com: add comments on locks in swap_slots.h]
  Link: http://lkml.kernel.org/r/20170118180327.GA24225@linux.intel.com
Link: http://lkml.kernel.org/r/35de301a4eaa8daa2977de6e987f2c154385eb66.1484082593.git.tim.c.chen@linux.intel.com
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net> escreveu:
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:30 -08:00
Tim Chen
7c00bafee8 mm/swap: free swap slots in batch
Add new functions that free unused swap slots in batches without the
need to reacquire swap info lock.  This improves scalability and reduce
lock contention.

Link: http://lkml.kernel.org/r/c25e0fcdfd237ec4ca7db91631d3b9f6ed23824e.1484082593.git.tim.c.chen@linux.intel.com
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net> escreveu:
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:30 -08:00