Commit Graph

751 Commits

Author SHA1 Message Date
Blagovest Kolenichev
a8a3aff106 Merge android-4.9.86 (b324a70) into msm-4.9
* refs/heads/tmp-b324a70:
  Linux 4.9.86
  MIPS: Implement __multi3 for GCC7 MIPS64r6 builds
  KVM: arm/arm64: Fix check for hugepage size when allocating at Stage 2
  net: gianfar_ptp: move set_fipers() to spinlock protecting area
  sctp: make use of pre-calculated len
  xen/gntdev: Fix partial gntdev_mmap() cleanup
  xen/gntdev: Fix off-by-one error when unmapping with holes
  SolutionEngine771x: fix Ether platform data
  mdio-sun4i: Fix a memory leak
  xen-netfront: enable device after manual module load
  bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
  can: flex_can: Correct the checking for frame length in flexcan_start_xmit()
  mac80211: mesh: drop frames appearing to be from us
  nl80211: Check for the required netlink attribute presence
  i40e/i40evf: Account for frags split over multiple descriptors in check linearize
  uapi libc compat: add fallback for unsupported libcs
  drm/ttm: check the return value of kzalloc
  NET: usb: qmi_wwan: add support for YUGA CLM920-NC5 PID 0x9625
  e1000: fix disabling already-disabled warning
  macvlan: Fix one possible double free
  xfs: quota: check result of register_shrinker()
  xfs: quota: fix missed destroy of qi_tree_lock
  IB/ipoib: Fix race condition in neigh creation
  IB/mlx4: Fix mlx4_ib_alloc_mr error flow
  s390/dasd: fix wrongly assigned configuration data
  genirq: Guard handle_bad_irq log messages
  IB/mlx5: Fix mlx5_ib_alloc_mr error flow
  led: core: Fix brightness setting when setting delay_off=0
  bnx2x: Improve reliability in case of nested PCI errors
  tg3: Enable PHY reset in MTU change path for 5720
  tg3: Add workaround to restrict 5762 MRRS to 2048
  tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error path
  tipc: error path leak fixes in tipc_enable_bearer()
  lib/mpi: Fix umul_ppmm() for MIPS64r6
  ARM: dts: ls1021a: fix incorrect clock references
  scsi: storvsc: Fix scsi_cmd error assignments in storvsc_handle_error
  net: stmmac: Fix TX timestamp calculation
  ip6_tunnel: get the min mtu properly in ip6_tnl_xmit
  net: arc_emac: fix arc_emac_rx() error paths
  net: mediatek: setup proper state for disabled GMAC on the default
  ASoC: nau8825: fix issue that pop noise when start capture
  spi: atmel: fixed spin_lock usage inside atmel_spi_remove
  mac80211_hwsim: Fix a possible sleep-in-atomic bug in hwsim_get_radio_nl
  drm/nouveau/pci: do a msi rearm on init
  net: phy: xgene: disable clk on error paths
  sget(): handle failures of register_shrinker()
  x86/asm: Allow again using asm.h when building for the 'bpf' clang target
  ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch
  ipv6: icmp6: Allow icmp messages to be looped back
  mtd: nand: brcmnand: Zero bitflip is not an error
  mtd: nand: gpmi: Fix failure when a erased page has a bitflip at BBM
  net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
  nvme: check hw sectors before setting chunk sectors
  dmaengine: fsl-edma: disable clks on all error paths
  f2fs: fix a bug caused by NULL extent tree
  i2c: designware: must wait for enable
  hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers)
  ANDROID: kbuild: change LTO into a choice
  ANDROID: arm64: crypto: fix AES CE when built as a module
  ANDROID: staging: lustre: fix filler function type
  ANDROID: fs: logfs: fix filler function type
  ANDROID: fs: gfs2: fix filler function type
  ANDROID: fs: exofs: fix filler function type
  ANDROID: fs: afs: fix filler function type
  ANDROID: keychord: Check for write data size
  media-device: fix ioctl function types
  drivers/perf: arm_pmu: fix function type mismatch
  dummycon: fix function types
  fs: nfs: fix filler function type
  mm: fix filler function type mismatch
  mm: fix drain_local_pages function type
  BACKPORT: vfs: pass type instead of fn to do_{loop,iter}_readv_writev()
  arch/arm64/crypto: fix CFI in AES CE
  arch/arm64/crypto: fix CFI in SHA CE
  arm64: disable CFI for cpu_replace_ttbr1
  v4l2-ioctl: fix function types for IOCTL_INFO_STD
  UPSTREAM: module: Do not paper over type mismatches in module_param_call()
  BACKPORT: treewide: Fix function prototypes for module_param_call()
  UPSTREAM: module: Prepare to convert all module_param_call() prototypes
  bpf: fix function type for __bpf_prog_run
  kallsyms: strip the .cfi postfix from symbols with CONFIG_CFI_CLANG
  add support for clang Control Flow Integrity (CFI)
  HACK: init: ensure initcall ordering with LTO
  xen/efi: don't use -fshort-wchar
  drivers/misc: disable LTO for lkdtm_rodata.o
  arm64: vdso: disable LTO
  FROMLIST: BACKPORT: arm64: select ARCH_SUPPORTS_LTO_CLANG
  FROMLIST: BACKPORT: arm64: disable RANDOMIZE_MODULE_REGION_FULL with LTO_CLANG
  FROMLIST: arch/arm64/crypto: disable LTO for aes-ce-cipher.c
  arm64: disable ARM64_ERRATUM_843419 for clang LTO
  arm64: pass code model to LLVMgold
  FROMLIST: BACKPORT: arm64: make mrs_s and msr_s macros work with LTO
  FROMLIST: arm64: kvm: use -fno-jump-tables with clang
  FROMLIST: efi/libstub: disable LTO
  FROMLIST: scripts/mod: disable LTO for empty.c
  FROMLIST: BACKPORT: kbuild: fix dynamic ftrace with clang LTO
  FROMLIST: BACKPORT: kbuild: add support for clang LTO
  FROMLIST: BACKPORT: arm64: add a workaround for GNU gold with ARM64_MODULE_PLTS
  FROMLIST: arm64: explicitly pass --no-fix-cortex-a53-843419 to GNU gold
  FROMLIST: kbuild: add __ld-ifversion and linker-specific macros
  FROMLIST: kbuild: add ld-name macro
  FROMLIST: BACKPORT: arm64: keep .altinstructions and .altinstr_replacement
  arm64: fix LD_DEAD_CODE_DATA_ELIMINATION
  FROMLIST: kbuild: fix LD_DEAD_CODE_DATA_ELIMINATION
  FROMLIST: BACKPORT: kbuild: add __cc-ifversion and compiler-specific variants
  FROMLIST: kbuild: add clang-version.sh
  Revert "binder: add missing binder_unlock()"
  Linux 4.9.85
  x86/entry/64: Clear extra registers beyond syscall arguments, to reduce speculation attack surface
  mm: fail get_vaddr_frames() for filesystem-dax mappings
  mm: Fix devm_memremap_pages() collision handling
  libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment
  IB/core: disable memory registration of filesystem-dax vmas
  v4l2: disable filesystem-dax mapping support
  mm: introduce get_user_pages_longterm
  device-dax: implement ->split() to catch invalid munmap attempts
  libnvdimm: fix integer overflow static analysis warning
  fs/dax.c: fix inefficiency in dax_writeback_mapping_range()
  mm: avoid spurious 'bad pmd' warning messages
  X.509: fix NULL dereference when restricting key with unsupported_sig
  binder: add missing binder_unlock()
  drm/amdgpu: add new device to use atpx quirk
  drm/amdgpu: Avoid leaking PM domain on driver unbind (v2)
  drm/amdgpu: add atpx quirk handling (v2)
  drm/amdgpu: Add dpm quirk for Jet PRO (v2)
  usb: renesas_usbhs: missed the "running" flag in usb_dmac with rx path
  usb: gadget: f_fs: Process all descriptors during bind
  Revert "usb: musb: host: don't start next rx urb if current one failed"
  usb: ldusb: add PIDs for new CASSY devices supported by this driver
  usb: dwc3: gadget: Set maxpacket size for ep0 IN
  drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA
  Add delay-init quirk for Corsair K70 RGB keyboards
  arm64: Disable unhandled signal log messages by default
  usb: ohci: Proper handling of ed_rm_list to handle race condition between usb_kill_urb() and finish_unlinks()
  ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func()
  PCI/cxgb4: Extend T3 PCI quirk to T4+ devices
  irqchip/gic-v3: Use wmb() instead of smb_wmb() in gic_raise_softirq()
  x86/oprofile: Fix bogus GCC-8 warning in nmi_setup()
  iio: adis_lib: Initialize trigger before requesting interrupt
  iio: buffer: check if a buffer has been set up when poll is called
  RDMA/uverbs: Protect from command mask overflow
  PKCS#7: fix certificate chain verification
  X.509: fix BUG_ON() when hash algorithm is unsupported
  cfg80211: fix cfg80211_beacon_dup
  scsi: ibmvfc: fix misdefined reserved field in ibmvfc_fcp_rsp_info
  xtensa: fix high memory/reserved memory collision
  netfilter: drop outermost socket lock in getsockopt()
  ANDROID: sdcardfs: Set num in extension_details during make_item

Conflicts:
	Makefile
	arch/arm64/include/asm/arch_gicv3.h
	arch/arm64/kernel/module.lds
	drivers/usb/gadget/function/f_fs.c
	scripts/link-vmlinux.sh

Change in module_param_call() definition requires alignment in:

	drivers/hwtracing/coresight/coresight-event.c
	drivers/media/radio/radio-iris-transport.c
	drivers/power/reset/msm-poweroff.c
	drivers/soc/qcom/wcnss/wcnss_wlan.c
	drivers/video/fbdev/msm/mdss_dsi_status.c

Change-Id: I2fa32c39bd4ba8a132f8f8abc8132a2ceb32907a
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2018-04-17 10:33:48 -07:00
Greg Kroah-Hartman
7118def012 Merge 4.9.85 into android-4.9
Changes in 4.9.85
	netfilter: drop outermost socket lock in getsockopt()
	xtensa: fix high memory/reserved memory collision
	scsi: ibmvfc: fix misdefined reserved field in ibmvfc_fcp_rsp_info
	cfg80211: fix cfg80211_beacon_dup
	X.509: fix BUG_ON() when hash algorithm is unsupported
	PKCS#7: fix certificate chain verification
	RDMA/uverbs: Protect from command mask overflow
	iio: buffer: check if a buffer has been set up when poll is called
	iio: adis_lib: Initialize trigger before requesting interrupt
	x86/oprofile: Fix bogus GCC-8 warning in nmi_setup()
	irqchip/gic-v3: Use wmb() instead of smb_wmb() in gic_raise_softirq()
	PCI/cxgb4: Extend T3 PCI quirk to T4+ devices
	ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func()
	usb: ohci: Proper handling of ed_rm_list to handle race condition between usb_kill_urb() and finish_unlinks()
	arm64: Disable unhandled signal log messages by default
	Add delay-init quirk for Corsair K70 RGB keyboards
	drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA
	usb: dwc3: gadget: Set maxpacket size for ep0 IN
	usb: ldusb: add PIDs for new CASSY devices supported by this driver
	Revert "usb: musb: host: don't start next rx urb if current one failed"
	usb: gadget: f_fs: Process all descriptors during bind
	usb: renesas_usbhs: missed the "running" flag in usb_dmac with rx path
	drm/amdgpu: Add dpm quirk for Jet PRO (v2)
	drm/amdgpu: add atpx quirk handling (v2)
	drm/amdgpu: Avoid leaking PM domain on driver unbind (v2)
	drm/amdgpu: add new device to use atpx quirk
	binder: add missing binder_unlock()
	X.509: fix NULL dereference when restricting key with unsupported_sig
	mm: avoid spurious 'bad pmd' warning messages
	fs/dax.c: fix inefficiency in dax_writeback_mapping_range()
	libnvdimm: fix integer overflow static analysis warning
	device-dax: implement ->split() to catch invalid munmap attempts
	mm: introduce get_user_pages_longterm
	v4l2: disable filesystem-dax mapping support
	IB/core: disable memory registration of filesystem-dax vmas
	libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment
	mm: Fix devm_memremap_pages() collision handling
	mm: fail get_vaddr_frames() for filesystem-dax mappings
	x86/entry/64: Clear extra registers beyond syscall arguments, to reduce speculation attack surface
	Linux 4.9.85

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-02-28 16:31:38 +01:00
Dan Williams
b29ea3c0af mm: introduce get_user_pages_longterm
commit 2bb6d2837083de722bfdc369cb0d76ce188dd9b4 upstream.

Patch series "introduce get_user_pages_longterm()", v2.

Here is a new get_user_pages api for cases where a driver intends to
keep an elevated page count indefinitely.  This is distinct from usages
like iov_iter_get_pages where the elevated page counts are transient.
The iov_iter_get_pages cases immediately turn around and submit the
pages to a device driver which will put_page when the i/o operation
completes (under kernel control).

In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future.  This is untenable for
filesystem-dax case where the filesystem is in control of the lifetime
of the block / page and needs reasonable limits on how long it can wait
for pages in a mapping to become idle.

Fixing filesystems to actually wait for dax pages to be idle before
blocks from a truncate/hole-punch operation are repurposed is saved for
a later patch series.

Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.

I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings
were supported by the kernel.  The behavior regression this policy
change implies is one of the reasons we maintain the "dax enabled.
Warning: EXPERIMENTAL, use at your own risk" notification when mounting
a filesystem in dax mode.

It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.

This patch (of 4):

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow long standing memory registrations against
filesytem-dax vmas.  Device-dax vmas do not have this problem and are
explicitly allowed.

This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and
V4L2).

[akpm@linux-foundation.org: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a6 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-28 10:18:33 +01:00
Rik van Riel
124627553b add extra free kbytes tunable
Add a userspace visible knob to tell the VM to keep an extra amount
of memory free, by increasing the gap between each zone's min and
low watermarks.

This is useful for realtime applications that call system
calls and have a bound on the number of allocations that happen
in any short time period.  In this application, extra_free_kbytes
would be left at an amount equal to or larger than than the
maximum number of allocations that happen in any burst.

It may also be useful to reduce the memory use of virtual
machines (temporarily?), in a way that does not cause memory
fragmentation like ballooning does.

[ccross]
Revived for use on old kernels where no other solution exists.
The tunable will be removed on kernels that do better at avoiding
direct reclaim.

Change-Id: I765a42be8e964bfd3e2886d1ca85a29d60c3bb3e
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Colin Cross <ccross@android.com>
Git-commit: f8ade3666c8612c33929f499b3c8800d83f43d84
Git-repo: https://android.googlesource.com/kernel/common.git
[vinmenon@codeaurora.org: avoid user setting of watermark_scale_factor
to any value other than 0, make the default watermark_scale_factor 0,
changes because of the presence of watermark_scale_factor. Move the
declaration of extra_free_kbytes to include file and remove the
initialization to zero in the definition. This fixes the checkpatch
warnings. Make the inclusion of variable "one_thousand" in kernel/sysctl.c
dependent on CONFIG_PERF_EVENTS which is the only user left now (earlier
being watermark_scale_factor]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2018-02-05 16:29:56 +05:30
Kyle Yan
ff97938fbf Merge remote-tracking branch '4.9/tmp-8dd0f52' into msm-4.9
* 4.9/tmp-8dd0f52:
  Linux 4.9.72
  sparc32: Export vac_cache_size to fix build error
  bpf: fix incorrect sign extension in check_alu_op()
  bpf: reject out-of-bounds stack pointer calculation
  bpf: fix branch pruning logic
  bpf: adjust insn_aux_data when patching insns
  Revert "Bluetooth: btusb: driver to enable the usb-wakeup feature"
  platform/x86: asus-wireless: send an EV_SYN/SYN_REPORT between state changes
  MIPS: math-emu: Fix final emulation phase for certain instructions
  thermal/drivers/hisi: Fix multiple alarm interrupts firing
  thermal/drivers/hisi: Simplify the temperature/step computation
  thermal/drivers/hisi: Fix kernel panic on alarm interrupt
  thermal/drivers/hisi: Fix missing interrupt enablement
  thermal: hisilicon: Handle return value of clk_prepare_enable
  cpuidle: fix broadcast control when broadcast can not be entered
  rtc: set the alarm to the next expiring timer
  tcp: fix under-evaluated ssthresh in TCP Vegas
  clk: sunxi-ng: sun6i: Rename HDMI DDC clock to avoid name collision
  staging: greybus: light: Release memory obtained by kasprintf
  net: ipv6: send NS for DAD when link operationally up
  fm10k: ensure we process SM mbx when processing VF mbx
  vfio/pci: Virtualize Maximum Payload Size
  scsi: lpfc: PLOGI failures during NPIV testing
  scsi: lpfc: Fix secure firmware updates
  fm10k: fix mis-ordered parameters in declaration for .ndo_set_vf_bw
  ASoC: img-parallel-out: Add pm_runtime_get/put to set_fmt callback
  tracing: Exclude 'generic fields' from histograms
  PCI/AER: Report non-fatal errors only to the affected endpoint
  IB/rxe: check for allocation failure on elem
  ixgbe: fix use of uninitialized padding
  igb: check memory allocation failure
  PM / OPP: Move error message to debug level
  PCI: Create SR-IOV virtfn/physfn links before attaching driver
  scsi: mpt3sas: Fix IO error occurs on pulling out a drive from RAID1 volume created on two SATA drive
  scsi: cxgb4i: fix Tx skb leak
  PCI: Avoid bus reset if bridge itself is broken
  net: phy: at803x: Change error to EINVAL for invalid MAC
  kvm, mm: account kvm related kmem slabs to kmemcg
  rtc: pl031: make interrupt optional
  crypto: crypto4xx - increase context and scatter ring buffer elements
  backlight: pwm_bl: Fix overflow condition
  bnxt_en: Fix NULL pointer dereference in reopen failure path
  cpuidle: powernv: Pass correct drv->cpumask for registration
  ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
  Btrfs: fix an integer overflow check
  netfilter: nfnetlink_queue: fix secctx memory leak
  xhci: plat: Register shutdown for xhci_plat
  net: moxa: fix TX overrun memory leak
  isdn: kcapi: avoid uninitialized data
  virtio_balloon: prevent uninitialized variable use
  virtio-balloon: use actual number of stats for stats queue buffers
  KVM: pci-assign: do not map smm memory slot pages in vt-d page tables
  net: ipconfig: fix ic_close_devs() use-after-free
  cpufreq: Fix creation of symbolic links to policy directories
  ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
  netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register
  netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table
  irda: vlsi_ir: fix check for DMA mapping errors
  RDMA/iser: Fix possible mr leak on device removal event
  i40e: Do not enable NAPI on q_vectors that have no rings
  IB/rxe: increment msn only when completing a request
  IB/rxe: double free on error
  net: Do not allow negative values for busy_read and busy_poll sysctl interfaces
  nbd: set queue timeout properly
  infiniband: Fix alignment of mmap cookies to support VIPT caching
  IB/core: Protect against self-requeue of a cq work item
  i40iw: Receive netdev events post INET_NOTIFIER state
  bna: avoid writing uninitialized data into hw registers
  s390/qeth: no ETH header for outbound AF_IUCV
  s390/qeth: size calculation outbound buffers
  r8152: prevent the driver from transmitting packets with carrier off
  ASoC: STI: Fix reader substream pointer set
  HID: xinmo: fix for out of range for THT 2P arcade controller.
  hwmon: (asus_atk0110) fix uninitialized data access
  ARM: dts: ti: fix PCI bus dtc warnings
  KVM: VMX: Fix enable VPID conditions
  KVM: x86: correct async page present tracepoint
  kvm: vmx: Flush TLB when the APIC-access address changes
  scsi: lpfc: Fix PT2PT PRLI reject
  pinctrl: st: add irq_request/release_resources callbacks
  inet: frag: release spinlock before calling icmp_send()
  tipc: fix nametbl deadlock at tipc_nametbl_unsubscribe
  r8152: fix the rx early size of RTL8153
  iommu/exynos: Workaround FLPD cache flush issues for SYSMMU v5
  netfilter: nfnl_cthelper: Fix memory leak
  netfilter: nfnl_cthelper: fix runtime expectation policy updates
  usb: gadget: udc: remove pointer dereference after free
  usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed
  hwmon: (max31790) Set correct PWM value
  net: qmi_wwan: Add USB IDs for MDM6600 modem on Motorola Droid 4
  sctp: out_qlen should be updated when pruning unsent queue
  bna: integer overflow bug in debugfs
  sch_dsmark: fix invalid skb_cow() usage
  vsock: cancel packets when failing to connect
  vhost-vsock: add pkt cancel capability
  vsock: track pkt owner vsock
  crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex
  r8152: fix the list rx_done may be used without initialization
  cpuidle: Validate cpu_dev in cpuidle_add_sysfs()
  nvme-loop: handle cpu unplug when re-establishing the controller
  arm: kprobes: Align stack to 8-bytes in test code
  arm: kprobes: Fix the return address of multiple kretprobes
  HID: corsair: Add driver Scimitar Pro RGB gaming mouse 1b1c:1b3e support to hid-corsair
  HID: corsair: support for K65-K70 Rapidfire and Scimitar Pro RGB
  kvm: fix usage of uninit spinlock in avic_vm_destroy()
  ALSA: hda - add support for docking station for HP 840 G3
  ALSA: hda - add support for docking station for HP 820 G2
  arm64: Initialise high_memory global variable earlier
  cxl: Check if vphb exists before iterating over AFU devices
  Linux 4.9.71
  ath9k: fix tx99 potential info leak
  icmp: don't fail on fragment reassembly time exceeded
  IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop
  RDMA/cma: Avoid triggering undefined behavior
  macvlan: Only deliver one copy of the frame to the macvlan interface
  udf: Avoid overflow when session starts at large offset
  scsi: bfa: integer overflow in debugfs
  scsi: sd: change allow_restart to bool in sysfs interface
  scsi: sd: change manage_start_stop to bool in sysfs interface
  rtl8188eu: Fix a possible sleep-in-atomic bug in rtw_disassoc_cmd
  rtl8188eu: Fix a possible sleep-in-atomic bug in rtw_createbss_cmd
  vt6655: Fix a possible sleep-in-atomic bug in vt6655_suspend
  IB/core: Fix calculation of maximum RoCE MTU
  scsi: scsi_devinfo: Add REPORTLUN2 to EMC SYMMETRIX blacklist entry
  raid5: Set R5_Expanded on parity devices as well as data.
  pinctrl: adi2: Fix Kconfig build problem
  usb: musb: da8xx: fix babble condition handling
  tty fix oops when rmmod 8250
  soc: mediatek: pwrap: fix compiler errors
  powerpc/perf/hv-24x7: Fix incorrect comparison in memord
  scsi: hpsa: destroy sas transport properties before scsi_host
  scsi: hpsa: cleanup sas_phy structures in sysfs when unloading
  PCI: Detach driver before procfs & sysfs teardown on device remove
  RDMA/cxgb4: Declare stag as __be32
  xfs: fix incorrect extent state in xfs_bmap_add_extent_unwritten_real
  xfs: fix log block underflow during recovery cycle verification
  l2tp: cleanup l2tp_tunnel_delete calls
  nvme: use kref_get_unless_zero in nvme_find_get_ns
  platform/x86: hp_accel: Add quirk for HP ProBook 440 G4
  btrfs: tests: Fix a memory leak in error handling path in 'run_test()'
  arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27
  Ib/hfi1: Return actual operational VLs in port info query
  bcache: fix wrong cache_misses statistics
  bcache: explicitly destroy mutex while exiting
  GFS2: Take inode off order_write list when setting jdata flag
  scsi: scsi_debug: write_same: fix error report
  thermal/drivers/step_wise: Fix temperature regulation misbehavior
  ASoC: rsnd: rsnd_ssi_run_mods() needs to care ssi_parent_mod
  ppp: Destroy the mutex when cleanup
  clk: tegra: Fix cclk_lp divisor register
  clk: hi6220: mark clock cs_atb_syspll as critical
  clk: imx6: refine hdmi_isfr's parent to make HDMI work on i.MX6 SoCs w/o VPU
  clk: mediatek: add the option for determining PLL source clock
  mm: Handle 0 flags in _calc_vm_trans() macro
  crypto: tcrypt - fix buffer lengths in test_aead_speed()
  arm-ccn: perf: Prevent module unload while PMU is in use
  xfs: truncate pagecache before writeback in xfs_setattr_size()
  iommu/amd: Limit the IOVA page range to the specified addresses
  badblocks: fix wrong return value in badblocks_set if badblocks are disabled
  target/file: Do not return error for UNMAP if length is zero
  target:fix condition return in core_pr_dump_initiator_port()
  iscsi-target: fix memory leak in lio_target_tiqn_addtpg()
  target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd()
  platform/x86: intel_punit_ipc: Fix resource ioremap warning
  powerpc/ipic: Fix status get and status clear
  powerpc/opal: Fix EBUSY bug in acquiring tokens
  netfilter: ipvs: Fix inappropriate output of procfs
  iommu/mediatek: Fix driver name
  PCI: Do not allocate more buses than available in parent
  powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfo
  PCI/PME: Handle invalid data when reading Root Status
  dmaengine: ti-dma-crossbar: Correct am335x/am43xx mux value type
  ASoC: Intel: Skylake: Fix uuid_module memory leak in failure case
  rtc: pcf8563: fix output clock rate
  video: fbdev: au1200fb: Return an error code if a memory allocation fails
  video: fbdev: au1200fb: Release some resources if a memory allocation fails
  video: udlfb: Fix read EDID timeout
  fbdev: controlfb: Add missing modes to fix out of bounds access
  sfc: don't warn on successful change of MAC
  HID: cp2112: fix broken gpio_direction_input callback
  Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
  target: fix race during implicit transition work flushes
  target: fix ALUA transition timeout handling
  target: Use system workqueue for ALUA transitions
  btrfs: add missing memset while reading compressed inline extents
  NFSv4.1 respect server's max size in CREATE_SESSION
  efi/esrt: Cleanup bad memory map log messages
  perf symbols: Fix symbols__fixup_end heuristic for corner cases
  tty: fix data race in tty_ldisc_ref_wait()
  tty: don't panic on OOM in tty_set_ldisc()
  rxrpc: Ignore BUSY packets on old calls
  net: mpls: Fix nexthop alive tracking on down events
  net/mlx4_core: Avoid delays during VF driver device shutdown
  nvmet-rdma: Fix a possible uninitialized variable dereference
  nvmet: confirm sq percpu has scheduled and switched to atomic
  nvme-loop: fix a possible use-after-free when destroying the admin queue
  afs: Fix abort on signal while waiting for call completion
  afs: Fix afs_kill_pages()
  afs: Fix page leak in afs_write_begin()
  afs: Populate and use client modification time
  afs: Better abort and net error handling
  afs: Invalid op ID should abort with RXGEN_OPCODE
  afs: Fix the maths in afs_fs_store_data()
  afs: Prevent callback expiry timer overflow
  afs: Migrate vlocation fields to 64-bit
  afs: Flush outstanding writes when an fd is closed
  afs: Deal with an empty callback array
  afs: Adjust mode bits processing
  afs: Populate group ID from vnode status
  afs: Fix missing put_page()
  drm/radeon: reinstate oland workaround for sclk
  mmc: mediatek: Fixed bug where clock frequency could be set wrong
  sched/deadline: Use deadline instead of period when calculating overflow
  sched/deadline: Throttle a constrained deadline task activated after the deadline
  sched/deadline: Make sure the replenishment timer fires in the next period
  sched/deadline: Add missing update_rq_clock() in dl_task_timer()
  iwlwifi: mvm: cleanup pending frames in DQA mode
  Drivers: hv: util: move waiting for release to hv_utils_transport itself
  drm/radeon/si: add dpm quirk for Oland
  fjes: Fix wrong netdevice feature flags
  scsi: hpsa: do not timeout reset operations
  scsi: hpsa: limit outstanding rescans
  scsi: hpsa: update check for logical volume status
  ASoC: rcar: clear DE bit only in PDMACHCR when it stops
  openrisc: fix issue handling 8 byte get_user calls
  intel_th: pci: Add Gemini Lake support
  drm: amd: remove broken include path
  qed: Fix interrupt flags on Rx LL2
  qed: Fix mapping leak on LL2 rx flow
  qed: Align CIDs according to DORQ requirement
  mlxsw: reg: Fix SPVMLR max record count
  mlxsw: reg: Fix SPVM max record count
  net: Resend IGMP memberships upon peer notification.
  irqchip/mvebu-odmi: Select GENERIC_MSI_IRQ_DOMAIN
  dmaengine: Fix array index out of bounds warning in __get_unmap_pool()
  net: wimax/i2400m: fix NULL-deref at probe
  writeback: fix memory leak in wb_queue_work()
  blk-mq: Fix tagset reinit in the presence of cpu hot-unplug
  ASoC: rsnd: fix sound route path when using SRC6/SRC9
  netfilter: bridge: honor frag_max_size when refragmenting
  drm/omap: fix dmabuf mmap for dma_alloc'ed buffers
  Input: i8042 - add TUXEDO BU1406 (N24_25BU) to the nomux list
  NFSD: fix nfsd_reset_versions for NFSv4.
  NFSD: fix nfsd_minorversion(.., NFSD_AVAIL)
  drm/amdgpu: fix parser init error path to avoid crash in parser fini
  iommu/io-pgtable-arm-v7s: Check for leaf entry before dereferencing it
  net/mlx5: Don't save PCI state when PCI error is detected
  net/mlx5: Fix create autogroup prev initializer
  rxrpc: Wake up the transmitter if Rx window size increases on the peer
  net: bcmgenet: Power up the internal PHY before probing the MII
  net: bcmgenet: synchronize irq0 status between the isr and task
  net: bcmgenet: power down internal phy if open or resume fails
  net: bcmgenet: reserved phy revisions must be checked first
  net: bcmgenet: correct MIB access of UniMAC RUNT counters
  net: bcmgenet: correct the RBUF_OVFL_CNT and RBUF_ERR_CNT MIB values
  bnxt_en: Ignore 0 value in autoneg supported speed from firmware.
  net: initialize msg.msg_flags in recvfrom
  userfaultfd: selftest: vm: allow to build in vm/ directory
  userfaultfd: shmem: __do_fault requires VM_FAULT_NOPAGE
  md-cluster: free md_cluster_info if node leave cluster
  usb: xhci-mtk: check hcc_params after adding primary hcd
  KVM: nVMX: do not warn when MSR bitmap address is not backed
  usb: phy: isp1301: Add OF device ID table
  mac80211: Fix addition of mesh configuration element
  ext4: fix crash when a directory's i_size is too small
  ext4: fix fdatasync(2) after fallocate(2) operation
  dmaengine: dmatest: move callback wait queue to thread context
  eeprom: at24: change nvmem stride to 1
  sched/rt: Do not pull from current CPU if only one CPU to pull
  nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests
  xhci: Don't add a virt_dev to the devs array before it's fully allocated
  Bluetooth: btusb: driver to enable the usb-wakeup feature
  usb: xhci: fix TDS for MTK xHCI1.1
  ceph: drop negative child dentries before try pruning inode's alias
  usbip: fix stub_send_ret_submit() vulnerability to null transfer_buffer
  usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input
  usb: add helper to extract bits 12:11 of wMaxPacketSize
  usbip: fix stub_rx: get_pipe() to validate endpoint number
  USB: core: prevent malicious bNumInterfaces overflow
  USB: uas and storage: Add US_FL_BROKEN_FUA for another JMicron JMS567 ID
  tracing: Allocate mask_str buffer dynamically
  autofs: fix careless error in recent commit
  crypto: salsa20 - fix blkcipher_walk API usage
  crypto: hmac - require that the underlying hash algorithm is unkeyed
  crypto: rsa - fix buffer overread when stripping leading zeroes
  mfd: fsl-imx25: Clean up irq settings during removal
  Linux 4.9.70
  RDMA/cxgb4: Annotate r2 and stag as __be32
  md: free unused memory after bitmap resize
  audit: ensure that 'audit=1' actually enables audit for PID 1
  ipvlan: fix ipv6 outbound device
  kbuild: do not call cc-option before KBUILD_CFLAGS initialization
  powerpc/64: Fix checksum folding in csum_tcpudp_nofold and ip_fast_csum_nofold
  KVM: arm/arm64: vgic-its: Preserve the revious read from the pending table
  fix kcm_clone()
  usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping
  s390: always save and restore all registers on context switch
  ipmi: Stop timers before cleaning up the module
  Fix handling of verdicts after NF_QUEUE
  tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()
  s390/qeth: fix thinko in IPv4 multicast address tracking
  s390/qeth: fix GSO throughput regression
  s390/qeth: build max size GSO skbs on L2 devices
  tcp/dccp: block bh before arming time_wait timer
  stmmac: reset last TSO segment size after device open
  net: remove hlist_nulls_add_tail_rcu()
  usbnet: fix alignment for frames with no ethernet header
  net/packet: fix a race in packet_bind() and packet_notifier()
  packet: fix crash in fanout_demux_rollover()
  sit: update frag_off info
  rds: Fix NULL pointer dereference in __rds_rdma_map
  tipc: fix memory leak in tipc_accept_from_sock()
  s390/qeth: fix early exit from error path
  net: qmi_wwan: add Quectel BG96 2c7c:0296
  ANDROID: dma-buf/sw_sync: Rename active_list to link
  FROMLIST: android: binder: Fix null ptr dereference in debug msg
  FROMLIST: android: binder: Move buffer out of area shared with user space
  FROMLIST: android: binder: Add allocator selftest
  FROMLIST: android: binder: Refactor prev and next buffer into a helper function
  Linux 4.9.69
  afs: Connect up the CB.ProbeUuid
  IB/mlx5: Assign send CQ and recv CQ of UMR QP
  IB/mlx4: Increase maximal message size under UD QP
  xfrm: Copy policy family in clone_policy
  jump_label: Invoke jump_label_test() via early_initcall()
  atm: horizon: Fix irq release error
  clk: uniphier: fix DAPLL2 clock rate of Pro5
  bpf: fix lockdep splat
  sctp: use the right sk after waking up from wait_buf sleep
  sctp: do not free asoc when it is already dead in sctp_sendmsg
  zsmalloc: calling zs_map_object() from irq is a bug
  sparc64/mm: set fields in deferred pages
  block: wake up all tasks blocked in get_request()
  dt-bindings: usb: fix reg-property port-number range
  xfs: fix forgotten rcu read unlock when skipping inode reclaim
  sunrpc: Fix rpc_task_begin trace point
  NFS: Fix a typo in nfs_rename()
  dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0
  lib/genalloc.c: make the avail variable an atomic_long_t
  drivers/rapidio/devices/rio_mport_cdev.c: fix resource leak in error handling path in 'rio_dma_transfer()'
  route: update fnhe_expires for redirect when the fnhe exists
  route: also update fnhe_genid when updating a route cache
  gre6: use log_ecn_error module parameter in ip6_tnl_rcv()
  mac80211_hwsim: Fix memory leak in hwsim_new_radio_nl()
  x86/mpx/selftests: Fix up weird arrays
  coccinelle: fix parallel build with CHECK=scripts/coccicheck
  kbuild: pkg: use --transform option to prefix paths in tar
  EDAC, i5000, i5400: Fix definition of NRECMEMB register
  EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro
  powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested
  drm/amd/amdgpu: fix console deadlock if late init failed
  axonram: Fix gendisk handling
  netfilter: don't track fragmented packets
  zram: set physical queue limits to avoid array out of bounds accesses
  blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue()
  i2c: riic: fix restart condition
  crypto: s5p-sss - Fix completing crypto request in IRQ handler
  ipv6: reorder icmpv6_init() and ip6_mr_init()
  ibmvnic: Allocate number of rx/tx buffers agreed on by firmware
  ibmvnic: Fix overflowing firmware/hardware TX queue
  rds: tcp: Sequence teardown of listen and acceptor sockets to avoid races
  bnx2x: do not rollback VF MAC/VLAN filters we did not configure
  bnx2x: fix detection of VLAN filtering feature for VF
  bnx2x: fix possible overrun of VFPF multicast addresses array
  bnx2x: prevent crash when accessing PTP with interface down
  spi_ks8995: regs_size incorrect for some devices
  spi_ks8995: fix "BUG: key accdaa28 not in .data!"
  KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled
  arm64: KVM: Survive unknown traps from guests
  arm: KVM: Survive unknown traps from guests
  KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset
  irqchip/crossbar: Fix incorrect type of register size
  scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters
  scsi: qla2xxx: Fix ql_dump_buffer
  workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq
  libata: drop WARN from protocol error in ata_sff_qc_issue()
  kvm: nVMX: VMCLEAR should not cause the vCPU to shut down
  usb: gadget: udc: net2280: Fix tmp reusage in net2280 driver
  usb: gadget: pxa27x: Test for a valid argument pointer
  usb: dwc3: gadget: Fix system suspend/resume on TI platforms
  USB: gadgetfs: Fix a potential memory leak in 'dev_config()'
  usb: gadget: configs: plug memory leak
  HID: chicony: Add support for another ASUS Zen AiO keyboard
  gpio: altera: Use handle_level_irq when configured as a level_high
  ASoC: rcar: avoid SSI_MODEx settings for SSI8
  ARM: OMAP2+: Release device node after it is no longer needed.
  ARM: OMAP2+: Fix device node reference counts
  powerpc/64: Fix checksum folding in csum_add()
  module: set __jump_table alignment to 8
  lirc: fix dead lock between open and wakeup_filter
  powerpc: Fix compiling a BE kernel with a powerpc64le toolchain
  selftest/powerpc: Fix false failures for skipped tests
  powerpc/64: Invalidate process table caching after setting process table
  x86/hpet: Prevent might sleep splat on resume
  sched/fair: Make select_idle_cpu() more aggressive
  x86/platform/uv/BAU: Fix HUB errors by remove initial write to sw-ack register
  x86/selftests: Add clobbers for int80 on x86_64
  ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure
  vti6: Don't report path MTU below IPV6_MIN_MTU.
  ARM: 8657/1: uaccess: consistently check object sizes
  Revert "spi: SPI_FSL_DSPI should depend on HAS_DMA"
  Revert "drm/armada: Fix compile fail"
  mm: drop unused pmdp_huge_get_and_clear_notify()
  thp: fix MADV_DONTNEED vs. numa balancing race
  thp: reduce indentation level in change_huge_pmd()
  ARM: avoid faulting on qemu
  ARM: BUG if jumping to usermode address in kernel mode
  usb: f_fs: Force Reserved1=1 in OS_DESC_EXT_COMPAT
  crypto: talitos - fix ctr-aes-talitos
  crypto: talitos - fix use of sg_link_tbl_len
  crypto: talitos - fix AEAD for sha224 on non sha224 capable chips
  crypto: talitos - fix setkey to check key weakness
  crypto: talitos - fix memory corruption on SEC2
  crypto: talitos - fix AEAD test failures
  bus: arm-ccn: fix module unloading Error: Removing state 147 which has instances left.
  bus: arm-ccn: Fix use of smp_processor_id() in preemptible context
  bus: arm-ccn: Check memory allocation failure
  bus: arm-cci: Fix use of smp_processor_id() in preemptible context
  arm64: fpsimd: Prevent registers leaking from dead tasks
  KVM: arm/arm64: vgic-its: Check result of allocation before use
  KVM: arm/arm64: vgic-irqfd: Fix MSI entry allocation
  KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversion
  KVM: VMX: remove I/O port 0x80 bypass on Intel hosts
  arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one
  arm64: KVM: fix VTTBR_BADDR_MASK BUG_ON off-by-one
  media: dvb: i2c transfers over usb cannot be done from stack
  drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
  kdb: Fix handling of kallsyms_symbol_next() return value
  brcmfmac: change driver unbind order of the sdio function devices
  powerpc/64s: Initialize ISAv3 MMU registers before setting partition table
  KVM: s390: Fix skey emulation permission check
  s390: fix compat system call table
  smp/hotplug: Move step CPUHP_AP_SMPCFD_DYING to the correct place
  iommu/vt-d: Fix scatterlist offset handling
  ALSA: usb-audio: Add check return value for usb_string()
  ALSA: usb-audio: Fix out-of-bound error
  ALSA: seq: Remove spurious WARN_ON() at timer check
  ALSA: pcm: prevent UAF in snd_pcm_info
  btrfs: fix missing error return in btrfs_drop_snapshot
  KVM: x86: fix APIC page invalidation
  x86/PCI: Make broadcom_postcore_init() check acpi_disabled
  X.509: fix comparisons of ->pkey_algo
  X.509: reject invalid BIT STRING for subjectPublicKey
  KEYS: add missing permission check for request_key() destination
  ASN.1: check for error from ASN1_OP_END__ACT actions
  ASN.1: fix out-of-bounds read when parsing indefinite length item
  efi/esrt: Use memunmap() instead of kfree() to free the remapping
  efi: Move some sysfs files to be read-only by root
  scsi: libsas: align sata_device's rps_resp on a cacheline
  scsi: use dma_get_cache_alignment() as minimum DMA alignment
  scsi: dma-mapping: always provide dma_get_cache_alignment
  isa: Prevent NULL dereference in isa_bus driver callbacks
  hv: kvp: Avoid reading past allocated blocks from KVP file
  virtio: release virtio index when fail to device_register
  can: usb_8dev: cancel urb on -EPIPE and -EPROTO
  can: esd_usb2: cancel urb on -EPIPE and -EPROTO
  can: ems_usb: cancel urb on -EPIPE and -EPROTO
  can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
  can: kvaser_usb: ratelimit errors if incomplete messages are received
  can: kvaser_usb: Fix comparison bug in kvaser_usb_read_bulk_callback()
  can: kvaser_usb: free buf in error paths
  can: ti_hecc: Fix napi poll return value for repoll
  usb: gadget: udc: renesas_usb3: fix number of the pipes
  ANDROID: Revert "arm64: move ELF_ET_DYN_BASE to 4GB / 4MB"
  ANDROID: Revert "arm: move ELF_ET_DYN_BASE to 4MB"
  Linux 4.9.68
  xen-netfront: avoid crashing on resume after a failure in talk_to_netback()
  usb: host: fix incorrect updating of offset
  USB: usbfs: Filter flags passed in from user space
  USB: devio: Prevent integer overflow in proc_do_submiturb()
  USB: Increase usbfs transfer limit
  USB: core: Add type-specific length check of BOS descriptors
  usb: xhci: fix panic in xhci_free_virt_devices_depth_first
  usb: hub: Cycle HUB power when initialization fails
  dma-buf: Update kerneldoc for sync_file_create
  dma-buf/sync_file: hold reference to fence when creating sync_file
  dma-buf/sw_sync: force signal all unsignaled fences on dying timeline
  dma-fence: Introduce drm_fence_set_error() helper
  dma-fence: Wrap querying the fence->status
  dma-fence: Clear fence->status during dma_fence_init()
  dma-buf/sw_sync: clean up list before signaling the fence
  dma-buf/sw_sync: move timeline_fence_ops around
  dma-buf/sw-sync: Use an rbtree to sort fences in the timeline
  dma-buf/sw-sync: Fix locking around sync_timeline lists
  dma-buf/sw-sync: sync_pt is private and of fixed size
  dma-buf/sw-sync: Reduce irqsave/irqrestore from known context
  dma-buf/sw-sync: Prevent user overflow on timeline advance
  dma-buf/sw-sync: Fix the is-signaled test to handle u32 wraparound
  dma-buf/dma-fence: Extract __dma_fence_is_later()
  net: fec: fix multicast filtering hardware setup
  xen-netback: vif counters from int/long to u64
  cec: initiator should be the same as the destination for, poll
  xen-netfront: Improve error handling during initialization
  mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers
  vfio/spapr: Fix missing mutex unlock when creating a window
  be2net: fix initial MAC setting
  net: thunderx: avoid dereferencing xcv when NULL
  net: phy: micrel: KSZ8795 do not set SUPPORTED_[Asym_]Pause
  gtp: fix cross netns recv on gtp socket
  gtp: clear DF bit on GTP packet tx
  nvmet: cancel fatal error and flush async work before free controller
  i2c: i2c-cadence: Initialize configuration before probing devices
  tcp: correct memory barrier usage in tcp_check_space()
  dmaengine: pl330: fix double lock
  tipc: fix cleanup at module unload
  tipc: fix nametbl_lock soft lockup at module exit
  RDMA/qedr: Fix RDMA CM loopback
  RDMA/qedr: Return success when not changing QP state
  mac80211: don't try to sleep in rate_control_rate_init()
  drm/amdgpu: fix unload driver issue for virtual display
  x86/fpu: Set the xcomp_bv when we fake up a XSAVES area
  net: sctp: fix array overrun read on sctp_timer_tbl
  drm/exynos/decon5433: set STANDALONE_UPDATE_F on output enablement
  drm/amdgpu: fix bug set incorrect value to vce register
  qla2xxx: Fix wrong IOCB type assumption
  powerpc/mm: Fix memory hotplug BUG() on radix
  perf/x86/intel: Account interrupts for PEBS errors
  NFSv4: Fix client recovery when server reboots multiple times
  mac80211: prevent skb/txq mismatch
  KVM: arm/arm64: Fix occasional warning from the timer work function
  drm/exynos/decon5433: set STANDALONE_UPDATE_F also if planes are disabled
  drm/exynos/decon5433: update shadow registers iff there are active windows
  nfs: Don't take a reference on fl->fl_file for LOCK operation
  ravb: Remove Rx overflow log messages
  mac80211: calculate min channel width correctly
  mm: fix remote numa hits statistics
  net: qrtr: Mark 'buf' as little endian
  libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount
  net/appletalk: Fix kernel memory disclosure
  be2net: fix unicast list filling
  be2net: fix accesses to unicast list
  vti6: fix device register to report IFLA_INFO_KIND
  ARM: OMAP1: DMA: Correct the number of logical channels
  ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate
  net: systemport: Pad packet before inserting TSB
  net: systemport: Utilize skb_put_padto()
  libcxgb: fix error check for ip6_route_output()
  usb: gadget: f_fs: Fix ExtCompat descriptor validation
  dmaengine: stm32-dma: Fix null pointer dereference in stm32_dma_tx_status
  dmaengine: stm32-dma: Set correct args number for DMA request from DT
  l2tp: take remote address into account in l2tp_ip and l2tp_ip6 socket lookups
  net/mlx4_en: Fix type mismatch for 32-bit systems
  dax: Avoid page invalidation races and unnecessary radix tree traversals
  iio: adc: ti-ads1015: add 10% to conversion wait time
  tools include: Do not use poison with C++
  kprobes/x86: Disable preemption in ftrace-based jprobes
  perf test attr: Fix ignored test case result
  usbip: tools: Install all headers needed for libusbip development
  sysrq : fix Show Regs call trace on ARM
  EDAC, sb_edac: Fix missing break in switch
  x86/entry: Use SYSCALL_DEFINE() macros for sys_modify_ldt()
  serial: 8250: Preserve DLD[7:4] for PORT_XR17V35X
  usb: phy: tahvo: fix error handling in tahvo_usb_probe()
  mmc: sdhci-msm: fix issue with power irq
  spi: spi-axi: fix potential use-after-free after deregistration
  spi: sh-msiof: Fix DMA transfer size check
  staging: rtl8188eu: avoid a null dereference on pmlmepriv
  serial: 8250_fintek: Fix rs485 disablement on invalid ioctl()
  m68k: fix ColdFire node shift size calculation
  staging: greybus: loopback: Fix iteration count on async path
  selftests/x86/ldt_get: Add a few additional tests for limits
  s390/pci: do not require AIS facility
  ima: fix hash algorithm initialization
  USB: serial: option: add Quectel BG96 id
  s390/runtime instrumentation: simplify task exit handling
  serial: 8250_pci: Add Amazon PCI serial device ID
  usb: quirks: Add no-lpm quirk for KY-688 USB 3.1 Type-C Hub
  uas: Always apply US_FL_NO_ATA_1X quirk to Seagate devices
  mm, oom_reaper: gather each vma to prevent leaking TLB entry
  Revert "crypto: caam - get rid of tasklet"
  drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume()
  drm/fsl-dcu: avoid disabling pixel clock twice on suspend
  bcache: recover data from backing when data is clean
  bcache: only permit to recovery read error when cache device is clean
  Linux 4.9.67
  drm/i915: Prevent zero length "index" write
  drm/i915: Don't try indexed reads to alternate slave addresses
  NFS: revalidate "." etc correctly on "open".
  Revert "x86/entry/64: Add missing irqflags tracing to native_load_gs_index()"
  drm/amd/pp: fix typecast error in powerplay.
  drm/ttm: once more fix ttm_buffer_object_transfer
  drm/hisilicon: Ensure LDI regs are properly configured.
  drm/panel: simple: Add missing panel_simple_unprepare() calls
  drm/radeon: fix atombios on big endian
  drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories()
  drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs()
  Revert "drm/radeon: dont switch vt on suspend"
  nvme-pci: add quirk for delay before CHK RDY for WDC SN200
  hwmon: (jc42) optionally try to disable the SMBUS timeout
  bcache: Fix building error on MIPS
  i2c: i801: Fix Failed to allocate irq -2147483648 error
  eeprom: at24: check at24_read/write arguments
  eeprom: at24: correctly set the size for at24mac402
  eeprom: at24: fix reading from 24MAC402/24MAC602
  mmc: core: prepend 0x to OCR entry in sysfs
  mmc: core: Do not leave the block driver in a suspended state
  KVM: lapic: Fixup LDR on load in x2apic
  KVM: lapic: Split out x2apic ldr calculation
  KVM: x86: inject exceptions produced by x86_decode_insn
  KVM: x86: Exit to user-mode on #UD intercept when emulator requires
  KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk
  ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate
  mfd: twl4030-power: Fix pmic for boards that need vmmc1 on reboot
  nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
  nfsd: Fix another OPEN stateid race
  nfsd: Fix stateid races between OPEN and CLOSE
  btrfs: clear space cache inode generation always
  mm/madvise.c: fix madvise() infinite loop under special circumstances
  mm, hugetlbfs: introduce ->split() to vm_operations_struct
  mm/cma: fix alloc_contig_range ret code/potential leak
  mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
  ARM: dts: omap3: logicpd-torpedo-37xx-devkit: Fix MMC1 cd-gpio
  ARM: dts: LogicPD Torpedo: Fix camera pin mux
  Linux 4.9.66
  xen: xenbus driver must not accept invalid transaction ids
  nvmet: fix KATO offset in Set Features
  cec: update log_addr[] before finishing configuration
  cec: CEC_MSG_GIVE_FEATURES should abort for CEC version < 2
  cec: when canceling a message, don't overwrite old status info
  s390/kbuild: enable modversions for symbols exported from asm
  ASoC: wm_adsp: Don't overrun firmware file buffer when reading region data
  btrfs: return the actual error value from from btrfs_uuid_tree_iterate
  crypto: marvell - Copy IVDIG before launching partial DMA ahash requests
  ASoC: rsnd: don't double free kctrl
  netfilter: nf_tables: fix oob access
  netfilter: nft_queue: use raw_smp_processor_id()
  spi: SPI_FSL_DSPI should depend on HAS_DMA
  staging: iio: cdc: fix improper return value
  iio: light: fix improper return value
  adm80211: add checks for dma mapping errors
  mac80211: Suppress NEW_PEER_CANDIDATE event if no room
  mac80211: Remove invalid flag operations in mesh TSF synchronization
  drm/mediatek: don't use drm_put_dev
  clk: qcom: ipq4019: Add all the frequencies for apss cpu
  drm: Apply range restriction after color adjustment when allocation
  gpio: mockup: dynamically allocate memory for chip name
  ALSA: hda - Apply ALC269_FIXUP_NO_SHUTUP on HDA_FIXUP_ACT_PROBE
  ath10k: set CTS protection VDEV param only if VDEV is up
  bnxt_en: Set default completion ring for async events.
  pinctrl: sirf: atlas7: Add missing 'of_node_put()'
  ath10k: fix potential memory leak in ath10k_wmi_tlv_op_pull_fw_stats()
  ath10k: ignore configuring the incorrect board_id
  ath10k: fix incorrect txpower set by P2P_DEVICE interface
  mwifiex: sdio: fix use after free issue for save_adapter
  adm80211: return an error if adm8211_alloc_rings() fails
  rt2800: set minimum MPDU and PSDU lengths to sane values
  drm/armada: Fix compile fail
  net: 3com: typhoon: typhoon_init_one: fix incorrect return values
  net: 3com: typhoon: typhoon_init_one: make return values more specific
  net: Allow IP_MULTICAST_IF to set index to L3 slave
  fscrypt: use ENOTDIR when setting encryption policy on nondirectory
  fscrypt: use ENOKEY when file cannot be created w/o key
  dmaengine: zx: set DMA_CYCLIC cap_mask bit
  clk: sunxi-ng: fix PLL_CPUX adjusting on A33
  clk: sunxi-ng: A31: Fix spdif clock register
  drm/sun4i: Fix a return value in case of error
  PCI: Apply _HPX settings only to relevant devices
  RDS: RDMA: fix the ib_map_mr_sg_zbva() argument
  RDS: RDMA: return appropriate error on rdma map failures
  RDS: make message size limit compliant with spec
  e1000e: Avoid receiver overrun interrupt bursts
  e1000e: Separate signaling for link check/link up
  e1000e: Fix return value test
  e1000e: Fix error path in link detection
  Revert "drm/i915: Do not rely on wm preservation for ILK watermarks"
  PM / OPP: Add missing of_node_put(np)
  net/9p: Switch to wait_event_killable()
  fscrypt: lock mutex before checking for bounce page pool
  sched/rt: Simplify the IPI based RT balancing logic
  media: v4l2-ctrl: Fix flags field on Control events
  cx231xx-cards: fix NULL-deref on missing association descriptor
  media: rc: check for integer overflow
  media: Don't do DMA on stack for firmware upload in the AS102 driver
  powerpc/signal: Properly handle return value from uprobe_deny_signal()
  parisc: Fix validity check of pointer size argument in new CAS implementation
  ixgbe: Fix skb list corruption on Power systems
  fm10k: Use smp_rmb rather than read_barrier_depends
  i40evf: Use smp_rmb rather than read_barrier_depends
  ixgbevf: Use smp_rmb rather than read_barrier_depends
  igbvf: Use smp_rmb rather than read_barrier_depends
  igb: Use smp_rmb rather than read_barrier_depends
  i40e: Use smp_rmb rather than read_barrier_depends
  NFC: fix device-allocation error return
  IB/srp: Avoid that a cable pull can trigger a kernel crash
  IB/srpt: Do not accept invalid initiator port names
  libnvdimm, namespace: make 'resource' attribute only readable by root
  libnvdimm, namespace: fix label initialization to use valid seq numbers
  libnvdimm, pfn: make 'resource' attribute only readable by root
  clk: ti: dra7-atl-clock: fix child-node lookups
  SUNRPC: Fix tracepoint storage issues with svc_recv and svc_rqst_status
  KVM: SVM: obey guest PAT
  KVM: nVMX: set IDTR and GDTR limits when loading L1 host state
  lockd: double unregister of inetaddr notifiers
  irqchip/gic-v3: Fix ppi-partitions lookup
  block: Fix a race between blk_cleanup_queue() and timeout handling
  p54: don't unregister leds when they are not initialized
  mtd: nand: mtk: fix infinite ECC decode IRQ issue
  mtd: nand: Fix writing mtdoops to nand flash.
  mtd: nand: omap2: Fix subpage write
  target: Fix QUEUE_FULL + SCSI task attribute handling
  iscsi-target: Fix non-immediate TMR reference leak
  fs/9p: Compare qid.path in v9fs_test_inode
  fix a page leak in vhost_scsi_iov_to_sgl() error recovery
  ALSA: hda/realtek - Fix ALC700 family no sound issue
  ALSA: hda: Fix too short HDMI/DP chmap reporting
  ALSA: timer: Remove kernel warning at compat ioctl error paths
  ALSA: usb-audio: Add sanity checks in v2 clock parsers
  ALSA: usb-audio: Fix potential out-of-bound access at parsing SU
  ALSA: usb-audio: Add sanity checks to FE parser
  ALSA: pcm: update tstamp only if audio_tstamp changed
  ext4: fix interaction between i_size, fallocate, and delalloc after a crash
  ata: fixes kernel crash while tracing ata_eh_link_autopsy event
  rtlwifi: fix uninitialized rtlhal->last_suspend_sec time
  rtlwifi: rtl8192ee: Fix memory leak when loading firmware
  nfsd: deal with revoked delegations appropriately
  NFS: Avoid RCU usage in tracepoints
  nfs: Fix ugly referral attributes
  NFS: Fix typo in nomigration mount option
  isofs: fix timestamps beyond 2027
  bcache: check ca->alloc_thread initialized before wake up it
  libceph: don't WARN() if user tries to add invalid key
  eCryptfs: use after free in ecryptfs_release_messaging()
  nilfs2: fix race condition that causes file system corruption
  autofs: don't fail mount for transient error
  rt2x00usb: mark device removed when get ENOENT usb error
  MIPS: BCM47XX: Fix LED inversion for WRT54GSv1
  MIPS: Fix an n32 core file generation regset support regression
  MIPS: dts: remove bogus bcm96358nb4ser.dtb from dtb-y entry
  MIPS: Fix odd fp register warnings with MIPS64r2
  dm: fix race between dm_get_from_kobject() and __dm_destroy()
  MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver
  dm: allocate struct mapped_device with kvzalloc
  dm bufio: fix integer overflow when limiting maximum cache size
  ALSA: hda: Add Raven PCI ID
  PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF
  MIPS: ralink: Fix typo in mt7628 pinmux function
  MIPS: ralink: Fix MT7628 pinmux
  ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
  ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE
  arm64: Implement arch-specific pte_access_permitted()
  x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
  x86/decoder: Add new TEST instruction pattern
  lib/mpi: call cond_resched() from mpi_powm() loop
  sched: Make resched_cpu() unconditional
  vsock: use new wait API for vsock_stream_sendmsg()
  ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
  x86/mm: fix use-after-free of vma during userfaultfd fault
  ACPI / EC: Fix regression related to triggering source of EC event handling
  s390/disassembler: increase show_code buffer size
  s390/disassembler: add missing end marker for e7 table
  s390/runtime instrumention: fix possible memory corruption
  s390: fix transactional execution control register handling

Conflicts:
	drivers/android/binder_alloc.c
	drivers/android/binder_alloc.h
	drivers/android/binder_alloc_selftest.c
	drivers/mmc/core/bus.c
	drivers/mmc/host/sdhci-msm.c
	drivers/thermal/step_wise.c
	kernel/cpu.c
	mm/oom_kill.c
	sound/usb/mixer.c

Change-Id: Id01eb66cafc5970b460321e44ec8ffcfa76971a6
Signed-off-by: Kyle Yan <kyan@codeaurora.org>
2018-01-02 10:37:28 -08:00
Greg Kroah-Hartman
f87ffb2f71 Merge 4.9.67 into android-4.9-o
Changes in 4.9.67
	ARM: dts: LogicPD Torpedo: Fix camera pin mux
	ARM: dts: omap3: logicpd-torpedo-37xx-devkit: Fix MMC1 cd-gpio
	mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
	mm/cma: fix alloc_contig_range ret code/potential leak
	mm, hugetlbfs: introduce ->split() to vm_operations_struct
	mm/madvise.c: fix madvise() infinite loop under special circumstances
	btrfs: clear space cache inode generation always
	nfsd: Fix stateid races between OPEN and CLOSE
	nfsd: Fix another OPEN stateid race
	nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
	mfd: twl4030-power: Fix pmic for boards that need vmmc1 on reboot
	ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate
	KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk
	KVM: x86: Exit to user-mode on #UD intercept when emulator requires
	KVM: x86: inject exceptions produced by x86_decode_insn
	KVM: lapic: Split out x2apic ldr calculation
	KVM: lapic: Fixup LDR on load in x2apic
	mmc: core: Do not leave the block driver in a suspended state
	mmc: core: prepend 0x to OCR entry in sysfs
	eeprom: at24: fix reading from 24MAC402/24MAC602
	eeprom: at24: correctly set the size for at24mac402
	eeprom: at24: check at24_read/write arguments
	i2c: i801: Fix Failed to allocate irq -2147483648 error
	bcache: Fix building error on MIPS
	hwmon: (jc42) optionally try to disable the SMBUS timeout
	nvme-pci: add quirk for delay before CHK RDY for WDC SN200
	Revert "drm/radeon: dont switch vt on suspend"
	drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs()
	drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories()
	drm/radeon: fix atombios on big endian
	drm/panel: simple: Add missing panel_simple_unprepare() calls
	drm/hisilicon: Ensure LDI regs are properly configured.
	drm/ttm: once more fix ttm_buffer_object_transfer
	drm/amd/pp: fix typecast error in powerplay.
	Revert "x86/entry/64: Add missing irqflags tracing to native_load_gs_index()"
	NFS: revalidate "." etc correctly on "open".
	drm/i915: Don't try indexed reads to alternate slave addresses
	drm/i915: Prevent zero length "index" write
	Linux 4.9.67

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-05 11:53:12 +01:00
Greg Kroah-Hartman
12cae95a09 Merge 4.9.67 into android-4.9
Changes in 4.9.67
	ARM: dts: LogicPD Torpedo: Fix camera pin mux
	ARM: dts: omap3: logicpd-torpedo-37xx-devkit: Fix MMC1 cd-gpio
	mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
	mm/cma: fix alloc_contig_range ret code/potential leak
	mm, hugetlbfs: introduce ->split() to vm_operations_struct
	mm/madvise.c: fix madvise() infinite loop under special circumstances
	btrfs: clear space cache inode generation always
	nfsd: Fix stateid races between OPEN and CLOSE
	nfsd: Fix another OPEN stateid race
	nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
	mfd: twl4030-power: Fix pmic for boards that need vmmc1 on reboot
	ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate
	KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk
	KVM: x86: Exit to user-mode on #UD intercept when emulator requires
	KVM: x86: inject exceptions produced by x86_decode_insn
	KVM: lapic: Split out x2apic ldr calculation
	KVM: lapic: Fixup LDR on load in x2apic
	mmc: core: Do not leave the block driver in a suspended state
	mmc: core: prepend 0x to OCR entry in sysfs
	eeprom: at24: fix reading from 24MAC402/24MAC602
	eeprom: at24: correctly set the size for at24mac402
	eeprom: at24: check at24_read/write arguments
	i2c: i801: Fix Failed to allocate irq -2147483648 error
	bcache: Fix building error on MIPS
	hwmon: (jc42) optionally try to disable the SMBUS timeout
	nvme-pci: add quirk for delay before CHK RDY for WDC SN200
	Revert "drm/radeon: dont switch vt on suspend"
	drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs()
	drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories()
	drm/radeon: fix atombios on big endian
	drm/panel: simple: Add missing panel_simple_unprepare() calls
	drm/hisilicon: Ensure LDI regs are properly configured.
	drm/ttm: once more fix ttm_buffer_object_transfer
	drm/amd/pp: fix typecast error in powerplay.
	Revert "x86/entry/64: Add missing irqflags tracing to native_load_gs_index()"
	NFS: revalidate "." etc correctly on "open".
	drm/i915: Don't try indexed reads to alternate slave addresses
	drm/i915: Prevent zero length "index" write
	Linux 4.9.67

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-05 11:34:59 +01:00
Dan Williams
cebe139e57 mm, hugetlbfs: introduce ->split() to vm_operations_struct
commit 31383c6865a578834dd953d9dbc88e6b19fe3997 upstream.

Patch series "device-dax: fix unaligned munmap handling"

When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges.  It
would be messy to teach the munmap path about device-dax alignment
constraints in the same (hstate) way that hugetlbfs communicates this
constraint.  Instead, these patches introduce a new ->split() vm
operation.

This patch (of 2):

The device-dax interface has similar constraints as hugetlbfs in that it
requires the munmap path to unmap in huge page aligned units.  Rather
than add more custom vma handling code in __split_vma() introduce a new
vm operation to perform this vma specific check.

Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: dee4107924 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 11:24:32 +01:00
Kirill A. Shutemov
f51ed739c2 mm: drop zap_details::check_swap_entries
detail == NULL would give the same functionality as
.check_swap_entries==true.

Change-Id: Iddb5499d069ac76126812c0296389409215cda9a
Link: http://lkml.kernel.org/r/20170118122429.43661-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-Commit: 3e8715fdc03e8df4d26d8e436166e44e3e416d3b
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
2017-11-21 03:22:42 -08:00
Laura Abbott
845fbd8e99 UPSTREAM: mm: Introduce lm_alias
(cherry-pick from commit 568c5fe5a54f2654f5a4c599c45b8a62ed9a2013)

Certain architectures may have the kernel image mapped separately to
alias the linear map. Introduce a macro lm_alias to translate a kernel
image symbol into its linear alias. This is used in part with work to
add CONFIG_DEBUG_VIRTUAL support for arm64.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Bug: 20045882
Change-Id: I93ba4ef8df77f7e52717ab6e208d06122ac3a72b
2017-11-20 15:13:55 -08:00
Kirill A. Shutemov
3d73e9927c mm: drop zap_details::ignore_dirty
The only user of ignore_dirty is oom-reaper.  But it doesn't really use
it.

ignore_dirty only has effect on file pages mapped with dirty pte.  But
oom-repear skips shared VMAs, so there's no way we can dirty file pte in
them.

Change-Id: Ife5733540fd14e5276a275d91fb46ee490b6aee0
Link: http://lkml.kernel.org/r/20170118122429.43661-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-Commit: da162e9368990ed747075e2ab427da0759fc4a59
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
2017-11-10 12:38:20 +05:30
Laura Abbott
1492b65a7b mm: Introduce lm_alias
Certain architectures may have the kernel image mapped separately to
alias the linear map. Introduce a macro lm_alias to translate a kernel
image symbol into its linear alias. This is used in part with work to
add CONFIG_DEBUG_VIRTUAL support for arm64.

Change-Id: I39059f57f725d4e352343170b8d84d2e668d3f62
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Git-commit: 568c5fe5a54f2654f5a4c599c45b8a62ed9a2013
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2017-08-07 17:31:12 -07:00
Linux Build Service Account
68406b9fd3 Merge "lowmemorykiller: Introduce sysfs node for ALMK and PPR adj threshold" 2017-07-21 12:25:22 -07:00
Linux Build Service Account
681ec1b153 Merge "mm: Update is_vmalloc_addr to account for vmalloc savings" 2017-07-20 13:39:49 -07:00
Vinayak Menon
9282168738 mm: enable page poisoning early at boot
On SPARSEMEM systems page poisoning is enabled after buddy is up, because
of the dependency on page extension init.  This causes the pages released
by free_all_bootmem not to be poisoned.  This either delays or misses the
identification of some issues because the pages have to undergo another
cycle of alloc-free-alloc for any corruption to be detected.

Enable page poisoning early by getting rid of the PAGE_EXT_DEBUG_POISON
flag.  Since all the free pages will now be poisoned, the flag need not be
verified before checking the poison during an alloc.

Link: http://lkml.kernel.org/r/1490358246-11001-1-git-send-email-vinmenon@codeaurora.org
Acked-by: Laura Abbott <labbott@redhat.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[vinmenon@codeaurora.org: resolve trivial merge conflicts.
 Remove the redundant free pages RO feature from the
 page_poison.c file which is the reason for conflicts +
 squash the addendum commit 40961ef8d65f51093bc94de110b97b590b6b9275
 ('mm-enable-page-poisoning-early-at-boot-v2')]
Git-commit: c5b7cd344fd6341e6db79e55c0f1f4d1d9c67a7e
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Change-Id: I1bb1f99d3a2e1135131911905e0916c837ba9d8a
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2017-07-20 11:17:26 +05:30
Vinayak Menon
1bddbcfede mm: process reclaim: vmpressure based process reclaim
With this patch, anon pages of inactive tasks can be reclaimed,
depending on memory pressure. Memory pressure is detected
using vmpressure events. 'N' best tasks in terms of anon
size is selected and pages proportional to their tasksize
is reclaimed. The total number of pages reclaimed at each
run of the swap work, can be tuned from userspace, the
default being SWAP_CLUSTER_MAX * 32.

The patch also adds tracepoints to debug and tune the
feature.

echo 1 > /sys/module/process_reclaim/parameters/enable_process_reclaim
to enable the feature.

echo <pages> > /sys/module/process_reclaim/parameters/per_swap_size,
to set the number of pages reclaimed in each scan.

/sys/module/process_reclaim/parameters/reclaim_avg_efficiency, provides
the average efficiency (scan to reclaim ratio) of the algorithm.

/sys/module/process_reclaim/parameters/swap_eff_win, to set the window
period (in unit of number of times reclaim is triggered) to detect
low efficiency runs.

/sys/module/process_reclaim/parameters/swap_opt_eff, to set the optimal
efficiency threshold for low efficiency detection.

Change-Id: I895986f10c997d1715761eaaadc4bbbee60db9d2
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2017-07-19 18:41:24 +05:30
Kyle Yan
2506cb5328 Merge remote-tracking branch '4.9/tmp-75d78c7' into 4.9
* 4.9/tmp-75d78c7:
  Linux 4.9.35
  brcmfmac: fix uninitialized warning in brcmf_usb_probe_phase2()
  jump label: fix passing kbuild_cflags when checking for asm goto support
  net: phy: fix marvell phy status reading
  spi: double time out tolerance
  dmaengine: bcm2835: Fix cyclic DMA period splitting
  net: phy: Initialize mdio clock at probe function
  rt2x00: avoid introducing a USB dependency in the rt2x00lib module
  usb: gadget: f_fs: avoid out of bounds access on comp_desc
  mtd: spi-nor: fix spansion quad enable
  of: Add check to of_scan_flat_dt() before accessing initial_boot_params
  rxrpc: Fix several cases where a padded len isn't checked in ticket decode
  drm/amdgpu: adjust default display clock
  drm/amdgpu/atom: fix ps allocation size for EnableDispPowerGating
  drm/radeon: add a quirk for Toshiba Satellite L20-183
  drm/radeon: add a PX quirk for another K53TK variant
  iscsi-target: Reject immediate data underflow larger than SCSI transfer length
  iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP
  target: Fix kref->refcount underflow in transport_cmd_finish_abort
  arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
  time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
  time: Fix clock->read(clock) race around clocksource changes
  brcmfmac: unbind all devices upon failure in firmware callback
  brcmfmac: use firmware callback upon failure to load
  brcmfmac: add parameter to pass error code in firmware callback
  Input: i8042 - add Fujitsu Lifebook AH544 to notimeout list
  powerpc/64s: Handle data breakpoints in Radix mode
  powerpc/kprobes: Pause function_graph tracing during jprobes handling
  signal: Only reschedule timers on signals timers have sent
  HID: Add quirk for Dell PIXART OEM mouse
  cxgb4: notify uP to route ctrlq compl to rdma rspq
  CIFS: Improve readdir verbosity
  KVM: PPC: Book3S HV: Context-switch EBB registers properly
  KVM: PPC: Book3S HV: Preserve userspace HTM state properly
  KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows
  perf/x86/intel: Add 1G DTLB load/store miss support for SKL
  lib/cmdline.c: fix get_options() overflow while parsing ranges
  autofs: sanity check status reported with AUTOFS_DEV_IOCTL_FAIL
  powerpc/perf: Fix oops when kthread execs user process
  fs/exec.c: account for argv/envp pointers
  ALSA: pcm: Don't treat NULL chmap as a fatal error
  ALSA: firewire-lib: Fix stall of process context at packet error
  xen-blkback: don't leak stack data via response ring
  xen/blkback: fix disconnect while I/Os in flight
  clk: sunxi-ng: a31: Correct lcd1-ch1 clock register offset
  Revert "ANDROID: hardlockup: detect hard lockups without NMIs using secondary cpus"
  Revert "ANDROID: kernel/watchdog: fix unused variable warning"
  UPSTREAM: usb: gadget: f_fs: avoid out of bounds access on comp_desc
  Linux 4.9.34
  mm: fix new crash in unmapped_area_topdown()
  Allow stack to grow up to address space limit
  mm: larger stack guard gap, between vmas
  alarmtimer: Rate limit periodic intervals
  crypto: Work around deallocated stack frame reference gcc bug on sparc.
  vTPM: Fix missing NULL check
  MIPS: .its targets depend on vmlinux
  MIPS: Fix bnezc/jialc return address calculation
  usb: dwc3: exynos fix axius clock error path to do cleanup
  usb: gadget: composite: Fix function used to free memory
  alarmtimer: Prevent overflow of relative timers
  genirq: Release resources in __setup_irq() error path
  sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()
  iio: imu: inv_mpu6050: add accel lpf setting for chip >= MPU6500
  swap: cond_resched in swap_cgroup_prepare()
  mm/memory-failure.c: use compound_head() flags for huge pages
  USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks
  USB: gadget: fix GPF in gadgetfs
  usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
  usb: xhci: Fix USB 3.1 supported protocol parsing
  drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR()
  misc: mic: double free on ioctl error path
  ath10k: fix napi crash during rmmod when probe firmware fails
  usb: r8a66597-hcd: decrease timeout
  usb: r8a66597-hcd: select a different endpoint on timeout
  USB: gadget: dummy_hcd: fix hub-descriptor removable fields
  pvrusb2: reduce stack usage pvr2_eeprom_analyze()
  USB: usbip: fix nonconforming hub descriptor
  usb: core: fix potential memory leak in error path during hcd creation
  USB: hub: fix SS max number of ports
  usb: gadget: udc: renesas_usb3: lock for PN_ registers access
  usb: gadget: udc: renesas_usb3: fix deadlock by spinlock
  usb: gadget: udc: renesas_usb3: fix pm_runtime functions calling
  IB/mlx5: Fix kernel to user leak prevention logic
  iio: adc: ti_am335x_adc: allocating too much in probe
  iio: proximity: as3935: recalibrate RCO after resume
  iio: st_pressure: Fix data sign
  staging: iio: tsl2x7x_core: Fix standard deviation calculation
  staging: rtl8188eu: prevent an underflow in rtw_check_beacon_data()
  mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode
  x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
  serial: sh-sci: Fix late enablement of AUTORTS
  serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
  drm/vc4: Fix OOPSes from trying to cache a partially constructed BO.
  drm/mediatek: fix mtk_hdmi_setup_vendor_specific_infoframe mistake
  mac80211: don't send SMPS action frame in AP mode when not needed
  mac80211: fix dropped counter in multiqueue RX
  mac80211: strictly check mesh address extension mode
  mac80211: fix IBSS presp allocation size
  mac80211: fix packet statistics for fast-RX
  mac80211: fix CSA in IBSS mode
  usb: musb: dsps: keep VBUS on for host-only mode
  drm/i915: Fix GVT-g PVINFO version compatibility check
  drm/amdgpu: Fix overflow of watermark calcs at > 4k resolutions.
  mac80211/wpa: use constant time memory comparison for MACs
  mac80211: don't look at the PM bit of BAR frames
  vb2: Fix an off by one error in 'vb2_plane_vaddr'
  cpufreq: conservative: Allow down_threshold to take values from 1 to 10
  ila_xlat: add missing hash secret initialization
  can: gs_usb: fix memory leak in gs_cmd_reset()
  configfs: Fix race between create_link and configfs_rmdir
  fs: pass on flags in compat_writev
  ANDROID: sdcardfs: remove dead function open_flags_to_access_mode()
  ANDROID: android-base.cfg: split out arm64-specific configs
  Linux 4.9.33
  sparc64: make string buffers large enough
  drm/i915: Always recompute watermarks when distrust_bios_wm is set, v2.
  drm/i915: Workaround VLV/CHV DSI scanline counter hardware fail
  s390/kvm: do not rely on the ILC on kvm host protection fauls
  xtensa: don't use linux IRQ #0
  RDMA/qedr: Return max inline data in QP query result
  RDMA/qedr: Don't spam dmesg if QP is in error state
  RDMA/qedr: Don't reset QP when queues aren't flushed
  RDMA/qedr: Fix and simplify memory leak in PD alloc
  RDMA/qedr: Dispatch port active event from qedr_add
  netfilter: nft_log: restrict the log prefix length to 127
  netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL
  tipc: fix nametbl_lock soft lockup at node/link events
  tipc: add subscription refcount to avoid invalid delete
  tipc: fix connection refcount error
  tipc: ignore requests when the connection state is not CONNECTED
  ARCv2: smp-boot: wake_flag polling by non-Masters needs to be uncached
  ARC: smp-boot: Decouple Non masters waiting API from jump to entry point
  vhost/vsock: handle vhost_vq_init_access() error
  kernel/watchdog: prevent false hardlockup on overloaded system
  kernel/watchdog.c: move shared definitions to nmi.h
  kernel/watchdog.c: move hardlockup detector to separate file
  userfaultfd: fix SIGBUS resulting from false rwsem wakeups
  proc: add a schedule point in proc_pid_readdir()
  frv: add missing atomic64 operations
  frv: add atomic64_add_unless()
  romfs: use different way to generate fsid for BLOCK or MTD
  mn10300: fix build error of missing fpu_save()
  usb: musb: Fix external abort on non-linefetch for musb_irq_work()
  sctp: sctp_addr_id2transport should verify the addr before looking up assoc
  sctp: sctp gso should set feature with NETIF_F_SG when calling skb_segment
  bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status().
  bnxt_en: Fix RTNL lock usage on bnxt_update_link().
  bnxt_en: Enhance autoneg support.
  bnxt_en: Fix bnxt_reset() in the slow path task.
  net-next: ethernet: mediatek: change the compatible string
  r8152: avoid start_xmit to schedule napi when napi is disabled
  r8152: fix rtl8152_post_reset function
  r8152: re-schedule napi for tx
  r8152: check rx after napi is enabled
  r8152: avoid start_xmit to call napi_schedule during autosuspend
  nvmet-rdma: Fix missing dma sync to nvme data structures
  nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
  ravb: unmap descriptors when freeing rings
  drm/ast: Fixed system hanged if disable P2A
  drm/nouveau: Fix drm poll_helper handling
  drm/nouveau: Don't enabling polling twice on runtime resume
  drm/nouveau: Handle fbcon suspend/resume in seperate worker
  drm/nouveau: Rename acpi_work to hpd_work
  drm/nouveau: Intercept ACPI_VIDEO_NOTIFY_PROBE
  gtp: add genl family modules alias
  net: phy: micrel: add support for KSZ8795
  parisc, parport_gsc: Fixes for printk continuation lines
  net/mlx5: Return EOPNOTSUPP when failing to get steering name-space
  net/mlx5: E-Switch, Err when retrieving steering name-space fails
  drm/i915: Check for NULL i915_vma in intel_unpin_fb_obj()
  net: adaptec: starfire: add checks for dma mapping errors
  pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES
  drm: Don't race connector registration
  drm: prevent double-(un)registration for connectors
  cec: fix wrong last_la determination
  pinctrl: baytrail: Rectify debounce support (part 2)
  gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page
  net/mlx4_core: Avoid command timeouts during VF driver device shutdown
  drm/nouveau/fence/g84-: protect against concurrent access to semaphore buffers
  drm/nouveau: prevent userspace from deleting client object
  ipv6: fix flow labels when the traffic class is non-0
  FS-Cache: Initialise stores_lock in netfs cookie
  fscache: Clear outstanding writes when disabling a cookie
  fscache: Fix dead object requeue
  net: fix ndo_features_check/ndo_fix_features comment ordering
  net: phy: Fix PHY module checks and NULL deref in phy_attach_direct()
  net: phy: Fix lack of reference count on PHY driver
  ethtool: do not vzalloc(0) on registers dump
  log2: make order_base_2() behave correctly on const input value zero
  kasan: respect /proc/sys/kernel/traceoff_on_warning
  shmem: fix sleeping from atomic context
  jump label: pass kbuild_cflags when checking for asm goto support
  PM / runtime: Avoid false-positive warnings from might_sleep_if()
  ARM: defconfigs: make NF_CT_PROTO_SCTP and NF_CT_PROTO_UDPLITE built-in
  ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
  vfio/spapr_tce: Set window when adding additional groups to container
  ipv6: addrconf: fix generation of new temporary addresses
  net: thunderx: Fix PHY autoneg for SGMII QLM mode
  kernel/ucount.c: mark user_header with kmemleak_ignore()
  powerpc/powernv: Properly set "host-ipi" on IPIs
  i2c: piix4: Fix request_region size
  i2c: piix4: Request the SMBUS semaphore inside the mutex
  sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications
  sierra_net: Skip validating irrelevant fields for IDLE LSIs
  net: hns: Fix the device being used for dma mapping during TX
  NET: mkiss: Fix panic
  ibmvnic: Initialize completion variables before starting work
  ibmvnic: Call napi_disable instead of napi_enable in failure path
  NET: Fix /proc/net/arp for AX.25
  gfs2: Use rhashtable walk interface in glock_hash_walk
  tipc: Fix tipc_sk_reinit race conditions
  ipv6: Inhibit IPv4-mapped src address on the wire.
  ipv6: Handle IPv4-mapped src to in6addr_any dst.
  tcp: tcp_probe: use spin_lock_bh()
  net: xilinx_emaclite: fix receive buffer overflow
  net: xilinx_emaclite: fix freezes due to unordered I/O
  ibmvnic: Fix endian error when requesting device capabilities
  ibmvnic: Fix endian errors in error reporting output
  netfilter: nf_conntrack_sip: fix wrong memory initialisation
  partitions/msdos: FreeBSD UFS2 file systems are not recognized
  drm/i915: Prevent the system suspend complete optimization
  PCI/PM: Add needs_resume flag to avoid suspend complete optimization
  usb: gadget: f_fs: Fix possibe deadlock
  FROMLIST: Remove the redundant skb->dev initialization in ip6_fragment
  FROMLIST: bpf: Remove duplicate tcp_filter hook in ipv6
  FROMLIST: ipv6: Initial skb->dev and skb->protocol in ip6_output
  FROMLIST: bpf: cgroup skb progs cannot access ld_abs/ind
  ANDROID: uid_sys_stats: check previous uid_entry before call find_or_register_uid

Conflicts:
	drivers/usb/gadget/function/f_fs.c
	kernel/watchdog.c

Change-Id: Ib91e6121cace8bcdc4dd8b1727792dd5bb6f1535
Signed-off-by: Kyle Yan <kyan@codeaurora.org>
2017-07-05 13:17:33 -07:00
Susheel Khiani
8db21e1d69 mm: Update is_vmalloc_addr to account for vmalloc savings
is_vmalloc_addr currently assumes that all vmalloc addresses
exist between VMALLOC_START and VMALLOC_END. This may not be
the case when interleaving vmalloc and lowmem. Update the
is_vmalloc_addr to properly check for this.

Correspondingly we need to ensure that VMALLOC_TOTAL accounts
for all the vmalloc regions when CONFIG_ENABLE_VMALLOC_SAVING
is enabled.

Change-Id: I5def3d6ae1a4de59ea36f095b8c73649a37b1f36
Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
2017-07-04 16:51:51 +05:30
Greg Kroah-Hartman
7172a93a70 Merge 4.9.34 into android-4.9
Changes in 4.9.34
	fs: pass on flags in compat_writev
	configfs: Fix race between create_link and configfs_rmdir
	can: gs_usb: fix memory leak in gs_cmd_reset()
	ila_xlat: add missing hash secret initialization
	cpufreq: conservative: Allow down_threshold to take values from 1 to 10
	vb2: Fix an off by one error in 'vb2_plane_vaddr'
	mac80211: don't look at the PM bit of BAR frames
	mac80211/wpa: use constant time memory comparison for MACs
	drm/amdgpu: Fix overflow of watermark calcs at > 4k resolutions.
	drm/i915: Fix GVT-g PVINFO version compatibility check
	usb: musb: dsps: keep VBUS on for host-only mode
	mac80211: fix CSA in IBSS mode
	mac80211: fix packet statistics for fast-RX
	mac80211: fix IBSS presp allocation size
	mac80211: strictly check mesh address extension mode
	mac80211: fix dropped counter in multiqueue RX
	mac80211: don't send SMPS action frame in AP mode when not needed
	drm/mediatek: fix mtk_hdmi_setup_vendor_specific_infoframe mistake
	drm/vc4: Fix OOPSes from trying to cache a partially constructed BO.
	serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
	serial: sh-sci: Fix late enablement of AUTORTS
	x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
	mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode
	staging: rtl8188eu: prevent an underflow in rtw_check_beacon_data()
	staging: iio: tsl2x7x_core: Fix standard deviation calculation
	iio: st_pressure: Fix data sign
	iio: proximity: as3935: recalibrate RCO after resume
	iio: adc: ti_am335x_adc: allocating too much in probe
	IB/mlx5: Fix kernel to user leak prevention logic
	usb: gadget: udc: renesas_usb3: fix pm_runtime functions calling
	usb: gadget: udc: renesas_usb3: fix deadlock by spinlock
	usb: gadget: udc: renesas_usb3: lock for PN_ registers access
	USB: hub: fix SS max number of ports
	usb: core: fix potential memory leak in error path during hcd creation
	USB: usbip: fix nonconforming hub descriptor
	pvrusb2: reduce stack usage pvr2_eeprom_analyze()
	USB: gadget: dummy_hcd: fix hub-descriptor removable fields
	usb: r8a66597-hcd: select a different endpoint on timeout
	usb: r8a66597-hcd: decrease timeout
	ath10k: fix napi crash during rmmod when probe firmware fails
	misc: mic: double free on ioctl error path
	drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR()
	usb: xhci: Fix USB 3.1 supported protocol parsing
	usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
	USB: gadget: fix GPF in gadgetfs
	USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks
	mm/memory-failure.c: use compound_head() flags for huge pages
	swap: cond_resched in swap_cgroup_prepare()
	iio: imu: inv_mpu6050: add accel lpf setting for chip >= MPU6500
	sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()
	genirq: Release resources in __setup_irq() error path
	alarmtimer: Prevent overflow of relative timers
	usb: gadget: composite: Fix function used to free memory
	usb: dwc3: exynos fix axius clock error path to do cleanup
	MIPS: Fix bnezc/jialc return address calculation
	MIPS: .its targets depend on vmlinux
	vTPM: Fix missing NULL check
	crypto: Work around deallocated stack frame reference gcc bug on sparc.
	alarmtimer: Rate limit periodic intervals
	mm: larger stack guard gap, between vmas
	Allow stack to grow up to address space limit
	mm: fix new crash in unmapped_area_topdown()
	Linux 4.9.34

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-06-27 10:43:56 +02:00
Hugh Dickins
cfc0eb4038 mm: larger stack guard gap, between vmas
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream.

Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications.  For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.

Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: backport to 4.11: adjust context]
[wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-24 07:11:18 +02:00
Lee Susman
001f314ba9 mm: change max readahead size to 512KB
Change the VM_MAX_READAHEAD value from the default 128KB
to 512KB. This will allow the readahead window to grow to a maximum size
of 512KB, which greatly benefits to sequential read throughput.

Change-Id: Ia0780ea4e2a4ae0b6111485b72fb25376dcb1f96
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
2017-03-21 18:42:33 -07:00
Dmitry Shmidt
cd08287396 Merge tag 'v4.9.6' into android-4.9
This is the 4.9.6 stable release

Change-Id: I318df4b9d706d50c13fe3969d734117c25fc94bc
2017-01-31 13:55:27 -08:00
Colin Cross
3e4578f42f ANDROID: mm: add a field to store names for private anonymous memory
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory.  When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.

This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma.  vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.

Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.

The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas.  If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".

The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen.  This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging.  The pointer
is stored in a union with fieds that are only used on file-backed
mappings, so it does not increase memory usage.

Includes fix from Jed Davis <jld@mozilla.com> for typo in
prctl_set_vma_anon_name, which could attempt to set the name
across two vmas at the same time due to a typo, which might
corrupt the vma list.  Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.

Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2017-01-27 13:52:21 -08:00
Eric W. Biederman
e71b4e061c ptrace: Don't allow accessing an undumpable mm
commit 84d77d3f06e7e8dea057d10e8ec77ad71f721be3 upstream.

It is the reasonable expectation that if an executable file is not
readable there will be no way for a user without special privileges to
read the file.  This is enforced in ptrace_attach but if ptrace
is already attached before exec there is no enforcement for read-only
executables.

As the only way to read such an mm is through access_process_vm
spin a variant called ptrace_access_vm that will fail if the
target process is not being ptraced by the current process, or
the current process did not have sufficient privileges when ptracing
began to read the target processes mm.

In the ptrace implementations replace access_process_vm by
ptrace_access_vm.  There remain several ptrace sites that still use
access_process_vm as they are reading the target executables
instructions (for kernel consumption) or register stacks.  As such it
does not appear necessary to add a permission check to those calls.

This bug has always existed in Linux.

Fixes: v1.0
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-06 10:40:13 +01:00
John Stultz
6f19367634 ANDROID: ashmem: Add shmem_set_file to mm/shmem.c
NOT FOR STAGING
This patch re-adds the original shmem_set_file to mm/shmem.c
and converts ashmem.c back to using it.

Change-Id: Ie604c9f8f4d0ee6bc2aae1a96d261c8373a1a2dc
CC: Brian Swetland <swetland@google.com>
CC: Colin Cross <ccross@android.com>
CC: Arve Hjønnevåg <arve@android.com>
CC: Dima Zavin <dima@android.com>
CC: Robert Love <rlove@google.com>
CC: Greg KH <greg@kroah.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-12-14 14:34:12 -08:00
Lorenzo Stoakes
0d73175982 mm: unexport __get_user_pages()
This patch unexports the low-level __get_user_pages() function.

Recent refactoring of the get_user_pages* functions allow flags to be
passed through get_user_pages() which eliminates the need for access to
this function from its one user, kvm.

We can see that the two calls to get_user_pages() which replace
__get_user_pages() in kvm_main.c are equivalent by examining their call
stacks:

  get_user_page_nowait():
    get_user_pages(start, 1, flags, page, NULL)
    __get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL,
			    false, flags | FOLL_TOUCH)
    __get_user_pages(current, current->mm, start, 1,
		     flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)

  check_user_page_hwpoison():
    get_user_pages(addr, 1, flags, NULL, NULL)
    __get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL,
			    false, flags | FOLL_TOUCH)
    __get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL,
		     NULL, NULL)

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-24 19:13:20 -07:00
Linus Torvalds
86c5bf7101 Merge branch 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull vmap stack fixes from Ingo Molnar:
 "This is fallout from CONFIG_HAVE_ARCH_VMAP_STACK=y on x86: stack
  accesses that used to be just somewhat questionable are now totally
  buggy.

  These changes try to do it without breaking the ABI: the fields are
  left there, they are just reporting zero, or reporting narrower
  information (the maps file change)"

* 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mm: Change vm_is_stack_for_task() to vm_is_stack_for_current()
  fs/proc: Stop trying to report thread stacks
  fs/proc: Stop reporting eip and esp in /proc/PID/stat
  mm/numa: Remove duplicated include from mprotect.c
2016-10-22 09:39:10 -07:00
Andy Lutomirski
d17af5056c mm: Change vm_is_stack_for_task() to vm_is_stack_for_current()
Asking for a non-current task's stack can't be done without races
unless the task is frozen in kernel mode.  As far as I know,
vm_is_stack_for_task() never had a safe non-current use case.

The __unused annotation is because some KSTK_ESP implementations
ignore their parameter, which IMO is further justification for this
patch.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux API <linux-api@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Link: http://lkml.kernel.org/r/4c3f68f426e6c061ca98b4fc7ef85ffbb0a25b0c.1475257877.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 09:21:41 +02:00
Lorenzo Stoakes
f307ab6dce mm: replace access_process_vm() write parameter with gup_flags
This removes the 'write' argument from access_process_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:31:25 -07:00
Lorenzo Stoakes
6347e8d5bc mm: replace access_remote_vm() write parameter with gup_flags
This removes the 'write' argument from access_remote_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:12:14 -07:00
Lorenzo Stoakes
9beae1ea89 mm: replace get_user_pages_remote() write/force parameters with gup_flags
This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:12:02 -07:00
Lorenzo Stoakes
768ae309a9 mm: replace get_user_pages() write/force parameters with gup_flags
This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:11:43 -07:00
Lorenzo Stoakes
7f23b3504a mm: replace get_vaddr_frames() write/force parameters with gup_flags
This removes the 'write' and 'force' from get_vaddr_frames() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:11:24 -07:00
Lorenzo Stoakes
3b913179c3 mm: replace get_user_pages_locked() write/force parameters with gup_flags
This removes the 'write' and 'force' use from get_user_pages_locked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:11:05 -07:00
Lorenzo Stoakes
c164154f66 mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
This removes the 'write' and 'force' use from get_user_pages_unlocked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-18 14:13:37 -07:00
Lorenzo Stoakes
d4944b0ece mm: remove write/force parameters from __get_user_pages_unlocked()
This removes the redundant 'write' and 'force' parameters from
__get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-18 14:13:37 -07:00
Linus Torvalds
19be0eaffa mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db975 ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404 ("fix get_user_pages bug").

In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better).  The
s390 dirty bit was implemented in abf09bed3c ("s390/mm: implement
software dirty bits") which made it into v3.9.  Earlier kernels will
have to look at the page state itself.

Also, the VM has become more scalable, and what used a purely
theoretical race back then has become easier to trigger.

To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.

Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-18 14:13:29 -07:00
zijun_hu
1061b0d21e linux/mm.h: canonicalize macro PAGE_ALIGNED() definition
The macro PAGE_ALIGNED() is prone to cause error because it doesn't
follow convention to parenthesize parameter @addr within macro body, for
example unsigned long *ptr = kmalloc(...); PAGE_ALIGNED(ptr + 16); for
the left parameter of macro IS_ALIGNED(), (unsigned long)(ptr + 16) is
desired but the actual one is (unsigned long)ptr + 16.

It is fixed by simply canonicalizing macro PAGE_ALIGNED() definition.

Link: http://lkml.kernel.org/r/57EA6AE7.7090807@zoho.com
Signed-off-by: zijun_hu <zijun_hu@htc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:30 -07:00
Michal Hocko
7877cdcc38 mm: consolidate warn_alloc_failed users
warn_alloc_failed is currently used from the page and vmalloc
allocators.  This is a good reuse of the code except that vmalloc would
appreciate a slightly different warning message.  This is already
handled by the fmt parameter except that

  "%s: page allocation failure: order:%u, mode:%#x(%pGg)"

is printed anyway.  This might be quite misleading because it might be a
vmalloc failure which leads to the warning while the page allocator is
not the culprit here.  Fix this by always using the fmt string and only
print the context that makes sense for the particular context (e.g.
order makes only very little sense for the vmalloc context).

Rename the function to not miss any user and also because a later patch
will reuse it also for !failure cases.

Link: http://lkml.kernel.org/r/20160929084407.7004-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:29 -07:00
Andrea Arcangeli
e86f15ee64 mm: vma_merge: fix vm_page_prot SMP race condition against rmap_walk
The rmap_walk can access vm_page_prot (and potentially vm_flags in the
pte/pmd manipulations).  So it's not safe to wait the caller to update
the vm_page_prot/vm_flags after vma_merge returned potentially removing
the "next" vma and extending the "current" vma over the
next->vm_start,vm_end range, but still with the "current" vma
vm_page_prot, after releasing the rmap locks.

The vm_page_prot/vm_flags must be transferred from the "next" vma to the
current vma while vma_merge still holds the rmap locks.

The side effect of this race condition is pte corruption during migrate
as remove_migration_ptes when run on a address of the "next" vma that
got removed, used the vm_page_prot of the current vma.

  migrate   	      	        mprotect
  ------------			-------------
  migrating in "next" vma
				vma_merge() # removes "next" vma and
			        	    # extends "current" vma
					    # current vma is not with
					    # vm_page_prot updated
  remove_migration_ptes
  read vm_page_prot of current "vma"
  establish pte with wrong permissions
				vm_set_page_prot(vma) # too late!
				change_protection in the old vma range
				only, next range is not updated

This caused segmentation faults and potentially memory corruption in
heavy mprotect loads with some light page migration caused by compaction
in the background.

Hugh Dickins pointed out the comment about the Odd case 8 in vma_merge
which confirms the case 8 is only buggy one where the race can trigger,
in all other vma_merge cases the above cannot happen.

This fix removes the oddness factor from case 8 and it converts it from:

      AAAA
  PPPPNNNNXXXX -> PPPPNNNNNNNN

to:

      AAAA
  PPPPNNNNXXXX -> PPPPXXXXXXXX

XXXX has the right vma properties for the whole merged vma returned by
vma_adjust, so it solves the problem fully.  It has the added benefits
that the callers could stop updating vma properties when vma_merge
succeeds however the callers are not updated by this patch (there are
bits like VM_SOFTDIRTY that still need special care for the whole range,
as the vma merging ignores them, but as long as they're not processed by
rmap walks and instead they're accessed with the mmap_sem at least for
reading, they are fine not to be updated within vma_adjust before
releasing the rmap_locks).

Link: http://lkml.kernel.org/r/1474309513-20313-1-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Aditya Mandaleeka <adityam@microsoft.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jan Vorlicek <janvorli@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:29 -07:00
Andrea Arcangeli
6d2329f887 mm: vm_page_prot: update with WRITE_ONCE/READ_ONCE
vma->vm_page_prot is read lockless from the rmap_walk, it may be updated
concurrently and this prevents the risk of reading intermediate values.

Link: http://lkml.kernel.org/r/1474660305-19222-1-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jan Vorlicek <janvorli@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:29 -07:00
Huang Ying
8cd797887a mm: remove page_file_index
After using the offset of the swap entry as the key of the swap cache,
the page_index() becomes exactly same as page_file_index().  So the
page_file_index() is removed and the callers are changed to use
page_index() instead.

Link: http://lkml.kernel.org/r/1473270649-27229-2-git-send-email-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:28 -07:00
Huang Ying
f6ab1f7f6b mm, swap: use offset of swap entry as key of swap cache
This patch is to improve the performance of swap cache operations when
the type of the swap device is not 0.  Originally, the whole swap entry
value is used as the key of the swap cache, even though there is one
radix tree for each swap device.  If the type of the swap device is not
0, the height of the radix tree of the swap cache will be increased
unnecessary, especially on 64bit architecture.  For example, for a 1GB
swap device on the x86_64 architecture, the height of the radix tree of
the swap cache is 11.  But if the offset of the swap entry is used as
the key of the swap cache, the height of the radix tree of the swap
cache is 4.  The increased height causes unnecessary radix tree
descending and increased cache footprint.

This patch reduces the height of the radix tree of the swap cache via
using the offset of the swap entry instead of the whole swap entry value
as the key of the swap cache.  In 32 processes sequential swap out test
case on a Xeon E5 v3 system with RAM disk as swap, the lock contention
for the spinlock of the swap cache is reduced from 20.15% to 12.19%,
when the type of the swap device is 1.

Use the whole swap entry as key,

  perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list: 10.37,
  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_node_memcg: 9.78,

Use the swap offset as key,

  perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list: 6.25,
  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_node_memcg: 5.94,

Link: http://lkml.kernel.org/r/1473270649-27229-1-git-send-email-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:28 -07:00
Srikar Dronamraju
f6f34b4387 mm: introduce arch_reserved_kernel_pages()
Currently arch specific code can reserve memory blocks but
alloc_large_system_hash() may not take it into consideration when sizing
the hashes.  This can lead to bigger hash than required and lead to no
available memory for other purposes.  This is specifically true for
systems with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled.

One approach to solve this problem would be to walk through the memblock
regions and calculate the available memory and base the size of hash
system on the available memory.

The other approach would be to depend on the architecture to provide the
number of pages that are reserved.  This change provides hooks to allow
the architecture to provide the required info.

Link: http://lkml.kernel.org/r/1472476010-4709-2-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:28 -07:00
James Morse
f7e2355f0f mm: pagewalk: fix the comment for test_walk
Modify the comment describing struct mm_walk->test_walk()s behaviour to
match the comment on walk_page_test() and the behaviour of
walk_page_vma().

Fixes: fafaa4264e ("pagewalk: improve vma handling")
Link: http://lkml.kernel.org/r/1471622518-21980-1-git-send-email-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-07 18:46:27 -07:00
Dmitry Safonov
2eefd87896 x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_*
Add API to change vdso blob type with arch_prctl.
As this is usefull only by needs of CRIU, expose
this interface under CONFIG_CHECKPOINT_RESTORE.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: oleg@redhat.com
Cc: linux-mm@kvack.org
Cc: gorcunov@openvz.org
Cc: xemul@virtuozzo.com
Link: http://lkml.kernel.org/r/20160905133308.28234-4-dsafonov@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-14 21:28:09 +02:00
Linus Torvalds
511a8cdb65 Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit
Pull audit fixes from Paul Moore:
 "Two small patches to fix some bugs with the audit-by-executable
  functionality we introduced back in v4.3 (both patches are marked
  for the stable folks)"

* 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit:
  audit: fix exe_file access in audit_exe_compare
  mm: introduce get_task_exe_file
2016-09-01 15:55:56 -07:00
Mateusz Guzik
cd81a9170e mm: introduce get_task_exe_file
For more convenient access if one has a pointer to the task.

As a minor nit take advantage of the fact that only task lock + rcu are
needed to safely grab ->exe_file. This saves mm refcount dance.

Use the helper in proc_exe_link.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Cc: <stable@vger.kernel.org> # 4.3.x
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-08-31 16:11:20 -04:00
Mel Gorman
75ef718405 mm, vmstat: add infrastructure for per-node vmstats
Patchset: "Move LRU page reclaim from zones to nodes v9"

This series moves LRUs from the zones to the node.  While this is a
current rebase, the test results were based on mmotm as of June 23rd.
Conceptually, this series is simple but there are a lot of details.
Some of the broad motivations for this are;

1. The residency of a page partially depends on what zone the page was
   allocated from.  This is partially combatted by the fair zone allocation
   policy but that is a partial solution that introduces overhead in the
   page allocator paths.

2. Currently, reclaim on node 0 behaves slightly different to node 1. For
   example, direct reclaim scans in zonelist order and reclaims even if
   the zone is over the high watermark regardless of the age of pages
   in that LRU. Kswapd on the other hand starts reclaim on the highest
   unbalanced zone. A difference in distribution of file/anon pages due
   to when they were allocated results can result in a difference in
   again. While the fair zone allocation policy mitigates some of the
   problems here, the page reclaim results on a multi-zone node will
   always be different to a single-zone node.
   it was scheduled on as a result.

3. kswapd and the page allocator scan zones in the opposite order to
   avoid interfering with each other but it's sensitive to timing.  This
   mitigates the page allocator using pages that were allocated very recently
   in the ideal case but it's sensitive to timing. When kswapd is allocating
   from lower zones then it's great but during the rebalancing of the highest
   zone, the page allocator and kswapd interfere with each other. It's worse
   if the highest zone is small and difficult to balance.

4. slab shrinkers are node-based which makes it harder to identify the exact
   relationship between slab reclaim and LRU reclaim.

The reason we have zone-based reclaim is that we used to have
large highmem zones in common configurations and it was necessary
to quickly find ZONE_NORMAL pages for reclaim. Today, this is much
less of a concern as machines with lots of memory will (or should) use
64-bit kernels. Combinations of 32-bit hardware and 64-bit hardware are
rare. Machines that do use highmem should have relatively low highmem:lowmem
ratios than we worried about in the past.

Conceptually, moving to node LRUs should be easier to understand. The
page allocator plays fewer tricks to game reclaim and reclaim behaves
similarly on all nodes.

The series has been tested on a 16 core UMA machine and a 2-socket 48
core NUMA machine. The UMA results are presented in most cases as the NUMA
machine behaved similarly.

pagealloc
---------

This is a microbenchmark that shows the benefit of removing the fair zone
allocation policy. It was tested uip to order-4 but only orders 0 and 1 are
shown as the other orders were comparable.

                                           4.7.0-rc4                  4.7.0-rc4
                                      mmotm-20160623                 nodelru-v9
Min      total-odr0-1               490.00 (  0.00%)           457.00 (  6.73%)
Min      total-odr0-2               347.00 (  0.00%)           329.00 (  5.19%)
Min      total-odr0-4               288.00 (  0.00%)           273.00 (  5.21%)
Min      total-odr0-8               251.00 (  0.00%)           239.00 (  4.78%)
Min      total-odr0-16              234.00 (  0.00%)           222.00 (  5.13%)
Min      total-odr0-32              223.00 (  0.00%)           211.00 (  5.38%)
Min      total-odr0-64              217.00 (  0.00%)           208.00 (  4.15%)
Min      total-odr0-128             214.00 (  0.00%)           204.00 (  4.67%)
Min      total-odr0-256             250.00 (  0.00%)           230.00 (  8.00%)
Min      total-odr0-512             271.00 (  0.00%)           269.00 (  0.74%)
Min      total-odr0-1024            291.00 (  0.00%)           282.00 (  3.09%)
Min      total-odr0-2048            303.00 (  0.00%)           296.00 (  2.31%)
Min      total-odr0-4096            311.00 (  0.00%)           309.00 (  0.64%)
Min      total-odr0-8192            316.00 (  0.00%)           314.00 (  0.63%)
Min      total-odr0-16384           317.00 (  0.00%)           315.00 (  0.63%)
Min      total-odr1-1               742.00 (  0.00%)           712.00 (  4.04%)
Min      total-odr1-2               562.00 (  0.00%)           530.00 (  5.69%)
Min      total-odr1-4               457.00 (  0.00%)           433.00 (  5.25%)
Min      total-odr1-8               411.00 (  0.00%)           381.00 (  7.30%)
Min      total-odr1-16              381.00 (  0.00%)           356.00 (  6.56%)
Min      total-odr1-32              372.00 (  0.00%)           346.00 (  6.99%)
Min      total-odr1-64              372.00 (  0.00%)           343.00 (  7.80%)
Min      total-odr1-128             375.00 (  0.00%)           351.00 (  6.40%)
Min      total-odr1-256             379.00 (  0.00%)           351.00 (  7.39%)
Min      total-odr1-512             385.00 (  0.00%)           355.00 (  7.79%)
Min      total-odr1-1024            386.00 (  0.00%)           358.00 (  7.25%)
Min      total-odr1-2048            390.00 (  0.00%)           362.00 (  7.18%)
Min      total-odr1-4096            390.00 (  0.00%)           362.00 (  7.18%)
Min      total-odr1-8192            388.00 (  0.00%)           363.00 (  6.44%)

This shows a steady improvement throughout. The primary benefit is from
reduced system CPU usage which is obvious from the overall times;

           4.7.0-rc4   4.7.0-rc4
        mmotm-20160623nodelru-v8
User          189.19      191.80
System       2604.45     2533.56
Elapsed      2855.30     2786.39

The vmstats also showed that the fair zone allocation policy was definitely
removed as can be seen here;

                             4.7.0-rc3   4.7.0-rc3
                         mmotm-20160623 nodelru-v8
DMA32 allocs               28794729769           0
Normal allocs              48432501431 77227309877
Movable allocs                       0           0

tiobench on ext4
----------------

tiobench is a benchmark that artifically benefits if old pages remain resident
while new pages get reclaimed. The fair zone allocation policy mitigates this
problem so pages age fairly. While the benchmark has problems, it is important
that tiobench performance remains constant as it implies that page aging
problems that the fair zone allocation policy fixes are not re-introduced.

                                         4.7.0-rc4             4.7.0-rc4
                                    mmotm-20160623            nodelru-v9
Min      PotentialReadSpeed        89.65 (  0.00%)       90.21 (  0.62%)
Min      SeqRead-MB/sec-1          82.68 (  0.00%)       82.01 ( -0.81%)
Min      SeqRead-MB/sec-2          72.76 (  0.00%)       72.07 ( -0.95%)
Min      SeqRead-MB/sec-4          75.13 (  0.00%)       74.92 ( -0.28%)
Min      SeqRead-MB/sec-8          64.91 (  0.00%)       65.19 (  0.43%)
Min      SeqRead-MB/sec-16         62.24 (  0.00%)       62.22 ( -0.03%)
Min      RandRead-MB/sec-1          0.88 (  0.00%)        0.88 (  0.00%)
Min      RandRead-MB/sec-2          0.95 (  0.00%)        0.92 ( -3.16%)
Min      RandRead-MB/sec-4          1.43 (  0.00%)        1.34 ( -6.29%)
Min      RandRead-MB/sec-8          1.61 (  0.00%)        1.60 ( -0.62%)
Min      RandRead-MB/sec-16         1.80 (  0.00%)        1.90 (  5.56%)
Min      SeqWrite-MB/sec-1         76.41 (  0.00%)       76.85 (  0.58%)
Min      SeqWrite-MB/sec-2         74.11 (  0.00%)       73.54 ( -0.77%)
Min      SeqWrite-MB/sec-4         80.05 (  0.00%)       80.13 (  0.10%)
Min      SeqWrite-MB/sec-8         72.88 (  0.00%)       73.20 (  0.44%)
Min      SeqWrite-MB/sec-16        75.91 (  0.00%)       76.44 (  0.70%)
Min      RandWrite-MB/sec-1         1.18 (  0.00%)        1.14 ( -3.39%)
Min      RandWrite-MB/sec-2         1.02 (  0.00%)        1.03 (  0.98%)
Min      RandWrite-MB/sec-4         1.05 (  0.00%)        0.98 ( -6.67%)
Min      RandWrite-MB/sec-8         0.89 (  0.00%)        0.92 (  3.37%)
Min      RandWrite-MB/sec-16        0.92 (  0.00%)        0.93 (  1.09%)

           4.7.0-rc4   4.7.0-rc4
        mmotm-20160623 approx-v9
User          645.72      525.90
System        403.85      331.75
Elapsed      6795.36     6783.67

This shows that the series has little or not impact on tiobench which is
desirable and a reduction in system CPU usage. It indicates that the fair
zone allocation policy was removed in a manner that didn't reintroduce
one class of page aging bug. There were only minor differences in overall
reclaim activity

                             4.7.0-rc4   4.7.0-rc4
                          mmotm-20160623nodelru-v8
Minor Faults                    645838      647465
Major Faults                       573         640
Swap Ins                             0           0
Swap Outs                            0           0
DMA allocs                           0           0
DMA32 allocs                  46041453    44190646
Normal allocs                 78053072    79887245
Movable allocs                       0           0
Allocation stalls                   24          67
Stall zone DMA                       0           0
Stall zone DMA32                     0           0
Stall zone Normal                    0           2
Stall zone HighMem                   0           0
Stall zone Movable                   0          65
Direct pages scanned             10969       30609
Kswapd pages scanned          93375144    93492094
Kswapd pages reclaimed        93372243    93489370
Direct pages reclaimed           10969       30609
Kswapd efficiency                  99%         99%
Kswapd velocity              13741.015   13781.934
Direct efficiency                 100%        100%
Direct velocity                  1.614       4.512
Percentage direct scans             0%          0%

kswapd activity was roughly comparable. There were differences in direct
reclaim activity but negligible in the context of the overall workload
(velocity of 4 pages per second with the patches applied, 1.6 pages per
second in the baseline kernel).

pgbench read-only large configuration on ext4
---------------------------------------------

pgbench is a database benchmark that can be sensitive to page reclaim
decisions. This also checks if removing the fair zone allocation policy
is safe

pgbench Transactions
                        4.7.0-rc4             4.7.0-rc4
                   mmotm-20160623            nodelru-v8
Hmean    1       188.26 (  0.00%)      189.78 (  0.81%)
Hmean    5       330.66 (  0.00%)      328.69 ( -0.59%)
Hmean    12      370.32 (  0.00%)      380.72 (  2.81%)
Hmean    21      368.89 (  0.00%)      369.00 (  0.03%)
Hmean    30      382.14 (  0.00%)      360.89 ( -5.56%)
Hmean    32      428.87 (  0.00%)      432.96 (  0.95%)

Negligible differences again. As with tiobench, overall reclaim activity
was comparable.

bonnie++ on ext4
----------------

No interesting performance difference, negligible differences on reclaim
stats.

paralleldd on ext4
------------------

This workload uses varying numbers of dd instances to read large amounts of
data from disk.

                               4.7.0-rc3             4.7.0-rc3
                          mmotm-20160623            nodelru-v9
Amean    Elapsd-1       186.04 (  0.00%)      189.41 ( -1.82%)
Amean    Elapsd-3       192.27 (  0.00%)      191.38 (  0.46%)
Amean    Elapsd-5       185.21 (  0.00%)      182.75 (  1.33%)
Amean    Elapsd-7       183.71 (  0.00%)      182.11 (  0.87%)
Amean    Elapsd-12      180.96 (  0.00%)      181.58 ( -0.35%)
Amean    Elapsd-16      181.36 (  0.00%)      183.72 ( -1.30%)

           4.7.0-rc4   4.7.0-rc4
        mmotm-20160623 nodelru-v9
User         1548.01     1552.44
System       8609.71     8515.08
Elapsed      3587.10     3594.54

There is little or no change in performance but some drop in system CPU usage.

                             4.7.0-rc3   4.7.0-rc3
                        mmotm-20160623  nodelru-v9
Minor Faults                    362662      367360
Major Faults                      1204        1143
Swap Ins                            22           0
Swap Outs                         2855        1029
DMA allocs                           0           0
DMA32 allocs                  31409797    28837521
Normal allocs                 46611853    49231282
Movable allocs                       0           0
Direct pages scanned                 0           0
Kswapd pages scanned          40845270    40869088
Kswapd pages reclaimed        40830976    40855294
Direct pages reclaimed               0           0
Kswapd efficiency                  99%         99%
Kswapd velocity              11386.711   11369.769
Direct efficiency                 100%        100%
Direct velocity                  0.000       0.000
Percentage direct scans             0%          0%
Page writes by reclaim            2855        1029
Page writes file                     0           0
Page writes anon                  2855        1029
Page reclaim immediate             771        1628
Sector Reads                 293312636   293536360
Sector Writes                 18213568    18186480
Page rescued immediate               0           0
Slabs scanned                   128257      132747
Direct inode steals                181          56
Kswapd inode steals                 59        1131

It basically shows that kswapd was active at roughly the same rate in
both kernels. There was also comparable slab scanning activity and direct
reclaim was avoided in both cases. There appears to be a large difference
in numbers of inodes reclaimed but the workload has few active inodes and
is likely a timing artifact.

stutter
-------

stutter simulates a simple workload. One part uses a lot of anonymous
memory, a second measures mmap latency and a third copies a large file.
The primary metric is checking for mmap latency.

stutter
                             4.7.0-rc4             4.7.0-rc4
                        mmotm-20160623            nodelru-v8
Min         mmap     16.6283 (  0.00%)     13.4258 ( 19.26%)
1st-qrtle   mmap     54.7570 (  0.00%)     34.9121 ( 36.24%)
2nd-qrtle   mmap     57.3163 (  0.00%)     46.1147 ( 19.54%)
3rd-qrtle   mmap     58.9976 (  0.00%)     47.1882 ( 20.02%)
Max-90%     mmap     59.7433 (  0.00%)     47.4453 ( 20.58%)
Max-93%     mmap     60.1298 (  0.00%)     47.6037 ( 20.83%)
Max-95%     mmap     73.4112 (  0.00%)     82.8719 (-12.89%)
Max-99%     mmap     92.8542 (  0.00%)     88.8870 (  4.27%)
Max         mmap   1440.6569 (  0.00%)    121.4201 ( 91.57%)
Mean        mmap     59.3493 (  0.00%)     42.2991 ( 28.73%)
Best99%Mean mmap     57.2121 (  0.00%)     41.8207 ( 26.90%)
Best95%Mean mmap     55.9113 (  0.00%)     39.9620 ( 28.53%)
Best90%Mean mmap     55.6199 (  0.00%)     39.3124 ( 29.32%)
Best50%Mean mmap     53.2183 (  0.00%)     33.1307 ( 37.75%)
Best10%Mean mmap     45.9842 (  0.00%)     20.4040 ( 55.63%)
Best5%Mean  mmap     43.2256 (  0.00%)     17.9654 ( 58.44%)
Best1%Mean  mmap     32.9388 (  0.00%)     16.6875 ( 49.34%)

This shows a number of improvements with the worst-case outlier greatly
improved.

Some of the vmstats are interesting

                             4.7.0-rc4   4.7.0-rc4
                          mmotm-20160623nodelru-v8
Swap Ins                           163         502
Swap Outs                            0           0
DMA allocs                           0           0
DMA32 allocs                 618719206  1381662383
Normal allocs                891235743   564138421
Movable allocs                       0           0
Allocation stalls                 2603           1
Direct pages scanned            216787           2
Kswapd pages scanned          50719775    41778378
Kswapd pages reclaimed        41541765    41777639
Direct pages reclaimed          209159           0
Kswapd efficiency                  81%         99%
Kswapd velocity              16859.554   14329.059
Direct efficiency                  96%          0%
Direct velocity                 72.061       0.001
Percentage direct scans             0%          0%
Page writes by reclaim         6215049           0
Page writes file               6215049           0
Page writes anon                     0           0
Page reclaim immediate           70673          90
Sector Reads                  81940800    81680456
Sector Writes                100158984    98816036
Page rescued immediate               0           0
Slabs scanned                  1366954       22683

While this is not guaranteed in all cases, this particular test showed
a large reduction in direct reclaim activity. It's also worth noting
that no page writes were issued from reclaim context.

This series is not without its hazards. There are at least three areas
that I'm concerned with even though I could not reproduce any problems in
that area.

1. Reclaim/compaction is going to be affected because the amount of reclaim is
   no longer targetted at a specific zone. Compaction works on a per-zone basis
   so there is no guarantee that reclaiming a few THP's worth page pages will
   have a positive impact on compaction success rates.

2. The Slab/LRU reclaim ratio is affected because the frequency the shrinkers
   are called is now different. This may or may not be a problem but if it
   is, it'll be because shrinkers are not called enough and some balancing
   is required.

3. The anon/file reclaim ratio may be affected. Pages about to be dirtied are
   distributed between zones and the fair zone allocation policy used to do
   something very similar for anon. The distribution is now different but not
   necessarily in any way that matters but it's still worth bearing in mind.

VM statistic counters for reclaim decisions are zone-based.  If the kernel
is to reclaim on a per-node basis then we need to track per-node
statistics but there is no infrastructure for that.  The most notable
change is that the old node_page_state is renamed to
sum_zone_node_page_state.  The new node_page_state takes a pglist_data and
uses per-node stats but none exist yet.  There is some renaming such as
vm_stat to vm_zone_stat and the addition of vm_node_stat and the renaming
of mod_state to mod_zone_state.  Otherwise, this is mostly a mechanical
patch with no functional change.  There is a lot of similarity between the
node and zone helpers which is unfortunate but there was no obvious way of
reusing the code and maintaining type safety.

Link: http://lkml.kernel.org/r/1467970510-21195-2-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Johannes Weiner
55779ec759 mm: fix vm-scalability regression in cgroup-aware workingset code
Commit 23047a96d7 ("mm: workingset: per-cgroup cache thrash
detection") added a page->mem_cgroup lookup to the cache eviction,
refault, and activation paths, as well as locking to the activation
path, and the vm-scalability tests showed a regression of -23%.

While the test in question is an artificial worst-case scenario that
doesn't occur in real workloads - reading two sparse files in parallel
at full CPU speed just to hammer the LRU paths - there is still some
optimizations that can be done in those paths.

Inline the lookup functions to eliminate calls.  Also, page->mem_cgroup
doesn't need to be stabilized when counting an activation; we merely
need to hold the RCU lock to prevent the memcg from being freed.

This cuts down on overhead quite a bit:

23047a96d7 063f6715e77a7be5770d6081fe
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
  21621405 +- 0%     +11.3%   24069657 +- 2%  vm-scalability.throughput

[linux@roeck-us.net: drop unnecessary include file]
[hannes@cmpxchg.org: add WARN_ON_ONCE()s]
  Link: http://lkml.kernel.org/r/20160707194024.GA26580@cmpxchg.org
Link: http://lkml.kernel.org/r/20160624175101.GA3024@cmpxchg.org
Reported-by: Ye Xiaolong <xiaolong.ye@intel.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00