d4414bc0e93d8da170fd0fc9fef65fe84015677d
277 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
d3e755f0bd |
mm: introduce deactivate_page
perprocess reclaims needs to deactivate file pages from active LRU when echo file > /proc/<pid>/reclaim. Add deactivate_file pages. Bug: 131016077 Bug: 153444106 (cherry picked from b07eab27085611203ad359b7f4eecd138d7d771a) Change-Id: I06fed20103671e4ca6fb8663d5029736442162a5 Signed-off-by: Minchan Kim <minchan@google.com> Signed-off-by: Martin Liu <liumartin@google.com> |
||
|
|
9a12974ec9 |
Revert "mm: introduce __lru_cache_add_active_or_unevictable"
This reverts commit
|
||
|
|
ef36c60fe7 |
mm: Revert previous mm revert list
This commit reverts e799c1b10c54...cfb042c6c5d1 Reason for revert: unblock GKI Bug: 140544941 Test: boot Change-Id: I4ebe6c01918788cdc2468ceabf101ef7c3e3c452 Signed-off-by: Martin Liu <liumartin@google.com> |
||
|
|
9661924262 |
Revert "mm: introduce __lru_cache_add_active_or_unevictable"
This reverts commit
|
||
|
|
cbbfa7615c |
Merge android-4.19-q.88 (47d86d5) into msm-4.19
* refs/heads/tmp-47d86d5: Linux 4.19.88 net: fec: fix clock count mis-match platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer dmaengine: stm32-dma: check whether length is aligned on FIFO threshold ASoC: stm32: sai: add missing put_device() ASoC: stm32: i2s: fix IRQ clearing ASoC: stm32: i2s: fix 16 bit format support ASoC: stm32: i2s: fix dma configuration pinctrl: stm32: fix memory leak issue mailbox: mailbox-test: fix null pointer if no mmio clk: stm32mp1: parent clocks update clk: stm32mp1: add CLK_SET_RATE_NO_REPARENT to Kernel clocks clk: stm32mp1: fix mcu divider table clk: stm32mp1: fix HSI divider flag hwrng: stm32 - fix unbalanced pm_runtime_enable media: stm32-dcmi: fix check of pm_runtime_get_sync return value media: stm32-dcmi: fix DMA corruption when stopping streaming crypto: stm32/hash - Fix hmac issue more than 256 bytes HID: core: check whether Usage Page item is after Usage ID items tcp: exit if nothing to retransmit on RTO timeout mailbox: stm32_ipcc: add spinlock to fix channels concurrent access drm/atmel-hlcdc: revert shift by 8 mtd: spi-nor: cast to u64 to avoid uint overflows mtd: rawnand: atmel: fix possible object reference leak mtd: rawnand: atmel: Fix spelling mistake in error message net: macb driver, check for SKBTX_HW_TSTAMP net: macb: Fix SUBNS increment and increase resolution watchdog: sama5d4: fix WDD value to be always set to max ext4: add more paranoia checking in ext4_expand_extra_isize handling net: macb: add missed tasklet_kill net: sched: fix `tc -s class show` no bstats on class with nolock subqueues sctp: cache netns in sctp_ep_common tipc: fix link name length check selftests: bpf: test_sockmap: handle file creation failures gracefully openvswitch: remove another BUG_ON() openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info() slip: Fix use-after-free Read in slip_open sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook openvswitch: fix flow command message size net: psample: fix skb_over_panic macvlan: schedule bc_work even if error media: atmel: atmel-isc: fix INIT_WORK misplacement media: atmel: atmel-isc: fix asd memory allocation pwm: Clear chip_data in pwm_put() net: macb: fix error format in dev_err() media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE xfrm: Fix memleak on xfrm state destroy thunderbolt: Power cycle the router if NVM authentication fails mei: me: add comet point V device id mei: bus: prefix device names on bus with the bus name USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids staging: rtl8723bs: Drop ACPI device ids staging: rtl8192e: fix potential use after free usb: dwc2: use a longer core rest timeout in dwc2_core_reset() clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated() clk: at91: fix update bit maps on CFG_MOR write mm, gup: add missing refcount overflow checks on s390 mtd: Remove a debug trace in mtdpart.c xdp: fix cpumap redirect SKB creation bug powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property() ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp RDMA/hns: Fix the state of rereg mr RDMA/hns: Bugfix for the scene without receiver queue RDMA/hns: Fix the bug with updating rq head pointer when flush cqe scsi: libsas: Check SMP PHY control function result scsi: hisi_sas: shutdown axi bus to avoid exception CQ returned ACPI / APEI: Switch estatus pool to use vmalloc memory ACPI / APEI: Don't wait to serialise with oops messages when panic()ing scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery apparmor: delete the dentry in aafs_remove() to avoid a leak iommu/amd: Fix NULL dereference bug in match_hid_uid net: hns3: fix an issue for hns3_update_new_int_gl net: hns3: fix an issue for hclgevf_ae_get_hdev net: hns3: fix PFC not setting problem for DCB module net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED bpf: drop refcount if bpf_map_new_fd() fails in map_create() kvm: properly check debugfs dentry before using it net: dev: Use unsigned integer as an argument to left-shift mmc: core: align max segment size with logical block size bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id() sctp: don't compare hb_timer expire date before starting it net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap net: ip_gre: do not report erspan_ver for gre or gretap net: fix possible overflow in __sk_mem_raise_allocated() geneve: change NET_UDP_TUNNEL dependency to select sfc: initialise found bitmap in efx_ef10_mtd_probe ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI tipc: fix skb may be leaky in tipc_link_input net/smc: fix byte_order for rx_curs_confirmed blktrace: Show requests without sector net/smc: fix sender_free computation xfs: end sync buffer I/O properly on shutdown error mm/hotplug: invalid PFNs from pfn_to_online_page() net/smc: don't wait for send buffer space when data was already sent net/smc: prevent races between smc_lgr_terminate() and smc_conn_free() decnet: fix DN_IFREQ_SIZE ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change serial: 8250: Fix serial8250 initialization crash net/core/neighbour: fix kmemleak minimal reference count for hash tables PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity() ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs net/core/neighbour: tell kmemleak about hash tables tipc: fix memory leak in tipc_nl_compat_publ_dump mtd: Check add_mtd_device() ret code lib/genalloc.c: include vmalloc.h drivers/base/platform.c: kmemleak ignore a known leak fork: fix some -Wmissing-prototypes warnings lib/genalloc.c: use vzalloc_node() to allocate the bitmap lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk firmware: arm_sdei: Fix DT platform device creation firmware: arm_sdei: fix wrong of_node_put() in init function infiniband/qedr: Potential null ptr dereference of qp infiniband: bnxt_re: qplib: Check the return value of send_message xprtrdma: Prevent leak of rpcrdma_rep objects netfilter: nf_tables: fix a missing check of nla_put_failure tools/vm/page-types.c: fix "kpagecount returned fewer pages than expected" failures mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free() mm/page_alloc.c: use a single function to free page mm/page_alloc.c: free order-0 pages through PCP in page_frag_free() vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n ocfs2: clear journal dirty flag after shutdown journal net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe() net: marvell: fix a missing check of acpi_match_device tipc: fix a missing check of genlmsg_put atl1e: checking the status of atl1e_write_phy_reg net: dsa: bcm_sf2: Propagate error value from mdio_write net: stmicro: fix a missing check of clk_prepare net: (cpts) fix a missing check of clk_prepare um: Make GCOV depend on !KCOV um: Include sys/uio.h to have writev() f2fs: fix to dirty inode synchronously f2fs: fix block address for __check_sit_bitmap net/net_namespace: Check the return value of register_pernet_subsys() net/netlink_compat: Fix a missing check of nla_parse_nested pwm: clps711x: Fix period calculation crypto: mxc-scc - fix build warnings on ARM64 powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y powerpc/pseries: Fix node leak in update_lmb_associativity_index() powerpc/83xx: handle machine check caused by watchdog timer regulator: tps65910: fix a missing check of return value bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't IB/rxe: Make counters thread safe drbd: fix print_st_err()'s prototype to match the definition drbd: do not block when adjusting "disk-options" while IO is frozen drbd: reject attach of unsuitable uuids even if connected drbd: ignore "all zero" peer volume sizes in handshake powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status vfio/spapr_tce: Get rid of possible infinite loop powerpc/44x/bamboo: Fix PCI range powerpc/mm: Make NULL pointer deferences explicit on bad page faults. powerpc/prom: fix early DEBUG messages powerpc/32: Avoid unsupported flags with clang powerpc/perf: Fix unit_sel/cache_sel checks ath6kl: Fix off by one error in scan completion ath6kl: Only use match sets when firmware supports it brcmfmac: Fix access point mode scsi: csiostor: fix incorrect dma device in case of vport scsi: qla2xxx: deadlock by configfs_depend_item RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer openrisc: Fix broken paths to arch/or32 serial: max310x: Fix tx_empty() callback Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading drivers/regulator: fix a missing check of return value powerpc/xmon: fix dump_segments() powerpc/book3s/32: fix number of bats in p/v_block_mapped() vxlan: Fix error path in __vxlan_dev_create() clocksource/drivers/fttmr010: Fix invalid interrupt register access IB/qib: Fix an error code in qib_sdma_verbs_send() xfs: Fix bulkstat compat ioctls on x32 userspace. xfs: Align compat attrlist_by_handle with native implementation. dm raid: fix false -EBUSY when handling check/repair message gfs2: take jdata unstuff into account in do_grow dm flakey: Properly corrupt multi-page bios. HID: doc: fix wrong data structure reference for UHID_OUTPUT pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10 pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration pinctrl: sh-pfc: r8a77990: Fix MOD_SEL0 SEL_I2C1 field width KVM: s390: unregister debug feature on failing arch init bnxt_en: query force speeds before disabling autoneg mode. bnxt_en: Save ring statistics before reset. bnxt_en: Return linux standard errors in bnxt_ethtool.c exofs_mount(): fix leaks on failure exits netfilter: nf_nat_sip: fix RTP/RTCP source port translations net/mlx5: Continue driver initialization despite debugfs failure pinctrl: xway: fix gpio-hog related boot issues memory: omap-gpmc: Get the header of the enum vfio-mdev/samples: Use u8 instead of char for handle functions kprobes/x86: Show x86-64 specific blacklisted symbols correctly kprobes: Blacklist symbols in arch-defined prohibited area xen/pciback: Check dev_data before using it kprobes/x86/xen: blacklist non-attachable xen interrupt functions serial: 8250: Rate limit serial port rx interrupts during input overruns gpio: raspberrypi-exp: decrease refcount on firmware dt node HID: intel-ish-hid: fixes incorrect error handling serial: sh-sci: Fix crash in rx_timer_fn() on PIO fallback btrfs: only track ref_heads in delayed_ref_updates Btrfs: allow clear_extent_dirty() to receive a cached extent state record btrfs: dev-replace: set result code of cancel by status of scrub btrfs: fix ncopies raid_attr for RAID56 btrfs: Check for missing device before bio submission in btrfs_map_bio usb: ehci-omap: Fix deferred probe for phy handling mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET mmc: meson-gx: make sure the descriptor is stopped on errors VSOCK: bind to random port for VMADDR_PORT_ANY crypto/chelsio/chtls: listen fails with multiadapt Revert "KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()" Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS" kvm: vmx: Set IA32_TSC_AUX for legacy mode guests gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB gpio: pca953x: Fix AI overflow on PCAL6524 iwlwifi: pcie: set cmd_len in the correct place iwlwifi: pcie: fix erroneous print iwlwifi: mvm: force TCM re-evaluation on TCM resume iwlwifi: move iwl_nvm_check_version() into dvm microblaze: fix multiple bugs in arch/microblaze/boot/Makefile microblaze: move "... is ready" messages to arch/microblaze/Makefile microblaze: adjust the help to the real behavior ubi: Do not drop UBI device reference before using ubi: Put MTD device after it is not used ubifs: Fix default compression selection in ubifs nvme: fix kernel paging oops xfs: require both realtime inodes to mount bcache: do not mark writeback_running too early bcache: do not check if debug dentry is ERR or NULL explicitly on remove rtl818x: fix potential use after free brcmfmac: set SDIO F1 MesBusyCtrl for CYW4373 brcmfmac: set F2 watermark to 256 for 4373 mwifiex: debugfs: correct histogram spacing, formatting mwifiex: fix potential NULL dereference and use after free arm64: dts: renesas: draak: Fix CVBS input crypto: user - support incremental algorithm dumps s390/zcrypt: make sysfs reset attribute trigger queue reset nvme: provide fallback for discard alloc failure scsi: qla2xxx: Fix for FC-NVMe discovery for NPIV port scsi: qla2xxx: Fix NPIV handling for FC-NVMe scsi: lpfc: Enable Management features for IF_TYPE=6 ACPI / LPSS: Ignore acpi_device_fix_up_power() return value ARM: ks8695: fix section mismatch warning xfs: zero length symlinks are not valid PM / AVS: SmartReflex: NULL check before some freeing functions is not needed RDMA/vmw_pvrdma: Use atomic memory allocation in create AH arm64: preempt: Fix big-endian when checking preempt count in assembly RDMA/hns: Fix the bug while use multi-hop of pbl ARM: OMAP1: fix USB configuration for device-only setups platform/x86: mlx-platform: Fix LED configuration bus: ti-sysc: Check for no-reset and no-idle flags at the child level arm64: smp: Handle errors reported by the firmware arm64: mm: Prevent mismatched 52-bit VA support ARM: dts: Fix hsi gdd range for omap4 parisc: Fix HP SDC hpa address output parisc: Fix serio address output ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication ARM: dts: imx25: Fix memory node duplication ARM: dts: imx27: Fix memory node duplication ARM: dts: imx1: Fix memory node duplication ARM: dts: imx23: Fix memory node duplication ARM: dts: imx50: Fix memory node duplication ARM: dts: imx6sl: Fix memory node duplication ARM: dts: imx6sx: Fix memory node duplication ARM: dts: imx6ul: Fix memory node duplication ARM: dts: imx7: Fix memory node duplication ARM: dts: imx35: Fix memory node duplication ARM: dts: imx31: Fix memory node duplication ARM: dts: imx53: Fix memory node duplication ARM: dts: imx51: Fix memory node duplication ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed tracing: Lock event_mutex before synth_event_mutex ARM: dts: Fix up SQ201 flash access scsi: lpfc: Fix dif and first burst use in write commands scsi: lpfc: Fix kernel Oops due to null pring pointers scsi: target/tcmu: Fix queue_cmd_ring() declaration pwm: bcm-iproc: Prevent unloading the driver module while in use block: drbd: remove a stray unlock in __drbd_send_protocol() mac80211: fix station inactive_time shortly after boot net/fq_impl: Switch to kvmalloc() for memory allocation ceph: return -EINVAL if given fsc mount option on kernel w/o support net: mscc: ocelot: fix __ocelot_rmw_ix prototype net: bcmgenet: reapply manual settings to the PHY net: bcmgenet: use RGMII loopback for MAC reset scripts/gdb: fix debugging modules compiled with hot/cold partitioning ASoC: stm32: sai: add restriction on mmap support watchdog: meson: Fix the wrong value of left time can: mcp251x: mcp251x_restart_work_handler(): Fix potential force_quit race condition can: flexcan: increase error counters if skb enqueueing via can_rx_offload_queue_sorted() fails can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open can: peak_usb: report bus recovery as well bridge: ebtables: don't crash when using dnat target in output chains net: fec: add missed clk_disable_unprepare in remove clk: ti: clkctrl: Fix failed to enable error with double udelay timeout clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call x86/resctrl: Prevent NULL pointer dereference when reading mondata idr: Fix idr_alloc_u32 on 32-bit systems idr: Fix integer overflow in idr_for_each_entry powerpc/bpf: Fix tail call implementation samples/bpf: fix build by setting HAVE_ATTR_TEST to zero ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18 clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup clk: at91: avoid sleeping early reset: fix reset_control_ops kerneldoc comment ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts pinctrl: cherryview: Allocate IRQ chip dynamic clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume ASoC: kirkwood: fix device remove ordering ASoC: kirkwood: fix external clock probe defer clk: samsung: exynos5433: Fix error paths reset: Fix memory leak in reset_control_array_put() ASoC: compress: fix unsigned integer overflow check ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX clocksource/drivers/mediatek: Fix error handling clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate Conflicts: drivers/mmc/core/queue.c Change-Id: I17f5829400ece4e39be4c9f6223fff206c710d06 Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org> |
||
|
|
47d86d5c55 |
Merge 4.19.88 into android-4.19-q
Changes in 4.19.88 clk: meson: gxbb: let sar_adc_clk_div set the parent clock rate clocksource/drivers/mediatek: Fix error handling ASoC: msm8916-wcd-analog: Fix RX1 selection in RDAC2 MUX ASoC: compress: fix unsigned integer overflow check reset: Fix memory leak in reset_control_array_put() clk: samsung: exynos5433: Fix error paths ASoC: kirkwood: fix external clock probe defer ASoC: kirkwood: fix device remove ordering clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume pinctrl: cherryview: Allocate IRQ chip dynamic ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts reset: fix reset_control_ops kerneldoc comment clk: at91: avoid sleeping early clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18 ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend samples/bpf: fix build by setting HAVE_ATTR_TEST to zero powerpc/bpf: Fix tail call implementation idr: Fix integer overflow in idr_for_each_entry idr: Fix idr_alloc_u32 on 32-bit systems x86/resctrl: Prevent NULL pointer dereference when reading mondata clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call clk: ti: clkctrl: Fix failed to enable error with double udelay timeout net: fec: add missed clk_disable_unprepare in remove bridge: ebtables: don't crash when using dnat target in output chains can: peak_usb: report bus recovery as well can: c_can: D_CAN: c_can_chip_config(): perform a sofware reset on open can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak can: rx-offload: can_rx_offload_offload_one(): do not increase the skb_queue beyond skb_queue_len_max can: rx-offload: can_rx_offload_offload_one(): increment rx_fifo_errors on queue overflow or OOM can: rx-offload: can_rx_offload_offload_one(): use ERR_PTR() to propagate error value in case of errors can: rx-offload: can_rx_offload_irq_offload_timestamp(): continue on error can: rx-offload: can_rx_offload_irq_offload_fifo(): continue on error can: flexcan: increase error counters if skb enqueueing via can_rx_offload_queue_sorted() fails can: mcp251x: mcp251x_restart_work_handler(): Fix potential force_quit race condition watchdog: meson: Fix the wrong value of left time ASoC: stm32: sai: add restriction on mmap support scripts/gdb: fix debugging modules compiled with hot/cold partitioning net: bcmgenet: use RGMII loopback for MAC reset net: bcmgenet: reapply manual settings to the PHY net: mscc: ocelot: fix __ocelot_rmw_ix prototype ceph: return -EINVAL if given fsc mount option on kernel w/o support net/fq_impl: Switch to kvmalloc() for memory allocation mac80211: fix station inactive_time shortly after boot block: drbd: remove a stray unlock in __drbd_send_protocol() pwm: bcm-iproc: Prevent unloading the driver module while in use scsi: target/tcmu: Fix queue_cmd_ring() declaration scsi: lpfc: Fix kernel Oops due to null pring pointers scsi: lpfc: Fix dif and first burst use in write commands ARM: dts: Fix up SQ201 flash access tracing: Lock event_mutex before synth_event_mutex ARM: debug-imx: only define DEBUG_IMX_UART_PORT if needed ARM: dts: imx51: Fix memory node duplication ARM: dts: imx53: Fix memory node duplication ARM: dts: imx31: Fix memory node duplication ARM: dts: imx35: Fix memory node duplication ARM: dts: imx7: Fix memory node duplication ARM: dts: imx6ul: Fix memory node duplication ARM: dts: imx6sx: Fix memory node duplication ARM: dts: imx6sl: Fix memory node duplication ARM: dts: imx50: Fix memory node duplication ARM: dts: imx23: Fix memory node duplication ARM: dts: imx1: Fix memory node duplication ARM: dts: imx27: Fix memory node duplication ARM: dts: imx25: Fix memory node duplication ARM: dts: imx53-voipac-dmm-668: Fix memory node duplication parisc: Fix serio address output parisc: Fix HP SDC hpa address output ARM: dts: Fix hsi gdd range for omap4 arm64: mm: Prevent mismatched 52-bit VA support arm64: smp: Handle errors reported by the firmware bus: ti-sysc: Check for no-reset and no-idle flags at the child level platform/x86: mlx-platform: Fix LED configuration ARM: OMAP1: fix USB configuration for device-only setups RDMA/hns: Fix the bug while use multi-hop of pbl arm64: preempt: Fix big-endian when checking preempt count in assembly RDMA/vmw_pvrdma: Use atomic memory allocation in create AH PM / AVS: SmartReflex: NULL check before some freeing functions is not needed xfs: zero length symlinks are not valid ARM: ks8695: fix section mismatch warning ACPI / LPSS: Ignore acpi_device_fix_up_power() return value scsi: lpfc: Enable Management features for IF_TYPE=6 scsi: qla2xxx: Fix NPIV handling for FC-NVMe scsi: qla2xxx: Fix for FC-NVMe discovery for NPIV port nvme: provide fallback for discard alloc failure s390/zcrypt: make sysfs reset attribute trigger queue reset crypto: user - support incremental algorithm dumps arm64: dts: renesas: draak: Fix CVBS input mwifiex: fix potential NULL dereference and use after free mwifiex: debugfs: correct histogram spacing, formatting brcmfmac: set F2 watermark to 256 for 4373 brcmfmac: set SDIO F1 MesBusyCtrl for CYW4373 rtl818x: fix potential use after free bcache: do not check if debug dentry is ERR or NULL explicitly on remove bcache: do not mark writeback_running too early xfs: require both realtime inodes to mount nvme: fix kernel paging oops ubifs: Fix default compression selection in ubifs ubi: Put MTD device after it is not used ubi: Do not drop UBI device reference before using microblaze: adjust the help to the real behavior microblaze: move "... is ready" messages to arch/microblaze/Makefile microblaze: fix multiple bugs in arch/microblaze/boot/Makefile iwlwifi: move iwl_nvm_check_version() into dvm iwlwifi: mvm: force TCM re-evaluation on TCM resume iwlwifi: pcie: fix erroneous print iwlwifi: pcie: set cmd_len in the correct place gpio: pca953x: Fix AI overflow on PCAL6524 gpiolib: Fix return value of gpio_to_desc() stub if !GPIOLIB kvm: vmx: Set IA32_TSC_AUX for legacy mode guests Revert "KVM: nVMX: reset cache/shadows when switching loaded VMCS" Revert "KVM: nVMX: move check_vmentry_postreqs() call to nested_vmx_enter_non_root_mode()" crypto/chelsio/chtls: listen fails with multiadapt VSOCK: bind to random port for VMADDR_PORT_ANY mmc: meson-gx: make sure the descriptor is stopped on errors mtd: rawnand: sunxi: Write pageprog related opcodes to WCMD_SET usb: ehci-omap: Fix deferred probe for phy handling btrfs: Check for missing device before bio submission in btrfs_map_bio btrfs: fix ncopies raid_attr for RAID56 btrfs: dev-replace: set result code of cancel by status of scrub Btrfs: allow clear_extent_dirty() to receive a cached extent state record btrfs: only track ref_heads in delayed_ref_updates serial: sh-sci: Fix crash in rx_timer_fn() on PIO fallback HID: intel-ish-hid: fixes incorrect error handling gpio: raspberrypi-exp: decrease refcount on firmware dt node serial: 8250: Rate limit serial port rx interrupts during input overruns kprobes/x86/xen: blacklist non-attachable xen interrupt functions xen/pciback: Check dev_data before using it kprobes: Blacklist symbols in arch-defined prohibited area kprobes/x86: Show x86-64 specific blacklisted symbols correctly vfio-mdev/samples: Use u8 instead of char for handle functions memory: omap-gpmc: Get the header of the enum pinctrl: xway: fix gpio-hog related boot issues net/mlx5: Continue driver initialization despite debugfs failure netfilter: nf_nat_sip: fix RTP/RTCP source port translations exofs_mount(): fix leaks on failure exits bnxt_en: Return linux standard errors in bnxt_ethtool.c bnxt_en: Save ring statistics before reset. bnxt_en: query force speeds before disabling autoneg mode. KVM: s390: unregister debug feature on failing arch init pinctrl: sh-pfc: r8a77990: Fix MOD_SEL0 SEL_I2C1 field width pinctrl: sh-pfc: sh7264: Fix PFCR3 and PFCR0 register configuration pinctrl: sh-pfc: sh7734: Fix shifted values in IPSR10 HID: doc: fix wrong data structure reference for UHID_OUTPUT dm flakey: Properly corrupt multi-page bios. gfs2: take jdata unstuff into account in do_grow dm raid: fix false -EBUSY when handling check/repair message xfs: Align compat attrlist_by_handle with native implementation. xfs: Fix bulkstat compat ioctls on x32 userspace. IB/qib: Fix an error code in qib_sdma_verbs_send() clocksource/drivers/fttmr010: Fix invalid interrupt register access vxlan: Fix error path in __vxlan_dev_create() powerpc/book3s/32: fix number of bats in p/v_block_mapped() powerpc/xmon: fix dump_segments() drivers/regulator: fix a missing check of return value Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading serial: max310x: Fix tx_empty() callback openrisc: Fix broken paths to arch/or32 RDMA/srp: Propagate ib_post_send() failures to the SCSI mid-layer scsi: qla2xxx: deadlock by configfs_depend_item scsi: csiostor: fix incorrect dma device in case of vport brcmfmac: Fix access point mode ath6kl: Only use match sets when firmware supports it ath6kl: Fix off by one error in scan completion powerpc/perf: Fix unit_sel/cache_sel checks powerpc/32: Avoid unsupported flags with clang powerpc/prom: fix early DEBUG messages powerpc/mm: Make NULL pointer deferences explicit on bad page faults. powerpc/44x/bamboo: Fix PCI range vfio/spapr_tce: Get rid of possible infinite loop powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status drbd: ignore "all zero" peer volume sizes in handshake drbd: reject attach of unsuitable uuids even if connected drbd: do not block when adjusting "disk-options" while IO is frozen drbd: fix print_st_err()'s prototype to match the definition IB/rxe: Make counters thread safe bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't regulator: tps65910: fix a missing check of return value powerpc/83xx: handle machine check caused by watchdog timer powerpc/pseries: Fix node leak in update_lmb_associativity_index() powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y crypto: mxc-scc - fix build warnings on ARM64 pwm: clps711x: Fix period calculation net/netlink_compat: Fix a missing check of nla_parse_nested net/net_namespace: Check the return value of register_pernet_subsys() f2fs: fix block address for __check_sit_bitmap f2fs: fix to dirty inode synchronously um: Include sys/uio.h to have writev() um: Make GCOV depend on !KCOV net: (cpts) fix a missing check of clk_prepare net: stmicro: fix a missing check of clk_prepare net: dsa: bcm_sf2: Propagate error value from mdio_write atl1e: checking the status of atl1e_write_phy_reg tipc: fix a missing check of genlmsg_put net: marvell: fix a missing check of acpi_match_device net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe() ocfs2: clear journal dirty flag after shutdown journal vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n mm/page_alloc.c: free order-0 pages through PCP in page_frag_free() mm/page_alloc.c: use a single function to free page mm/page_alloc.c: deduplicate __memblock_free_early() and memblock_free() tools/vm/page-types.c: fix "kpagecount returned fewer pages than expected" failures netfilter: nf_tables: fix a missing check of nla_put_failure xprtrdma: Prevent leak of rpcrdma_rep objects infiniband: bnxt_re: qplib: Check the return value of send_message infiniband/qedr: Potential null ptr dereference of qp firmware: arm_sdei: fix wrong of_node_put() in init function firmware: arm_sdei: Fix DT platform device creation lib/genalloc.c: fix allocation of aligned buffer from non-aligned chunk lib/genalloc.c: use vzalloc_node() to allocate the bitmap fork: fix some -Wmissing-prototypes warnings drivers/base/platform.c: kmemleak ignore a known leak lib/genalloc.c: include vmalloc.h mtd: Check add_mtd_device() ret code tipc: fix memory leak in tipc_nl_compat_publ_dump net/core/neighbour: tell kmemleak about hash tables ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs PCI/MSI: Return -ENOSPC from pci_alloc_irq_vectors_affinity() net/core/neighbour: fix kmemleak minimal reference count for hash tables serial: 8250: Fix serial8250 initialization crash gpu: ipu-v3: pre: don't trigger update if buffer address doesn't change sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel decnet: fix DN_IFREQ_SIZE net/smc: prevent races between smc_lgr_terminate() and smc_conn_free() net/smc: don't wait for send buffer space when data was already sent mm/hotplug: invalid PFNs from pfn_to_online_page() xfs: end sync buffer I/O properly on shutdown error net/smc: fix sender_free computation blktrace: Show requests without sector net/smc: fix byte_order for rx_curs_confirmed tipc: fix skb may be leaky in tipc_link_input ASoC: samsung: i2s: Fix prescaler setting for the secondary DAI sfc: initialise found bitmap in efx_ef10_mtd_probe geneve: change NET_UDP_TUNNEL dependency to select net: fix possible overflow in __sk_mem_raise_allocated() net: ip_gre: do not report erspan_ver for gre or gretap net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap sctp: don't compare hb_timer expire date before starting it bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id() mmc: core: align max segment size with logical block size net: dev: Use unsigned integer as an argument to left-shift kvm: properly check debugfs dentry before using it bpf: drop refcount if bpf_map_new_fd() fails in map_create() net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED net: hns3: fix PFC not setting problem for DCB module net: hns3: fix an issue for hclgevf_ae_get_hdev net: hns3: fix an issue for hns3_update_new_int_gl iommu/amd: Fix NULL dereference bug in match_hid_uid apparmor: delete the dentry in aafs_remove() to avoid a leak scsi: libsas: Support SATA PHY connection rate unmatch fixing during discovery ACPI / APEI: Don't wait to serialise with oops messages when panic()ing ACPI / APEI: Switch estatus pool to use vmalloc memory scsi: hisi_sas: shutdown axi bus to avoid exception CQ returned scsi: libsas: Check SMP PHY control function result RDMA/hns: Fix the bug with updating rq head pointer when flush cqe RDMA/hns: Bugfix for the scene without receiver queue RDMA/hns: Fix the state of rereg mr RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp ASoC: rt5645: Headphone Jack sense inverts on the LattePanda board powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property() xdp: fix cpumap redirect SKB creation bug mtd: Remove a debug trace in mtdpart.c mm, gup: add missing refcount overflow checks on s390 clk: at91: fix update bit maps on CFG_MOR write clk: at91: generated: set audio_pll_allowed in at91_clk_register_generated() usb: dwc2: use a longer core rest timeout in dwc2_core_reset() staging: rtl8192e: fix potential use after free staging: rtl8723bs: Drop ACPI device ids staging: rtl8723bs: Add 024c:0525 to the list of SDIO device-ids USB: serial: ftdi_sio: add device IDs for U-Blox C099-F9P mei: bus: prefix device names on bus with the bus name mei: me: add comet point V device id thunderbolt: Power cycle the router if NVM authentication fails xfrm: Fix memleak on xfrm state destroy media: v4l2-ctrl: fix flags for DO_WHITE_BALANCE net: macb: fix error format in dev_err() pwm: Clear chip_data in pwm_put() media: atmel: atmel-isc: fix asd memory allocation media: atmel: atmel-isc: fix INIT_WORK misplacement macvlan: schedule bc_work even if error net: psample: fix skb_over_panic openvswitch: fix flow command message size sctp: Fix memory leak in sctp_sf_do_5_2_4_dupcook slip: Fix use-after-free Read in slip_open openvswitch: drop unneeded BUG_ON() in ovs_flow_cmd_build_info() openvswitch: remove another BUG_ON() selftests: bpf: test_sockmap: handle file creation failures gracefully tipc: fix link name length check sctp: cache netns in sctp_ep_common net: sched: fix `tc -s class show` no bstats on class with nolock subqueues net: macb: add missed tasklet_kill ext4: add more paranoia checking in ext4_expand_extra_isize handling watchdog: sama5d4: fix WDD value to be always set to max net: macb: Fix SUBNS increment and increase resolution net: macb driver, check for SKBTX_HW_TSTAMP mtd: rawnand: atmel: Fix spelling mistake in error message mtd: rawnand: atmel: fix possible object reference leak mtd: spi-nor: cast to u64 to avoid uint overflows drm/atmel-hlcdc: revert shift by 8 mailbox: stm32_ipcc: add spinlock to fix channels concurrent access tcp: exit if nothing to retransmit on RTO timeout HID: core: check whether Usage Page item is after Usage ID items crypto: stm32/hash - Fix hmac issue more than 256 bytes media: stm32-dcmi: fix DMA corruption when stopping streaming media: stm32-dcmi: fix check of pm_runtime_get_sync return value hwrng: stm32 - fix unbalanced pm_runtime_enable clk: stm32mp1: fix HSI divider flag clk: stm32mp1: fix mcu divider table clk: stm32mp1: add CLK_SET_RATE_NO_REPARENT to Kernel clocks clk: stm32mp1: parent clocks update mailbox: mailbox-test: fix null pointer if no mmio pinctrl: stm32: fix memory leak issue ASoC: stm32: i2s: fix dma configuration ASoC: stm32: i2s: fix 16 bit format support ASoC: stm32: i2s: fix IRQ clearing ASoC: stm32: sai: add missing put_device() dmaengine: stm32-dma: check whether length is aligned on FIFO threshold platform/x86: hp-wmi: Fix ACPI errors caused by too small buffer platform/x86: hp-wmi: Fix ACPI errors caused by passing 0 as input size net: fec: fix clock count mis-match Linux 4.19.88 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ib17de804b245aa3da465ddc97270b8c7bd9fba66 |
||
|
|
78ce155fb8 |
vmscan: return NODE_RECLAIM_NOSCAN in node_reclaim() when CONFIG_NUMA is n
[ Upstream commit 8b09549c2bfd9f3f8f4cdad74107ef4f4ff9cdd7 ]
Commit
|
||
|
|
1b74ac0833 |
Merge android-4.19.36 (10f41ccfc) into msm-4.19
* refs/heads/tmp-10f41ccfc: Linux 4.19.36 appletalk: Fix compile regression mm: hide incomplete nr_indirectly_reclaimable in sysfs mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo IB/hfi1: Failed to drain send queue when QP is put into error state bpf: fix use after free in bpf_evict_inode include/linux/swap.h: use offsetof() instead of custom __swapoffset macro f2fs: fix to dirty inode for i_mode recovery rxrpc: Fix client call connect/disconnect race lib/div64.c: off by one in shift appletalk: Fix use-after-free in atalk_proc_exit drm/amdkfd: use init_mqd function to allocate object for hid_mqd (CI) ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t drm/nouveau/volt/gf117: fix speedo readout register PCI: Blacklist power management of Gigabyte X299 DESIGNARE EX PCIe ports coresight: cpu-debug: Support for CA73 CPUs Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk" crypto: axis - fix for recursive locking from bottom half drm/panel: panel-innolux: set display off in innolux_panel_unprepare lkdtm: Add tests for NULL pointer dereference lkdtm: Print real addresses soc/tegra: pmc: Drop locking from tegra_powergate_is_powered() scsi: core: Avoid that system resume triggers a kernel warning iommu/dmar: Fix buffer overflow during PCI bus notification net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version crypto: sha512/arm - fix crash bug in Thumb2 build crypto: sha256/arm - fix crash bug in Thumb2 build xfrm: destroy xfrm_state synchronously on net exit path net/rds: fix warn in rds_message_alloc_sgs ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle ALSA: hda: fix front speakers on Huawei MBXP drm/ttm: Fix bo_global and mem_global kfree error platform/x86: Add Intel AtomISP2 dummy / power-management driver kernel: hung_task.c: disable on suspend cifs: fallback to older infolevels on findfirst queryinfo retry net: stmmac: Set OWN bit for jumbo frames f2fs: cleanup dirty pages if recover failed netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine compiler.h: update definition of unreachable() KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail HID: usbhid: Add quirk for Redragon/Dragonrise Seymur 2 ACPI / SBS: Fix GPE storm on recent MacBookPro's usbip: fix vhci_hcd controller counting ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms pinctrl: core: make sure strcmp() doesn't get a null parameter HID: i2c-hid: override HID descriptors for certain devices Bluetooth: Fix debugfs NULL pointer dereference media: au0828: cannot kfree dev before usb disconnect powerpc/pseries: Remove prrn_work workqueue serial: uartps: console_setup() can't be placed to init section netfilter: xt_cgroup: shrink size of v2 path f2fs: fix to do sanity check with current segment number ASoC: Fix UBSAN warning at snd_soc_get/put_volsw_sx() 9p locks: add mount option for lock retry interval 9p: do not trust pdu content for stat item size f2fs: fix to avoid NULL pointer dereference on se->discard_map rsi: improve kernel thread handling to fix kernel panic gpio: pxa: handle corner case of unprobed device drm/cirrus: Use drm_framebuffer_put to avoid kernel oops in clean-up ext4: prohibit fstrim in norecovery mode x86/gart: Exclude GART aperture from kcore fix incorrect error code mapping for OBJECTID_NOT_FOUND x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error iommu/vt-d: Check capability before disabling protected memory drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failure x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors x86/hyperv: Prevent potential NULL pointer dereference x86/hpet: Prevent potential NULL pointer dereference irqchip/mbigen: Don't clear eventid when freeing an MSI irqchip/stm32: Don't clear rising/falling config registers at init drm/exynos/mixer: fix MIXER shadow registry synchronisation code blk-iolatency: #include "blk.h" PM / Domains: Avoid a potential deadlock ACPI / utils: Drop reference in test for device presence perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test() perf tests: Fix memory leak by expr__find_other() in test__expr() perf tests: Fix a memory leak of cpu_map object in the openat_syscall_event_on_all_cpus test perf evsel: Free evsel->counts in perf_evsel__exit() perf hist: Add missing map__put() in error case perf top: Fix error handling in cmd_top() perf build-id: Fix memory leak in print_sdt_events() perf config: Fix a memory leak in collect_config() perf config: Fix an error in the config template documentation perf list: Don't forget to drop the reference to the allocated thread_map tools/power turbostat: return the exit status of a command x86/mm: Don't leak kernel addresses sched/core: Fix buffer overflow in cgroup2 property cpu.max sched/cpufreq: Fix 32-bit math overflow scsi: iscsi: flush running unbind operations when removing a session thermal/intel_powerclamp: fix truncated kthread name thermal/int340x_thermal: fix mode setting thermal/int340x_thermal: Add additional UUIDs thermal: bcm2835: Fix crash in bcm2835_thermal_debugfs thermal: samsung: Fix incorrect check after code merge thermal/intel_powerclamp: fix __percpu declaration of worker_data ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration mmc: davinci: remove extraneous __init annotation i40iw: Avoid panic when handling the inetdev event IB/mlx4: Fix race condition between catas error reset and aliasguid flows drm/udl: use drm_gem_object_put_unlocked. auxdisplay: hd44780: Fix memory leak on ->remove() ALSA: sb8: add a check for request_region ALSA: echoaudio: add a check for ioremap_nocache ext4: report real fs size after failed resize ext4: add missing brelse() in add_new_gdb_meta_bg() ext4: avoid panic during forced reboot perf/core: Restore mmap record type correctly inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch() arc: hsdk_defconfig: Enable CONFIG_BLK_DEV_RAM ARC: u-boot args: check that magic number is correct ANDROID: cuttlefish_defconfig: Enable L2TP/PPTP ANDROID: Makefile: Properly resolve 4.19.35 merge Make arm64 serial port config compatible with crosvm Conflicts: kernel/sched/cpufreq_schedutil.c Change-Id: I8f049eb34344f72bf2d202c5e360f448771c78f4 Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org> |
||
|
|
10f41ccfc7 |
Merge 4.19.36 into android-4.19
Changes in 4.19.36 ARC: u-boot args: check that magic number is correct arc: hsdk_defconfig: Enable CONFIG_BLK_DEV_RAM inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch() perf/core: Restore mmap record type correctly ext4: avoid panic during forced reboot ext4: add missing brelse() in add_new_gdb_meta_bg() ext4: report real fs size after failed resize ALSA: echoaudio: add a check for ioremap_nocache ALSA: sb8: add a check for request_region auxdisplay: hd44780: Fix memory leak on ->remove() drm/udl: use drm_gem_object_put_unlocked. IB/mlx4: Fix race condition between catas error reset and aliasguid flows i40iw: Avoid panic when handling the inetdev event mmc: davinci: remove extraneous __init annotation ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration thermal/intel_powerclamp: fix __percpu declaration of worker_data thermal: samsung: Fix incorrect check after code merge thermal: bcm2835: Fix crash in bcm2835_thermal_debugfs thermal/int340x_thermal: Add additional UUIDs thermal/int340x_thermal: fix mode setting thermal/intel_powerclamp: fix truncated kthread name scsi: iscsi: flush running unbind operations when removing a session sched/cpufreq: Fix 32-bit math overflow sched/core: Fix buffer overflow in cgroup2 property cpu.max x86/mm: Don't leak kernel addresses tools/power turbostat: return the exit status of a command perf list: Don't forget to drop the reference to the allocated thread_map perf config: Fix an error in the config template documentation perf config: Fix a memory leak in collect_config() perf build-id: Fix memory leak in print_sdt_events() perf top: Fix error handling in cmd_top() perf hist: Add missing map__put() in error case perf evsel: Free evsel->counts in perf_evsel__exit() perf tests: Fix a memory leak of cpu_map object in the openat_syscall_event_on_all_cpus test perf tests: Fix memory leak by expr__find_other() in test__expr() perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test() ACPI / utils: Drop reference in test for device presence PM / Domains: Avoid a potential deadlock blk-iolatency: #include "blk.h" drm/exynos/mixer: fix MIXER shadow registry synchronisation code irqchip/stm32: Don't clear rising/falling config registers at init irqchip/mbigen: Don't clear eventid when freeing an MSI x86/hpet: Prevent potential NULL pointer dereference x86/hyperv: Prevent potential NULL pointer dereference x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failure iommu/vt-d: Check capability before disabling protected memory x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error fix incorrect error code mapping for OBJECTID_NOT_FOUND x86/gart: Exclude GART aperture from kcore ext4: prohibit fstrim in norecovery mode drm/cirrus: Use drm_framebuffer_put to avoid kernel oops in clean-up gpio: pxa: handle corner case of unprobed device rsi: improve kernel thread handling to fix kernel panic f2fs: fix to avoid NULL pointer dereference on se->discard_map 9p: do not trust pdu content for stat item size 9p locks: add mount option for lock retry interval ASoC: Fix UBSAN warning at snd_soc_get/put_volsw_sx() f2fs: fix to do sanity check with current segment number netfilter: xt_cgroup: shrink size of v2 path serial: uartps: console_setup() can't be placed to init section powerpc/pseries: Remove prrn_work workqueue media: au0828: cannot kfree dev before usb disconnect Bluetooth: Fix debugfs NULL pointer dereference HID: i2c-hid: override HID descriptors for certain devices pinctrl: core: make sure strcmp() doesn't get a null parameter ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms usbip: fix vhci_hcd controller counting ACPI / SBS: Fix GPE storm on recent MacBookPro's HID: usbhid: Add quirk for Redragon/Dragonrise Seymur 2 KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail compiler.h: update definition of unreachable() netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine f2fs: cleanup dirty pages if recover failed net: stmmac: Set OWN bit for jumbo frames cifs: fallback to older infolevels on findfirst queryinfo retry kernel: hung_task.c: disable on suspend platform/x86: Add Intel AtomISP2 dummy / power-management driver drm/ttm: Fix bo_global and mem_global kfree error ALSA: hda: fix front speakers on Huawei MBXP ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle net/rds: fix warn in rds_message_alloc_sgs xfrm: destroy xfrm_state synchronously on net exit path crypto: sha256/arm - fix crash bug in Thumb2 build crypto: sha512/arm - fix crash bug in Thumb2 build net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version iommu/dmar: Fix buffer overflow during PCI bus notification scsi: core: Avoid that system resume triggers a kernel warning soc/tegra: pmc: Drop locking from tegra_powergate_is_powered() lkdtm: Print real addresses lkdtm: Add tests for NULL pointer dereference drm/panel: panel-innolux: set display off in innolux_panel_unprepare crypto: axis - fix for recursive locking from bottom half Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk" coresight: cpu-debug: Support for CA73 CPUs PCI: Blacklist power management of Gigabyte X299 DESIGNARE EX PCIe ports drm/nouveau/volt/gf117: fix speedo readout register ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t drm/amdkfd: use init_mqd function to allocate object for hid_mqd (CI) appletalk: Fix use-after-free in atalk_proc_exit lib/div64.c: off by one in shift rxrpc: Fix client call connect/disconnect race f2fs: fix to dirty inode for i_mode recovery include/linux/swap.h: use offsetof() instead of custom __swapoffset macro bpf: fix use after free in bpf_evict_inode IB/hfi1: Failed to drain send queue when QP is put into error state mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo mm: hide incomplete nr_indirectly_reclaimable in sysfs appletalk: Fix compile regression Linux 4.19.36 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
9154c0c7e2 |
Merge 4.19.36 into android-4.19-q
Changes in 4.19.36 ARC: u-boot args: check that magic number is correct arc: hsdk_defconfig: Enable CONFIG_BLK_DEV_RAM inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch() perf/core: Restore mmap record type correctly ext4: avoid panic during forced reboot ext4: add missing brelse() in add_new_gdb_meta_bg() ext4: report real fs size after failed resize ALSA: echoaudio: add a check for ioremap_nocache ALSA: sb8: add a check for request_region auxdisplay: hd44780: Fix memory leak on ->remove() drm/udl: use drm_gem_object_put_unlocked. IB/mlx4: Fix race condition between catas error reset and aliasguid flows i40iw: Avoid panic when handling the inetdev event mmc: davinci: remove extraneous __init annotation ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration thermal/intel_powerclamp: fix __percpu declaration of worker_data thermal: samsung: Fix incorrect check after code merge thermal: bcm2835: Fix crash in bcm2835_thermal_debugfs thermal/int340x_thermal: Add additional UUIDs thermal/int340x_thermal: fix mode setting thermal/intel_powerclamp: fix truncated kthread name scsi: iscsi: flush running unbind operations when removing a session sched/cpufreq: Fix 32-bit math overflow sched/core: Fix buffer overflow in cgroup2 property cpu.max x86/mm: Don't leak kernel addresses tools/power turbostat: return the exit status of a command perf list: Don't forget to drop the reference to the allocated thread_map perf config: Fix an error in the config template documentation perf config: Fix a memory leak in collect_config() perf build-id: Fix memory leak in print_sdt_events() perf top: Fix error handling in cmd_top() perf hist: Add missing map__put() in error case perf evsel: Free evsel->counts in perf_evsel__exit() perf tests: Fix a memory leak of cpu_map object in the openat_syscall_event_on_all_cpus test perf tests: Fix memory leak by expr__find_other() in test__expr() perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test() ACPI / utils: Drop reference in test for device presence PM / Domains: Avoid a potential deadlock blk-iolatency: #include "blk.h" drm/exynos/mixer: fix MIXER shadow registry synchronisation code irqchip/stm32: Don't clear rising/falling config registers at init irqchip/mbigen: Don't clear eventid when freeing an MSI x86/hpet: Prevent potential NULL pointer dereference x86/hyperv: Prevent potential NULL pointer dereference x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors drm/nouveau/debugfs: Fix check of pm_runtime_get_sync failure iommu/vt-d: Check capability before disabling protected memory x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error fix incorrect error code mapping for OBJECTID_NOT_FOUND x86/gart: Exclude GART aperture from kcore ext4: prohibit fstrim in norecovery mode drm/cirrus: Use drm_framebuffer_put to avoid kernel oops in clean-up gpio: pxa: handle corner case of unprobed device rsi: improve kernel thread handling to fix kernel panic f2fs: fix to avoid NULL pointer dereference on se->discard_map 9p: do not trust pdu content for stat item size 9p locks: add mount option for lock retry interval ASoC: Fix UBSAN warning at snd_soc_get/put_volsw_sx() f2fs: fix to do sanity check with current segment number netfilter: xt_cgroup: shrink size of v2 path serial: uartps: console_setup() can't be placed to init section powerpc/pseries: Remove prrn_work workqueue media: au0828: cannot kfree dev before usb disconnect Bluetooth: Fix debugfs NULL pointer dereference HID: i2c-hid: override HID descriptors for certain devices pinctrl: core: make sure strcmp() doesn't get a null parameter ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms usbip: fix vhci_hcd controller counting ACPI / SBS: Fix GPE storm on recent MacBookPro's HID: usbhid: Add quirk for Redragon/Dragonrise Seymur 2 KVM: nVMX: restore host state in nested_vmx_vmexit for VMFail compiler.h: update definition of unreachable() netfilter: nf_flow_table: remove flowtable hook flush routine in netns exit routine f2fs: cleanup dirty pages if recover failed net: stmmac: Set OWN bit for jumbo frames cifs: fallback to older infolevels on findfirst queryinfo retry kernel: hung_task.c: disable on suspend platform/x86: Add Intel AtomISP2 dummy / power-management driver drm/ttm: Fix bo_global and mem_global kfree error ALSA: hda: fix front speakers on Huawei MBXP ACPI: EC / PM: Disable non-wakeup GPEs for suspend-to-idle net/rds: fix warn in rds_message_alloc_sgs xfrm: destroy xfrm_state synchronously on net exit path crypto: sha256/arm - fix crash bug in Thumb2 build crypto: sha512/arm - fix crash bug in Thumb2 build net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version iommu/dmar: Fix buffer overflow during PCI bus notification scsi: core: Avoid that system resume triggers a kernel warning soc/tegra: pmc: Drop locking from tegra_powergate_is_powered() lkdtm: Print real addresses lkdtm: Add tests for NULL pointer dereference drm/panel: panel-innolux: set display off in innolux_panel_unprepare crypto: axis - fix for recursive locking from bottom half Revert "ACPI / EC: Remove old CLEAR_ON_RESUME quirk" coresight: cpu-debug: Support for CA73 CPUs PCI: Blacklist power management of Gigabyte X299 DESIGNARE EX PCIe ports drm/nouveau/volt/gf117: fix speedo readout register ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t drm/amdkfd: use init_mqd function to allocate object for hid_mqd (CI) appletalk: Fix use-after-free in atalk_proc_exit lib/div64.c: off by one in shift rxrpc: Fix client call connect/disconnect race f2fs: fix to dirty inode for i_mode recovery include/linux/swap.h: use offsetof() instead of custom __swapoffset macro bpf: fix use after free in bpf_evict_inode IB/hfi1: Failed to drain send queue when QP is put into error state mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo mm: hide incomplete nr_indirectly_reclaimable in sysfs appletalk: Fix compile regression Linux 4.19.36 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
40c6d718d7 |
include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
[ Upstream commit a4046c06be50a4f01d435aa7fe57514818e6cc82 ] Use offsetof() to calculate offset of a field to take advantage of compiler built-in version when possible, and avoid UBSAN warning when compiling with Clang: UBSAN: Undefined behaviour in mm/swapfile.c:3010:38 member access within null pointer of type 'union swap_header' CPU: 6 PID: 1833 Comm: swapon Tainted: G S 4.19.23 #43 Call trace: dump_backtrace+0x0/0x194 show_stack+0x20/0x2c __dump_stack+0x20/0x28 dump_stack+0x70/0x94 ubsan_epilogue+0x14/0x44 ubsan_type_mismatch_common+0xf4/0xfc __ubsan_handle_type_mismatch_v1+0x34/0x54 __se_sys_swapon+0x654/0x1084 __arm64_sys_swapon+0x1c/0x24 el0_svc_common+0xa8/0x150 el0_svc_compat_handler+0x2c/0x38 el0_svc_compat+0x8/0x18 Link: http://lkml.kernel.org/r/20190312081902.223764-1-pihsun@chromium.org Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
|
808c47e1d5 |
mm: introduce __lru_cache_add_active_or_unevictable
The speculative page fault handler which is run without holding the mmap_sem is calling lru_cache_add_active_or_unevictable() but the vm_flags is not guaranteed to remain constant. Introducing __lru_cache_add_active_or_unevictable() which has the vma flags value parameter instead of the vma pointer. Change-Id: I68decbe0f80847403127c45c97565e47512532e9 Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Patch-mainline: linux-mm @ Tue, 17 Apr 2018 16:33:20 [vinmenon@codeaurora.org: trivial merge conflict fixes] Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> [charante@codeaurora.org: trivial merge conflict fixes] Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> |
||
|
|
b4a56abdf7 |
UPSTREAM: mm: workingset: tell cache transitions from workingset thrashing
Refaults happen during transitions between workingsets as well as in-place thrashing. Knowing the difference between the two has a range of applications, including measuring the impact of memory shortage on the system performance, as well as the ability to smarter balance pressure between the filesystem cache and the swap-backed workingset. During workingset transitions, inactive cache refaults and pushes out established active cache. When that active cache isn't stale, however, and also ends up refaulting, that's bonafide thrashing. Introduce a new page flag that tells on eviction whether the page has been active or not in its lifetime. This bit is then stored in the shadow entry, to classify refaults as transitioning or thrashing. How many page->flags does this leave us with on 32-bit? 20 bits are always page flags 21 if you have an MMU 23 with the zone bits for DMA, Normal, HighMem, Movable 29 with the sparsemem section bits 30 if PAE is enabled 31 with this patch. So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA nodes. If that's not enough, the system can switch to discontigmem and re-gain the 6 or 7 sparsemem section bits. Link: http://lkml.kernel.org/r/20180828172258.3185-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Drake <drake@endlessm.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 8508cf3ffad4defa202b303e5b6379efc4cd9054) Bug: 127712811 Test: lmkd in PSI mode Change-Id: I71df060dce5590a3c654f9a0e8e54deeb74b64c2 Signed-off-by: Suren Baghdasaryan <surenb@google.com> |
||
|
|
494c0dafc7 |
mm: workingset: tell cache transitions from workingset thrashing
Refaults happen during transitions between workingsets as well as in-place thrashing. Knowing the difference between the two has a range of applications, including measuring the impact of memory shortage on the system performance, as well as the ability to smarter balance pressure between the filesystem cache and the swap-backed workingset. During workingset transitions, inactive cache refaults and pushes out established active cache. When that active cache isn't stale, however, and also ends up refaulting, that's bonafide thrashing. Introduce a new page flag that tells on eviction whether the page has been active or not in its lifetime. This bit is then stored in the shadow entry, to classify refaults as transitioning or thrashing. How many page->flags does this leave us with on 32-bit? 20 bits are always page flags 21 if you have an MMU 23 with the zone bits for DMA, Normal, HighMem, Movable 29 with the sparsemem section bits 30 if PAE is enabled 31 with this patch. So on 32-bit PAE, that leaves 1 bit for distinguishing two NUMA nodes. If that's not enough, the system can switch to discontigmem and re-gain the 6 or 7 sparsemem section bits. Link: http://lkml.kernel.org/r/20180828172258.3185-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Drake <drake@endlessm.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Cc: Christopher Lameter <cl@linux.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <jweiner@fb.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Enderborg <peter.enderborg@sony.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I6134c51987d401584f513f8b5a7826962009997c Git-commit: 1899ad18c6072d689896badafb81267b0a1092a4 Git-repo: https://source.codeaurora.org/quic/la/kernel/msm-4.19 Signed-off-by: Patrick Daly <pdaly@codeaurora.org> |
||
|
|
45991f3d2a |
Merge android-4.19.18 (26bf816) into msm-4.19
* refs/heads/tmp-26bf816:
Linux 4.19.18
ipmi: Don't initialize anything in the core until something uses it
ipmi:ssif: Fix handling of multi-part return messages
ipmi: Prevent use-after-free in deliver_response
ipmi: msghandler: Fix potential Spectre v1 vulnerabilities
ipmi: fix use-after-free of user->release_barrier.rda
Bluetooth: Fix unnecessary error message for HCI request completion
iwlwifi: mvm: Send LQ command as async when necessary
mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps
userfaultfd: clear flag if remap event not enabled
mm/swap: use nr_node_ids for avail_lists in swap_info_struct
mm/page-writeback.c: don't break integrity writeback on ->writepage() error
ocfs2: fix panic due to unrecovered local alloc
iomap: don't search past page end in iomap_is_partially_uptodate
scsi: megaraid: fix out-of-bound array accesses
scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()
ath10k: fix peer stats null pointer dereference
scsi: smartpqi: correct lun reset issues
scsi: mpt3sas: fix memory ordering on 64bit writes
IB/usnic: Fix potential deadlock
sysfs: Disable lockdep for driver bind/unbind files
ALSA: bebob: fix model-id of unit for Apogee Ensemble
Bluetooth: btusb: Add support for Intel bluetooth device 8087:0029
dm: Check for device sector overflow if CONFIG_LBDAF is not set
clocksource/drivers/integrator-ap: Add missing of_node_put()
quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls.
perf tools: Add missing open_memstream() prototype for systems lacking it
perf tools: Add missing sigqueue() prototype for systems lacking it
perf cs-etm: Correct packets swapping in cs_etm__flush()
dm snapshot: Fix excessive memory usage and workqueue stalls
tools lib subcmd: Don't add the kernel sources to the include path
perf stat: Avoid segfaults caused by negated options
dm kcopyd: Fix bug causing workqueue stalls
dm crypt: use u64 instead of sector_t to store iv_offset
x86/topology: Use total_cpus for max logical packages calculation
netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine
netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine
netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set
perf vendor events intel: Fix Load_Miss_Real_Latency on SKL/SKX
perf parse-events: Fix unchecked usage of strncpy()
perf svghelper: Fix unchecked usage of strncpy()
perf tests ARM: Disable breakpoint tests 32-bit
perf intel-pt: Fix error with config term "pt=0"
tty/serial: do not free trasnmit buffer page under port lock
btrfs: improve error handling of btrfs_add_link
btrfs: fix use-after-free due to race between replace start and cancel
btrfs: alloc_chunk: fix more DUP stripe size handling
btrfs: volumes: Make sure there is no overlap of dev extents at mount time
mmc: atmel-mci: do not assume idle after atmci_request_end
kconfig: fix memory leak when EOF is encountered in quotation
kconfig: fix file name and line number of warn_ignored_character()
bpf: relax verifier restriction on BPF_MOV | BPF_ALU
arm64: Fix minor issues with the dcache_by_line_op macro
clk: imx6q: reset exclusive gates on init
arm64: kasan: Increase stack size for KASAN_EXTRA
selftests: do not macro-expand failed assertion expressions
scsi: target/core: Make sure that target_wait_for_sess_cmds() waits long enough
scsi: target: use consistent left-aligned ASCII INQUIRY data
net: call sk_dst_reset when set SO_DONTROUTE
staging: erofs: fix use-after-free of on-stack `z_erofs_vle_unzip_io'
media: venus: core: Set dma maximum segment size
ASoC: use dma_ops of parent device for acp_audio_dma
media: firewire: Fix app_info parameter type in avc_ca{,_app}_info
powerpc/pseries/cpuidle: Fix preempt warning
powerpc/xmon: Fix invocation inside lock region
media: uvcvideo: Refactor teardown of uvc on USB disconnect
pstore/ram: Do not treat empty buffers as valid
clk: imx: make mux parent strings const
jffs2: Fix use of uninitialized delayed_work, lockdep breakage
efi/libstub: Disable some warnings for x86{,_64}
rxe: IB_WR_REG_MR does not capture MR's iova field
drm/amdgpu: Reorder uvd ring init before uvd resume
scsi: qedi: Check for session online before getting iSCSI TLV data.
ASoC: pcm3168a: Don't disable pcm3168a when CONFIG_PM defined
selinux: always allow mounting submounts
fpga: altera-cvp: fix probing for multiple FPGAs on the bus
usb: gadget: udc: renesas_usb3: add a safety connection way for forced_b_device
samples: bpf: fix: error handling regarding kprobe_events
clk: meson: meson8b: fix incorrect divider mapping in cpu_scale_table
drm/atomic-helper: Complete fake_commit->flip_done potentially earlier
arm64: perf: set suppress_bind_attrs flag to true
crypto: ecc - regularize scalar for scalar multiplication
MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur
x86/mce: Fix -Wmissing-prototypes warnings
ALSA: oxfw: add support for APOGEE duet FireWire
bpf: Allow narrow loads with offset > 0
serial: set suppress_bind_attrs flag only if builtin
writeback: don't decrement wb->refcnt if !wb->bdi
of: overlay: add missing of_node_put() after add new node to changeset
selftests/bpf: enable (uncomment) all tests in test_libbpf.sh
usb: typec: tcpm: Do not disconnect link for self powered devices
e1000e: allow non-monotonic SYSTIM readings
platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey
ixgbe: allow IPsec Tx offload in VEPA mode
drm/amdkfd: fix interrupt spin lock
drm/amd/display: Guard against null stream_state in set_crc_source
gpio: pl061: Move irq_chip definition inside struct pl061
netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets
net: clear skb->tstamp in bridge forwarding path
ipv6: Take rcu_read_lock in __inet6_bind for mapped addresses
r8169: Add support for new Realtek Ethernet
qmi_wwan: add MTU default to qmap network interface
net, skbuff: do not prefer skb allocation fails early
net: dsa: mv88x6xxx: mv88e6390 errata
mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion
mlxsw: spectrum: Disable lag port TX before removing it
ipv6: Consider sk_bound_dev_if when binding a socket to a v4 mapped address
UPSTREAM: dm: do not allow readahead to limit IO size
UPSTREAM: ppp: Move PFC decompression to PPP generic layer
UPSTREAM: l2tp: Add protocol field decompression
Conflicts:
include/linux/swap.h
Change-Id: I72fecd7d04341015f9397afe60d8469d548b4cb6
Signed-off-by: Ivaylo Georgiev <irgeorgiev@codeaurora.org>
|
||
|
|
b0cd52e644 |
mm/swap: use nr_node_ids for avail_lists in swap_info_struct
[ Upstream commit 66f71da9dd38af17dc17209cdde7987d4679a699 ]
Since
|
||
|
|
ac4b2c42c8 |
mm: swap: swap ratio support
Add support to receive a static ratio from userspace to divide the swap pages between ZRAM and disk based swap devices. The existing infrastructure allows to keep same priority for multiple swap devices, which results in round robin distribution of pages. With this patch, the ratio can be defined. Change-Id: I54f54489db84cabb206569dd62d61a8a7a898991 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Charan Teja Reddy <charante@codeaurora.org> [swatsrid@codeaurora.org: Fix trivial merge conflicts] Signed-off-by: Swathi Sridhar <swatsrid@codeaurora.org> |
||
|
|
5d5e8f1954 |
mm, swap, get_swap_pages: use entry_size instead of cluster in parameter
As suggested by Matthew Wilcox, it is better to use "int entry_size"
instead of "bool cluster" as parameter to specify whether to operate for
huge or normal swap entries. Because this improve the flexibility to
support other swap entry size. And Dave Hansen thinks that this
improves code readability too.
So in this patch, the "bool cluster" parameter of get_swap_pages() is
replaced by "int entry_size".
And nr_swap_entries() trick is used to reduce the binary size when
!CONFIG_TRANSPARENT_HUGE_PAGE.
text data bss dec hex filename
base 24215 2028 340 26583 67d7 mm/swapfile.o
head 24123 2004 340 26467 6763 mm/swapfile.o
Link: http://lkml.kernel.org/r/20180720071845.17920-7-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
2cf855837b |
memcontrol: schedule throttling if we are congested
Memory allocations can induce swapping via kswapd or direct reclaim. If we are having IO done for us by kswapd and don't actually go into direct reclaim we may never get scheduled for throttling. So instead check to see if our cgroup is congested, and if so schedule the throttling. Before we return to user space the throttling stuff will only throttle if we actually required it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
|
|
24844fd339 |
Merge branch 'mm-rst' into docs-next
Mike Rapoport says: These patches convert files in Documentation/vm to ReST format, add an initial index and link it to the top level documentation. There are no contents changes in the documentation, except few spelling fixes. The relatively large diffstat stems from the indentation and paragraph wrapping changes. I've tried to keep the formatting as consistent as possible, but I could miss some places that needed markup and add some markup where it was not necessary. [jc: significant conflicts in vm/hmm.rst] |
||
|
|
ad56b738c5 |
docs/vm: rename documentation files to .rst
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> |
||
|
|
e9e9b7ecee |
mm: swap: unify cluster-based and vma-based swap readahead
This patch makes do_swap_page() not need to be aware of two different swap readahead algorithms. Just unify cluster-based and vma-based readahead function call. Link: http://lkml.kernel.org/r/1509520520-32367-3-git-send-email-minchan@kernel.org Link: http://lkml.kernel.org/r/20180220085249.151400-3-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
eaf649ebc3 |
mm: swap: clean up swap readahead
When I see recent change of swap readahead, I am very unhappy about current code structure which diverges two swap readahead algorithm in do_swap_page. This patch is to clean it up. Main motivation is that fault handler doesn't need to be aware of readahead algorithms but just should call swapin_readahead. As first step, this patch cleans up a little bit but not perfect (I just separate for review easier) so next patch will make the goal complete. [minchan@kernel.org: do not check readahead flag with THP anon] Link: http://lkml.kernel.org/r/874lm83zho.fsf@yhuang-dev.intel.com Link: http://lkml.kernel.org/r/20180227232611.169883-1-minchan@kernel.org Link: http://lkml.kernel.org/r/1509520520-32367-2-git-send-email-minchan@kernel.org Link: http://lkml.kernel.org/r/20180220085249.151400-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9c4e6b1a70 |
mm, mlock, vmscan: no more skipping pagevecs
When a thread mlocks an address space backed either by file pages which are currently not present in memory or swapped out anon pages (not in swapcache), a new page is allocated and added to the local pagevec (lru_add_pvec), I/O is triggered and the thread then sleeps on the page. On I/O completion, the thread can wake on a different CPU, the mlock syscall will then sets the PageMlocked() bit of the page but will not be able to put that page in unevictable LRU as the page is on the pagevec of a different CPU. Even on drain, that page will go to evictable LRU because the PageMlocked() bit is not checked on pagevec drain. The page will eventually go to right LRU on reclaim but the LRU stats will remain skewed for a long time. This patch puts all the pages, even unevictable, to the pagevecs and on the drain, the pages will be added on their LRUs correctly by checking their evictability. This resolves the mlocked pages on pagevec of other CPUs issue because when those pagevecs will be drained, the mlocked file pages will go to unevictable LRU. Also this makes the race with munlock easier to resolve because the pagevec drains happen in LRU lock. However there is still one place which makes a page evictable and does PageLRU check on that page without LRU lock and needs special attention. TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock(). #0: __pagevec_lru_add_fn #1: clear_page_mlock SetPageLRU() if (!TestClearPageMlocked()) return smp_mb() // <--required // inside does PageLRU if (!PageMlocked()) if (isolate_lru_page()) move to evictable LRU putback_lru_page() else move to unevictable LRU In '#1', TestClearPageMlocked() provides full memory barrier semantics and thus the PageLRU check (inside isolate_lru_page) can not be reordered before it. In '#0', without explicit memory barrier, the PageMlocked() check can be reordered before SetPageLRU(). If that happens, '#0' can put a page in unevictable LRU and '#1' might have just cleared the Mlocked bit of that page but fails to isolate as PageLRU fails as '#0' still hasn't set PageLRU bit of that page. That page will be stranded on the unevictable LRU. There is one (good) side effect though. Without this patch, the pages allocated for System V shared memory segment are added to evictable LRUs even after shmctl(SHM_LOCK) on that segment. This patch will correctly put such pages to unevictable LRU. Link: http://lkml.kernel.org/r/20171121211241.18877-1-shakeelb@google.com Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shaohua Li <shli@fb.com> Cc: Jan Kara <jack@suse.cz> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a4ef876841 |
mm: remove unused pgdat_reclaimable_pages()
Remove unused function pgdat_reclaimable_pages() and node_page_state_snapshot() which becomes unused as well. Link: http://lkml.kernel.org/r/20171122094416.26019-1-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9852a72123 |
mm: drop hotplug lock from lru_add_drain_all()
Pulling cpu hotplug locks inside the mm core function like
lru_add_drain_all just asks for problems and the recent lockdep splat
[1] just proves this. While the usage in that particular case might be
wrong we should avoid the locking as lru_add_drain_all() is used in many
places. It seems that this is not all that hard to achieve actually.
We have done the same thing for drain_all_pages which is analogous by
commit
|
||
|
|
c6f92f9fbe |
mm: remove cold parameter for release_pages
All callers of release_pages claim the pages being released are cache hot. As no one cares about the hotness of pages being released to the allocator, just ditch the parameter. No performance impact is expected as the overhead is marginal. The parameter is removed simply because it is a bit stupid to have a useless parameter copied everywhere. Link: http://lkml.kernel.org/r/20171018075952.10627-7-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c7df8ad291 |
mm, truncate: do not check mapping for every page being truncated
During truncation, the mapping has already been checked for shmem and
dax so it's known that workingset_update_node is required.
This patch avoids the checks on mapping for each page being truncated.
In all other cases, a lookup helper is used to determine if
workingset_update_node() needs to be called. The one danger is that the
API is slightly harder to use as calling workingset_update_node directly
without checking for dax or shmem mappings could lead to surprises.
However, the API rarely needs to be used and hopefully the comment is
enough to give people the hint.
sparsetruncate (tiny)
4.14.0-rc4 4.14.0-rc4
oneirq-v1r1 pickhelper-v1r1
Min Time 141.00 ( 0.00%) 140.00 ( 0.71%)
1st-qrtle Time 142.00 ( 0.00%) 141.00 ( 0.70%)
2nd-qrtle Time 142.00 ( 0.00%) 142.00 ( 0.00%)
3rd-qrtle Time 143.00 ( 0.00%) 143.00 ( 0.00%)
Max-90% Time 144.00 ( 0.00%) 144.00 ( 0.00%)
Max-95% Time 147.00 ( 0.00%) 145.00 ( 1.36%)
Max-99% Time 195.00 ( 0.00%) 191.00 ( 2.05%)
Max Time 230.00 ( 0.00%) 205.00 ( 10.87%)
Amean Time 144.37 ( 0.00%) 143.82 ( 0.38%)
Stddev Time 10.44 ( 0.00%) 9.00 ( 13.74%)
Coeff Time 7.23 ( 0.00%) 6.26 ( 13.41%)
Best99%Amean Time 143.72 ( 0.00%) 143.34 ( 0.26%)
Best95%Amean Time 142.37 ( 0.00%) 142.00 ( 0.26%)
Best90%Amean Time 142.19 ( 0.00%) 141.85 ( 0.24%)
Best75%Amean Time 141.92 ( 0.00%) 141.58 ( 0.24%)
Best50%Amean Time 141.69 ( 0.00%) 141.31 ( 0.27%)
Best25%Amean Time 141.38 ( 0.00%) 140.97 ( 0.29%)
As you'd expect, the gain is marginal but it can be detected. The
differences in bonnie are all within the noise which is not surprising
given the impact on the microbenchmark.
radix_tree_update_node_t is a callback for some radix operations that
optionally passes in a private field. The only user of the callback is
workingset_update_node and as it no longer requires a mapping, the
private field is removed.
Link: http://lkml.kernel.org/r/20171018075952.10627-3-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
aa8d22a11d |
mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference
When SWP_SYNCHRONOUS_IO swapped-in pages are shared by several processes, it can cause unnecessary memory wastage by skipping swap cache. Because, with swapin fault by read, they could share a page if the page were in swap cache. Thus, it avoids allocating same content new pages. This patch makes the swapcache skipping work only if the swap pte is non-sharable. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1507620825-5537-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
0bcac06f27 |
mm, swap: skip swapcache for swapin of synchronous device
With fast swap storage, the platforms want to use swap more aggressively and swap-in is crucial to application latency. The rw_page() based synchronous devices like zram, pmem and btt are such fast storage. When I profile swapin performance with zram lz4 decompress test, S/W overhead is more than 70%. Maybe, it would be bigger in nvdimm. This patch aims to reduce swap-in latency by skipping swapcache if the swap device is synchronous device like rw_page based device. It enhances 45% my swapin test(5G sequential swapin, no readahead, from 2.41sec to 1.64sec). Link: http://lkml.kernel.org/r/1505886205-9671-5-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
539a6fea7f |
mm, swap: introduce SWP_SYNCHRONOUS_IO
If rw-page based fast storage is used for swap devices, we need to detect it to enhance swap IO operations. This patch is preparation for optimizing of swap-in operation with next patch. Link: http://lkml.kernel.org/r/1505886205-9671-4-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
2628bd6fc0 |
mm, swap: fix race between swap count continuation operations
One page may store a set of entries of the sis->swap_map
(swap_info_struct->swap_map) in multiple swap clusters.
If some of the entries has sis->swap_map[offset] > SWAP_MAP_MAX,
multiple pages will be used to store the set of entries of the
sis->swap_map. And the pages are linked with page->lru. This is called
swap count continuation. To access the pages which store the set of
entries of the sis->swap_map simultaneously, previously, sis->lock is
used. But to improve the scalability of __swap_duplicate(), swap
cluster lock may be used in swap_count_continued() now. This may race
with add_swap_count_continuation() which operates on a nearby swap
cluster, in which the sis->swap_map entries are stored in the same page.
The race can cause wrong swap count in practice, thus cause unfreeable
swap entries or software lockup, etc.
To fix the race, a new spin lock called cont_lock is added to struct
swap_info_struct to protect the swap count continuation page list. This
is a lock at the swap device level, so the scalability isn't very well.
But it is still much better than the original sis->lock, because it is
only acquired/released when swap count continuation is used. Which is
considered rare in practice. If it turns out that the scalability
becomes an issue for some workloads, we can split the lock into some
more fine grained locks.
Link: http://lkml.kernel.org/r/20171017081320.28133-1-ying.huang@intel.com
Fixes:
|
||
|
|
b24413180f |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
5042db43cc |
mm/ZONE_DEVICE: new type of ZONE_DEVICE for unaddressable memory
HMM (heterogeneous memory management) need struct page to support migration from system main memory to device memory. Reasons for HMM and migration to device memory is explained with HMM core patch. This patch deals with device memory that is un-addressable memory (ie CPU can not access it). Hence we do not want those struct page to be manage like regular memory. That is why we extend ZONE_DEVICE to support different types of memory. A persistent memory type is define for existing user of ZONE_DEVICE and a new device un-addressable type is added for the un-addressable memory type. There is a clear separation between what is expected from each memory type and existing user of ZONE_DEVICE are un-affected by new requirement and new use of the un-addressable type. All specific code path are protect with test against the memory type. Because memory is un-addressable we use a new special swap type for when a page is migrated to device memory (this reduces the number of maximum swap file). The main two additions beside memory type to ZONE_DEVICE is two callbacks. First one, page_free() is call whenever page refcount reach 1 (which means the page is free as ZONE_DEVICE page never reach a refcount of 0). This allow device driver to manage its memory and associated struct page. The second callback page_fault() happens when there is a CPU access to an address that is back by a device page (which are un-addressable by the CPU). This callback is responsible to migrate the page back to system main memory. Device driver can not block migration back to system memory, HMM make sure that such page can not be pin into device memory. If device is in some error condition and can not migrate memory back then a CPU page fault to device memory should end with SIGBUS. [arnd@arndb.de: fix warning] Link: http://lkml.kernel.org/r/20170823133213.712917-1-arnd@arndb.de Link: http://lkml.kernel.org/r/20170817000548.32038-8-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Nellans <dnellans@nvidia.com> Cc: Evgeny Baskakov <ebaskakov@nvidia.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mark Hairgrove <mhairgrove@nvidia.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Sherry Cheung <SCheung@nvidia.com> Cc: Subhash Gutti <sgutti@nvidia.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Bob Liu <liubo95@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a2468cc9bf |
swap: choose swap device according to numa node
If the system has more than one swap device and swap device has the node information, we can make use of this information to decide which swap device to use in get_swap_pages() to get better performance. The current code uses a priority based list, swap_avail_list, to decide which swap device to use and if multiple swap devices share the same priority, they are used round robin. This patch changes the previous single global swap_avail_list into a per-numa-node list, i.e. for each numa node, it sees its own priority based list of available swap devices. Swap device's priority can be promoted on its matching node's swap_avail_list. The current swap device's priority is set as: user can set a >=0 value, or the system will pick one starting from -1 then downwards. The priority value in the swap_avail_list is the negated value of the swap device's due to plist being sorted from low to high. The new policy doesn't change the semantics for priority >=0 cases, the previous starting from -1 then downwards now becomes starting from -2 then downwards and -1 is reserved as the promoted value. Take 4-node EX machine as an example, suppose 4 swap devices are available, each sit on a different node: swapA on node 0 swapB on node 1 swapC on node 2 swapD on node 3 After they are all swapped on in the sequence of ABCD. Current behaviour: their priorities will be: swapA: -1 swapB: -2 swapC: -3 swapD: -4 And their position in the global swap_avail_list will be: swapA -> swapB -> swapC -> swapD prio:1 prio:2 prio:3 prio:4 New behaviour: their priorities will be(note that -1 is skipped): swapA: -2 swapB: -3 swapC: -4 swapD: -5 And their positions in the 4 swap_avail_lists[nid] will be: swap_avail_lists[0]: /* node 0's available swap device list */ swapA -> swapB -> swapC -> swapD prio:1 prio:3 prio:4 prio:5 swap_avali_lists[1]: /* node 1's available swap device list */ swapB -> swapA -> swapC -> swapD prio:1 prio:2 prio:4 prio:5 swap_avail_lists[2]: /* node 2's available swap device list */ swapC -> swapA -> swapB -> swapD prio:1 prio:2 prio:3 prio:5 swap_avail_lists[3]: /* node 3's available swap device list */ swapD -> swapA -> swapB -> swapC prio:1 prio:2 prio:3 prio:4 To see the effect of the patch, a test that starts N process, each mmap a region of anonymous memory and then continually write to it at random position to trigger both swap in and out is used. On a 2 node Skylake EP machine with 64GiB memory, two 170GB SSD drives are used as swap devices with each attached to a different node, the result is: runtime=30m/processes=32/total test size=128G/each process mmap region=4G kernel throughput vanilla 13306 auto-binding 15169 +14% runtime=30m/processes=64/total test size=128G/each process mmap region=2G kernel throughput vanilla 11885 auto-binding 14879 +25% [aaron.lu@intel.com: v2] Link: http://lkml.kernel.org/r/20170814053130.GD2369@aaronlu.sh.intel.com Link: http://lkml.kernel.org/r/20170816024439.GA10925@aaronlu.sh.intel.com [akpm@linux-foundation.org: use kmalloc_array()] Link: http://lkml.kernel.org/r/20170814053130.GD2369@aaronlu.sh.intel.com Link: http://lkml.kernel.org/r/20170816024439.GA10925@aaronlu.sh.intel.com Signed-off-by: Aaron Lu <aaron.lu@intel.com> Cc: "Chen, Tim C" <tim.c.chen@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
81a0298bdf |
mm, swap: don't use VMA based swap readahead if HDD is used as swap
VMA based swap readahead will readahead the virtual pages that is continuous in the virtual address space. While the original swap readahead will readahead the swap slots that is continuous in the swap device. Although VMA based swap readahead is more correct for the swap slots to be readahead, it will trigger more small random readings, which may cause the performance of HDD (hard disk) to degrade heavily, and may finally exceed the benefit. To avoid the issue, in this patch, if the HDD is used as swap, the VMA based swap readahead will be disabled, and the original swap readahead will be used instead. Link: http://lkml.kernel.org/r/20170807054038.1843-6-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
ec560175c0 |
mm, swap: VMA based swap readahead
The swap readahead is an important mechanism to reduce the swap in latency. Although pure sequential memory access pattern isn't very popular for anonymous memory, the space locality is still considered valid. In the original swap readahead implementation, the consecutive blocks in swap device are readahead based on the global space locality estimation. But the consecutive blocks in swap device just reflect the order of page reclaiming, don't necessarily reflect the access pattern in virtual memory. And the different tasks in the system may have different access patterns, which makes the global space locality estimation incorrect. In this patch, when page fault occurs, the virtual pages near the fault address will be readahead instead of the swap slots near the fault swap slot in swap device. This avoid to readahead the unrelated swap slots. At the same time, the swap readahead is changed to work on per-VMA from globally. So that the different access patterns of the different VMAs could be distinguished, and the different readahead policy could be applied accordingly. The original core readahead detection and scaling algorithm is reused, because it is an effect algorithm to detect the space locality. The test and result is as follow, Common test condition ===================== Test Machine: Xeon E5 v3 (2 sockets, 72 threads, 32G RAM) Swap device: NVMe disk Micro-benchmark with combined access pattern ============================================ vm-scalability, sequential swap test case, 4 processes to eat 50G virtual memory space, repeat the sequential memory writing until 300 seconds. The first round writing will trigger swap out, the following rounds will trigger sequential swap in and out. At the same time, run vm-scalability random swap test case in background, 8 processes to eat 30G virtual memory space, repeat the random memory write until 300 seconds. This will trigger random swap-in in the background. This is a combined workload with sequential and random memory accessing at the same time. The result (for sequential workload) is as follow, Base Optimized ---- --------- throughput 345413 KB/s 414029 KB/s (+19.9%) latency.average 97.14 us 61.06 us (-37.1%) latency.50th 2 us 1 us latency.60th 2 us 1 us latency.70th 98 us 2 us latency.80th 160 us 2 us latency.90th 260 us 217 us latency.95th 346 us 369 us latency.99th 1.34 ms 1.09 ms ra_hit% 52.69% 99.98% The original swap readahead algorithm is confused by the background random access workload, so readahead hit rate is lower. The VMA-base readahead algorithm works much better. Linpack ======= The test memory size is bigger than RAM to trigger swapping. Base Optimized ---- --------- elapsed_time 393.49 s 329.88 s (-16.2%) ra_hit% 86.21% 98.82% The score of base and optimized kernel hasn't visible changes. But the elapsed time reduced and readahead hit rate improved, so the optimized kernel runs better for startup and tear down stages. And the absolute value of readahead hit rate is high, shows that the space locality is still valid in some practical workloads. Link: http://lkml.kernel.org/r/20170807054038.1843-4-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
c41f012ade |
mm: rename global_page_state to global_zone_page_state
global_page_state is error prone as a recent bug report pointed out [1].
It only returns proper values for zone based counters as the enum it
gets suggests. We already have global_node_page_state so let's rename
global_page_state to global_zone_page_state to be more explicit here.
All existing users seems to be correct:
$ git grep "global_page_state(NR_" | sed 's@.*(\(NR_[A-Z_]*\)).*@\1@' | sort | uniq -c
2 NR_BOUNCE
2 NR_FREE_CMA_PAGES
11 NR_FREE_PAGES
1 NR_KERNEL_STACK_KB
1 NR_MLOCK
2 NR_PAGETABLE
This patch shouldn't introduce any functional change.
[1] http://lkml.kernel.org/r/201707260628.v6Q6SmaS030814@www262.sakura.ne.jp
Link: http://lkml.kernel.org/r/20170801134256.5400-2-hannes@cmpxchg.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
59807685a7 |
mm, THP, swap: support splitting THP for THP swap out
After adding swapping out support for THP (Transparent Huge Page), it is possible that a THP in swap cache (partly swapped out) need to be split. To split such a THP, the swap cluster backing the THP need to be split too, that is, the CLUSTER_FLAG_HUGE flag need to be cleared for the swap cluster. The patch implemented this. And because the THP swap writing needs the THP keeps as huge page during writing. The PageWriteback flag is checked before splitting. Link: http://lkml.kernel.org/r/20170724051840.2309-8-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Shaohua Li <shli@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Michal Hocko <mhocko@kernel.org> Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c] Cc: Vishal L Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
ba3c4ce6de |
mm, THP, swap: make reuse_swap_page() works for THP swapped out
After supporting to delay THP (Transparent Huge Page) splitting after swapped out, it is possible that some page table mappings of the THP are turned into swap entries. So reuse_swap_page() need to check the swap count in addition to the map count as before. This patch done that. In the huge PMD write protect fault handler, in addition to the page map count, the swap count need to be checked too, so the page lock need to be acquired too when calling reuse_swap_page() in addition to the page table lock. [ying.huang@intel.com: silence a compiler warning] Link: http://lkml.kernel.org/r/87bmnzizjy.fsf@yhuang-dev.intel.com Link: http://lkml.kernel.org/r/20170724051840.2309-4-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Shaohua Li <shli@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Michal Hocko <mhocko@kernel.org> Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c] Cc: Vishal L Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
e07098294a |
mm, THP, swap: support to reclaim swap space for THP swapped out
The normal swap slot reclaiming can be done when the swap count reaches SWAP_HAS_CACHE. But for the swap slot which is backing a THP, all swap slots backing one THP must be reclaimed together, because the swap slot may be used again when the THP is swapped out again later. So the swap slots backing one THP can be reclaimed together when the swap count for all swap slots for the THP reached SWAP_HAS_CACHE. In the patch, the functions to check whether the swap count for all swap slots backing one THP reached SWAP_HAS_CACHE are implemented and used when checking whether a swap slot can be reclaimed. To make it easier to determine whether a swap slot is backing a THP, a new swap cluster flag named CLUSTER_FLAG_HUGE is added to mark a swap cluster which is backing a THP (Transparent Huge Page). Because THP swap in as a whole isn't supported now. After deleting the THP from the swap cache (for example, swapping out finished), the CLUSTER_FLAG_HUGE flag will be cleared. So that, the normal pages inside THP can be swapped in individually. [ying.huang@intel.com: fix swap_page_trans_huge_swapped on HDD] Link: http://lkml.kernel.org/r/874ltsm0bi.fsf@yhuang-dev.intel.com Link: http://lkml.kernel.org/r/20170724051840.2309-3-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Shaohua Li <shli@kernel.org> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Michal Hocko <mhocko@kernel.org> Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c] Cc: Vishal L Verma <vishal.l.verma@intel.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a47fed5b5b |
mm: swap: provide lru_add_drain_all_cpuslocked()
The rework of the cpu hotplug locking unearthed potential deadlocks with the memory hotplug locking code. The solution for these is to rework the memory hotplug locking code as well and take the cpu hotplug lock before the memory hotplug lock in mem_hotplug_begin(), but this will cause a recursive locking of the cpu hotplug lock when the memory hotplug code calls lru_add_drain_all(). Split out the inner workings of lru_add_drain_all() into lru_add_drain_all_cpuslocked() so this function can be invoked from the memory hotplug code with the cpu hotplug lock held. Link: http://lkml.kernel.org/r/20170704093421.419329357@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
23955622ff |
swap: add block io poll in swapin path
For fast flash disk, async IO could introduce overhead because of context switch. block-mq now supports IO poll, which improves performance and latency a lot. swapin is a good place to use this technique, because the task is waiting for the swapin page to continue execution. In my virtual machine, directly read 4k data from a NVMe with iopoll is about 60% better than that without poll. With iopoll support in swapin patch, my microbenchmark (a task does random memory write) is about 10%~25% faster. CPU utilization increases a lot though, 2x and even 3x CPU utilization. This will depend on disk speed. While iopoll in swapin isn't intended for all usage cases, it's a win for latency sensistive workloads with high speed swap disk. block layer has knob to control poll in runtime. If poll isn't enabled in block layer, there should be no noticeable change in swapin. I got a chance to run the same test in a NVMe with DRAM as the media. In simple fio IO test, blkpoll boosts 50% performance in single thread test and ~20% in 8 threads test. So this is the base line. In above swap test, blkpoll boosts ~27% performance in single thread test. blkpoll uses 2x CPU time though. If we enable hybid polling, the performance gain has very slight drop but CPU time is only 50% worse than that without blkpoll. Also we can adjust parameter of hybid poll, with it, the CPU time penality is reduced further. In 8 threads test, blkpoll doesn't help though. The performance is similar to that without blkpoll, but cpu utilization is similar too. There is lock contention in swap path. The cpu time spending on blkpoll isn't high. So overall, blkpoll swapin isn't worse than that without it. The swapin readahead might read several pages in in the same time and form a big IO request. Since the IO will take longer time, it doesn't make sense to do poll, so the patch only does iopoll for single page swapin. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/070c3c3e40b711e7b1390002c991e86a-b5408f0@7511894063d3764ff01ea8111f5a004d7dd700ed078797c204a24e620ddb965c Signed-off-by: Shaohua Li <shli@fb.com> Cc: Tim Chen <tim.c.chen@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Jens Axboe <axboe@fb.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
0f0746589e |
mm, THP, swap: move anonymous THP split logic to vmscan
The add_to_swap aims to allocate swap_space(ie, swap slot and swapcache) so if it fails due to lack of space in case of THP or something(hdd swap but tries THP swapout) *caller* rather than add_to_swap itself should split the THP page and retry it with base page which is more natural. Link: http://lkml.kernel.org/r/20170515112522.32457-4-ying.huang@intel.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
75f6d6d29a |
mm, THP, swap: unify swap slot free functions to put_swap_page
Now, get_swap_page takes struct page and allocates swap space according to page size(ie, normal or THP) so it would be more cleaner to introduce put_swap_page which is a counter function of get_swap_page. Then, it calls right swap slot free function depending on page's size. [ying.huang@intel.com: minor cleanup and fix] Link: http://lkml.kernel.org/r/20170515112522.32457-3-ying.huang@intel.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
38d8b4e6bd |
mm, THP, swap: delay splitting THP during swap out
Patch series "THP swap: Delay splitting THP during swapping out", v11. This patchset is to optimize the performance of Transparent Huge Page (THP) swap. Recently, the performance of the storage devices improved so fast that we cannot saturate the disk bandwidth with single logical CPU when do page swap out even on a high-end server machine. Because the performance of the storage device improved faster than that of single logical CPU. And it seems that the trend will not change in the near future. On the other hand, the THP becomes more and more popular because of increased memory size. So it becomes necessary to optimize THP swap performance. The advantages of the THP swap support include: - Batch the swap operations for the THP to reduce lock acquiring/releasing, including allocating/freeing the swap space, adding/deleting to/from the swap cache, and writing/reading the swap space, etc. This will help improve the performance of the THP swap. - The THP swap space read/write will be 2M sequential IO. It is particularly helpful for the swap read, which are usually 4k random IO. This will improve the performance of the THP swap too. - It will help the memory fragmentation, especially when the THP is heavily used by the applications. The 2M continuous pages will be free up after THP swapping out. - It will improve the THP utilization on the system with the swap turned on. Because the speed for khugepaged to collapse the normal pages into the THP is quite slow. After the THP is split during the swapping out, it will take quite long time for the normal pages to collapse back into the THP after being swapped in. The high THP utilization helps the efficiency of the page based memory management too. There are some concerns regarding THP swap in, mainly because possible enlarged read/write IO size (for swap in/out) may put more overhead on the storage device. To deal with that, the THP swap in should be turned on only when necessary. For example, it can be selected via "always/never/madvise" logic, to be turned on globally, turned off globally, or turned on only for VMA with MADV_HUGEPAGE, etc. This patchset is the first step for the THP swap support. The plan is to delay splitting THP step by step, finally avoid splitting THP during the THP swapping out and swap out/in the THP as a whole. As the first step, in this patchset, the splitting huge page is delayed from almost the first step of swapping out to after allocating the swap space for the THP and adding the THP into the swap cache. This will reduce lock acquiring/releasing for the locks used for the swap cache management. With the patchset, the swap out throughput improves 15.5% (from about 3.73GB/s to about 4.31GB/s) in the vm-scalability swap-w-seq test case with 8 processes. The test is done on a Xeon E5 v3 system. The swap device used is a RAM simulated PMEM (persistent memory) device. To test the sequential swapping out, the test case creates 8 processes, which sequentially allocate and write to the anonymous pages until the RAM and part of the swap device is used up. This patch (of 5): In this patch, splitting huge page is delayed from almost the first step of swapping out to after allocating the swap space for the THP (Transparent Huge Page) and adding the THP into the swap cache. This will batch the corresponding operation, thus improve THP swap out throughput. This is the first step for the THP swap optimization. The plan is to delay splitting the THP step by step and avoid splitting the THP finally. In this patch, one swap cluster is used to hold the contents of each THP swapped out. So, the size of the swap cluster is changed to that of the THP (Transparent Huge Page) on x86_64 architecture (512). For other architectures which want such THP swap optimization, ARCH_USES_THP_SWAP_CLUSTER needs to be selected in the Kconfig file for the architecture. In effect, this will enlarge swap cluster size by 2 times on x86_64. Which may make it harder to find a free cluster when the swap space becomes fragmented. So that, this may reduce the continuous swap space allocation and sequential write in theory. The performance test in 0day shows no regressions caused by this. In the future of THP swap optimization, some information of the swapped out THP (such as compound map count) will be recorded in the swap_cluster_info data structure. The mem cgroup swap accounting functions are enhanced to support charge or uncharge a swap cluster backing a THP as a whole. The swap cluster allocate/free functions are added to allocate/free a swap cluster for a THP. A fair simple algorithm is used for swap cluster allocation, that is, only the first swap device in priority list will be tried to allocate the swap cluster. The function will fail if the trying is not successful, and the caller will fallback to allocate a single swap slot instead. This works good enough for normal cases. If the difference of the number of the free swap clusters among multiple swap devices is significant, it is possible that some THPs are split earlier than necessary. For example, this could be caused by big size difference among multiple swap devices. The swap cache functions is enhanced to support add/delete THP to/from the swap cache as a set of (HPAGE_PMD_NR) sub-pages. This may be enhanced in the future with multi-order radix tree. But because we will split the THP soon during swapping out, that optimization doesn't make much sense for this first step. The THP splitting functions are enhanced to support to split THP in swap cache during swapping out. The page lock will be held during allocating the swap cluster, adding the THP into the swap cache and splitting the THP. So in the code path other than swapping out, if the THP need to be split, the PageSwapCache(THP) will be always false. The swap cluster is only available for SSD, so the THP swap optimization in this patchset has no effect for HDD. [ying.huang@intel.com: fix two issues in THP optimize patch] Link: http://lkml.kernel.org/r/87k25ed8zo.fsf@yhuang-dev.intel.com [hannes@cmpxchg.org: extensive cleanups and simplifications, reduce code size] Link: http://lkml.kernel.org/r/20170515112522.32457-2-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Andrew Morton <akpm@linux-foundation.org> [for config option] Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [for changes in huge_memory.c and huge_mm.h] Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Shaohua Li <shli@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
df6b749980 |
mm, swap: remove unused function prototype
This is a code cleanup patch, no functionality changes. There are 2 unused function prototype in swap.h, they are removed. Link: http://lkml.kernel.org/r/20170405071017.23677-1-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
f7ad2a6cb9 |
mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
madv()'s MADV_FREE indicate pages are 'lazyfree'. They are still anonymous pages, but they can be freed without pageout. To distinguish these from normal anonymous pages, we clear their SwapBacked flag. MADV_FREE pages could be freed without pageout, so they pretty much like used once file pages. For such pages, we'd like to reclaim them once there is memory pressure. Also it might be unfair reclaiming MADV_FREE pages always before used once file pages and we definitively want to reclaim the pages before other anonymous and file pages. To speed up MADV_FREE pages reclaim, we put the pages into LRU_INACTIVE_FILE list. The rationale is LRU_INACTIVE_FILE list is tiny nowadays and should be full of used once file pages. Reclaiming MADV_FREE pages will not have much interfere of anonymous and active file pages. And the inactive file pages and MADV_FREE pages will be reclaimed according to their age, so we don't reclaim too many MADV_FREE pages too. Putting the MADV_FREE pages into LRU_INACTIVE_FILE_LIST also means we can reclaim the pages without swap support. This idea is suggested by Johannes. This patch doesn't move MADV_FREE pages to LRU_INACTIVE_FILE list yet to avoid bisect failure, next patch will do it. The patch is based on Minchan's original patch. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/2f87063c1e9354677b7618c647abde77b07561e5.1487965799.git.shli@fb.com Signed-off-by: Shaohua Li <shli@fb.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
67afa38e01 |
mm/swap: add cache for swap slots allocation
We add per cpu caches for swap slots that can be allocated and freed quickly without the need to touch the swap info lock. Two separate caches are maintained for swap slots allocated and swap slots returned. This is to allow the swap slots to be returned to the global pool in a batch so they will have a chance to be coaelesced with other slots in a cluster. We do not reuse the slots that are returned right away, as it may increase fragmentation of the slots. The swap allocation cache is protected by a mutex as we may sleep when searching for empty slots in cache. The swap free cache is protected by a spin lock as we cannot sleep in the free path. We refill the swap slots cache when we run out of slots, and we disable the swap slots cache and drain the slots if the global number of slots fall below a low watermark threshold. We re-enable the cache agian when the slots available are above a high watermark. [ying.huang@intel.com: use raw_cpu_ptr over this_cpu_ptr for swap slots access] [tim.c.chen@linux.intel.com: add comments on locks in swap_slots.h] Link: http://lkml.kernel.org/r/20170118180327.GA24225@linux.intel.com Link: http://lkml.kernel.org/r/35de301a4eaa8daa2977de6e987f2c154385eb66.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Michal Hocko <mhocko@suse.com> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> escreveu: Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
7c00bafee8 |
mm/swap: free swap slots in batch
Add new functions that free unused swap slots in batches without the need to reacquire swap info lock. This improves scalability and reduce lock contention. Link: http://lkml.kernel.org/r/c25e0fcdfd237ec4ca7db91631d3b9f6ed23824e.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> escreveu: Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |